AnlightenDiff: Anchoring Diffusion Probabilistic Model on Low Light Image Enhancement

Abstract— Low-light image enhancement aims to improve the visual quality of images captured under poor illumination. However, enhancing low-light images often introduces image artifacts, color bias, and low SNR. In this work, we propose AnlightenDiff, an anchoring diffusion model for low-light image enhancement. Diffusion models can enhance a low-light image to a well-exposed image by iterative refinement, but they require anchoring to ensure that the enhanced results remain faithful to the input. We propose a Dynamical Regulated Diffusion Anchoring mechanism and Sampler to anchor the enhancement process. We also propose a Diffusion Feature Perceptual Loss tailored for diffusion-based models to utilize different loss functions in the image domain. AnlightenDiff demonstrates the effectiveness of diffusion models for low-light enhancement and achieves high perceptual quality results. Our techniques show a promising future direction for applying diffusion models to image enhancement.

Index Terms— Low light image enhancement, image processing, deep learning.

I. INTRODUCTION

ADVANCEMENTS in imaging technology have made it possible for people to capture and record memorable moments in their lives with increased ease and convenience. However, one persistent challenge faced by both professional and amateur photographers alike is the degradation of image quality under low-light conditions. Images taken in such environments are often dim and noisy, making it difficult to recognize scenes or objects and compromising the overall visual appeal. In this context, low-light image enhancement has become an area of significant interest, with researchers exploring various techniques to improve visibility and suppress image artifacts while addressing the inherent challenges associated with low-light imaging.

Low-light conditions introduce a range of complexities, including the presence of image artifacts, low signal-to-noise ratio (SNR), and the need to balance camera settings such as ISO, aperture, and exposure time. While increasing ISO or exposure time can improve image brightness, these adjustments often come at the cost of amplifying image artifacts, introducing blur due to camera shake, or overexposing certain areas. Consequently, these trade-offs have motivated researchers to develop novel computational photography techniques for enhancing low-light images, encompassing illumination enhancement.

Traditional approaches to low-light image enhancement have relied on techniques such as histogram equalization [1], [2], retinex-based methods [3], [4], [5], and dehazing theory [6]. These methods aim to improve the dynamic range, separate illumination and reflectance components, or refine refraction maps to enhance the visibility of low-light images. While these approaches have demonstrated some success, they often fall short in capturing the complex interplay of local and global features present in images.

In recent years, researchers have been exploring diffusion probabilistic models [7], [8], [9], [10], a class of generative models that can be used for image generation and image-to-image synthesis. They model a diffusion process in which noise perturbation is gradually removed from the input signal over time. These models define a probability distribution over the clean signal at different points in time, with the variance of the distribution decreasing over time as the signal becomes less noisy. They are able to exploit the gradual reduction in noise perturbation to reconstruct fine details and textures.

Diffusion models have exhibited remarkable performance across various tasks, including super-resolution [11], [12], inpainting [13], [14], [15], and low-light image enhancement (LLIE) [16], [17], [18]. Their success can be attributed to their ability to capture the intricate distributions of images and generate high-quality results, making them a promising approach for probabilistic generative modeling. Although a limited number of prior works have investigated the application of diffusion models to LLIE, there remains substantial room for improvement by incorporating domain-specific knowledge. By leveraging the power of diffusion models in conjunction with expertise in LLIE, researchers can unlock new possibilities and push the boundaries of what can be achieved in this particular task, opening up exciting avenues for further exploration in the field.
Fig. 2. AnlightenDiff overview. AnlightenDiff consists of a Dynamical Regulated Diffusion Anchoring (DRDA) mechanism, a Dynamical Regulated Diffusion Sampler (DRDS) and a Diffusion Feature Perceptual Loss (DFPL) design. DRDA anchors the diffusion process to the target distribution with a domain-knowledge feature φ, computed by the center encoder (see Fig. 3), by using N(m_t := ((1 − √ᾱ_t)/√(1 − ᾱ_t)) φ, β̃_t I) rather than the standard N(0, I) for the conditional diffusion model's noise predictor ε_θ (see Fig. 3). Collaboratively, DRDS utilizes the anchor information in reverse diffusion. In addition, DFPL is tailored for diffusion models; it effectively processes perceptual features to calculate gradients for back-propagation and outperforms the ℓ1 or ℓ2 loss.
… expressed as a Markov chain:

q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})   (1)

q(x_t \mid x_{t-1}) = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t   (2)

where x_t denotes the data at time step t, α_t and β_t represent the noise perturbation schedule such that α_t + β_t = 1, and ε_t is the noise perturbation sampled from the standard normal distribution N(0, I) at time t. The forward process for an arbitrary t can be further simplified [8] as:

q(x_t \mid x_0) = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon   (3)

where \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s and ε ∼ N(0, I). Thus, the learning process can be formulated as a noise perturbation prediction task. Specifically, a noise predictor network ε_θ(x_t, t) is employed to learn and estimate the conditional probability p_θ(x_{t−1} | x_t), which is used in the reverse diffusion process to reconstruct the clean data x_0 from x_T by minimizing a noise perturbation prediction objective:

\min_\theta \mathbb{E}_{t, x_0, \epsilon} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|_2^2 \right], \quad \text{where } t \sim U(1, T)   (4)

The noise predictor network ε_θ(x_t, t) takes the noisy data x_t and time step t as input, and predicts the noise perturbation ε that is added to x_t according to the forward process. To invert the noise perturbation injection (forward) process and reconstruct the image, referred to as the reverse process, the following reverse equation has been proposed in [8], [40]:

x_{t-1} = N\!\left( \frac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0,\ \tilde{\beta}_t I \right)   (5)

\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t   (6)

Utilizing the forward equation Eq. (3), the predicted mean \bar{\mu}_\theta(x_t, t) is formulated to approximate the original data x_0 according to:

\bar{\mu}_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right)   (7)

By inserting Eq. (7), i.e., setting x_0 := \bar{\mu}_\theta(x_t, t), into Eq. (5), we obtain the final reverse equation:

x_{t-1} = N\!\left( \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right),\ \tilde{\beta}_t I \right)   (8)

By applying the reverse process, the diffusion model can recover the clean data x_0 from the pure Gaussian noise x_T ∼ N(0, I). The whole process can be optimized end-to-end with neural networks that parameterize the forward and reverse chains.

Compared to previous models that require a separate inference network [36], this learning process is more straightforward and stable [7]. As a result, diffusion models have achieved state-of-the-art results in various image generation tasks [9], [11] and generate high-quality and coherent samples without the mode collapse issue.

To learn conditional diffusion models [7], [11], [41], the conditional information c can be concatenated with the input for the noise prediction objective:

\min_\theta \mathbb{E}_{t, x_0, c, \epsilon} \left[ \| \epsilon - \epsilon_\theta(x_t, t, c) \|_2^2 \right]   (9)

and the reverse equation is defined as:

x_{t-1} = N\!\left( \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t, c) \right),\ \tilde{\beta}_t I \right)   (10)
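For concreteness, here is a minimal PyTorch-style sketch of the conditional noise-prediction objective in Eq. (9) and the reverse step in Eq. (10). It is not the authors' implementation: the linear β schedule, the eps_model(x_t, t, c) signature and the tensor shapes are assumptions made only to keep the example self-contained.

```python
import torch

# Assumed linear noise schedule; the paper does not restate its schedule here.
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)          # beta_t
alphas = 1.0 - betas                           # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)      # \bar{alpha}_t = prod_s alpha_s

def conditional_loss(eps_model, x0, c):
    """Noise-prediction objective of Eq. (9); x0, c: (B, 3, H, W) tensors."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)            # t ~ U(1, T), 0-indexed
    a_bar = alpha_bars.to(x0.device)[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)                                  # eps ~ N(0, I)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps          # forward process, Eq. (3)
    return ((eps - eps_model(x_t, t, c)) ** 2).mean()

@torch.no_grad()
def reverse_step(eps_model, x_t, t, c):
    """One reverse step of Eq. (10): x_{t-1} ~ N(mu_theta, beta_tilde_t I)."""
    beta_t, alpha_t, a_bar_t = betas[t], alphas[t], alpha_bars[t]
    a_bar_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
    t_batch = torch.full((x_t.shape[0],), t, dtype=torch.long, device=x_t.device)
    mean = (x_t - beta_t / (1 - a_bar_t).sqrt() * eps_model(x_t, t_batch, c)) / alpha_t.sqrt()
    beta_tilde = (1 - a_bar_prev) / (1 - a_bar_t) * beta_t      # Eq. (6)
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + beta_tilde.sqrt() * noise
```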
… Appendix A accordingly:

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_t^\star   (12)

where \epsilon_t^\star \sim N(m_t, \tilde{\beta}_t I),

m_t = \frac{1 - \sqrt{\bar{\alpha}_t}}{\sqrt{1 - \bar{\alpha}_t}}\, \phi \quad \text{and} \quad \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t.   (13)

The two equations described above allow the diffusion model to progressively map complex empirical distributions to a simple parametric distribution with a flexible, learned mean vector that incorporates prior knowledge. The training phase of AnlightenDiff with DRDA, as illustrated in Fig. 2, employs two distinct strategies; training from scratch and the two-step training approach will be elucidated in Sections IV-B and IV-C respectively.

C. Anchoring Mechanism in AnlightenDiff and LLIE

The Dynamical Regulated Diffusion Anchoring (DRDA) mechanism in AnlightenDiff significantly enhances Low-Light Image Enhancement (LLIE) performance by imposing task-specific constraints on the diffusion process. DRDA incorporates domain knowledge through a designed mean vector φ in the noise perturbation ε*_t, encoding pixel-level enhancement information. By introducing a new initial noise perturbation x_T that includes a color map (see Section IV-C), DRDA embeds domain-specific priors directly into the diffusion trajectory. This color information acts as a constraint that guides the generative process, ensuring that the enhanced images maintain accurate color representations and realistic lighting adjustments essential for high-quality LLIE.

Unlike other diffusion-based approaches such as Diff-Retinex [16], which utilizes a dual DDPM setup to separately enhance reflectance and illumination maps, AnlightenDiff employs the DRDA mechanism to integrate color information directly into the diffusion trajectory. By embedding the color map within the diffusion process, DRDA provides more direct and efficient control over the enhancement process, ensuring that color accuracy and realistic lighting adjustments are consistently maintained throughout the generation steps. This direct incorporation of a color map as a domain-specific prior allows AnlightenDiff to produce superior performance and more realistic outcomes than methods that handle different aspects of image enhancement independently.

Fig. 12 demonstrates DRDA's effectiveness by comparing the initial noise perturbation x_T and the resulting enhanced image x_H^pred with and without anchoring. The results clearly show that DRDA achieves superior preservation of image details and color mapping, significantly improving lighting and details, while enhancement without anchoring produces less detailed results with limited color information. This comparison underscores how DRDA guides the diffusion process towards realistic enhancements by maintaining a strong connection to the pixel-level color constraints injected into the noise perturbation x_t.

The rationale behind DRDA's effectiveness is its integration of color maps as domain-specific priors, which guide the diffusion process to accurately adjust color balance and natural light distribution. This direct incorporation helps prevent the introduction of color artifacts and noise, while ensuring that enhancements preserve fine image details and maintain a realistic appearance. By embedding color information as a constraint via the anchored x_T, DRDA effectively imposes domain-specific priors that lead to more realistic and high-quality image enhancements.

D. Architecture of AnlightenDiff

Figure 3 illustrates the architecture of AnlightenDiff. As determining a suitable representative feature for the perturbation is challenging, we utilize a trainable center encoder network φ_e to obtain the non-zero mean perturbation vector φ. In this work, we provide φ_e with the low-light input image x_L and multiple illumination-invariant components, including:
• the histogram equalized image h(x_L),
• the channel weighted mapped image c(x_L), which normalizes or weights the contribution of a specific color channel based on the overall brightness or intensity of the pixel, and
• the maximum gradient map g(x_L), which captures high-frequency components in the image.

The channel weighted map c(x_L) is defined as:

c(x_{i,j}) = \frac{x_{i,j}}{(R_{i,j} + G_{i,j} + B_{i,j})/3}   (14)

where R_{i,j}, G_{i,j}, and B_{i,j} represent the red, green, and blue channel values, respectively, for the pixel at row i and column j of the image.

Similarly, the maximum gradient map is defined as:

g(x_{i,j}) = \max\!\left( \left| \nabla_x c(x_{i,j}) \right|,\ \left| \nabla_y c(x_{i,j}) \right| \right)   (15)

where ∇_x and ∇_y are the image gradients in the horizontal and vertical directions. Therefore, the perturbation in this work is computed by a trainable encoder network φ_e as:

\phi = \phi_e\big(x_L, h(x_L), c(x_L), g(x_L)\big)   (16)

When selecting these components, we strike a balance between their computational efficiency and their ability to represent important aspects of LLIE. By using simple mathematical equations, we ensure that the components are easy to process and formulate, freeing up computing power for model training and making them efficient to implement within the proposed framework. During forward propagation, the input features x_L, h(x_L), c(x_L), g(x_L) are concatenated into a 12-channel input, which is passed through a U-shaped convolutional neural network architecture for further processing.

Each 2D convolutional block consists of a 2D convolutional layer followed by a Mish activation function [42] to introduce non-linearity. The 2D convolutional layers extract salient features from the input, while the residual connections facilitate efficient training of deep networks. Two such 2D convolutional blocks with a skip connection [43] constitute a residual block. Similarly, two residual blocks with a downsampling layer form a level in the U-shaped network. The downsampling layers are 2D convolutional layers with stride 2. Analogous to the U-Net [44], the U-shaped network has 3 levels.
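To make the center-encoder inputs concrete, the following is a minimal sketch of the illumination-invariant components in Eqs. (14)–(16). It is not taken from the released code: the helper names, the small constant added to the denominator of Eq. (14) to avoid division by zero, and the particular histogram-equalization variant are assumptions.

```python
import torch

def channel_weighted_map(x, eps=1e-6):
    """Eq. (14): divide each channel by the per-pixel mean of (R, G, B)."""
    # x: (B, 3, H, W) low-light image with values in [0, 1]
    mean_rgb = x.mean(dim=1, keepdim=True)                  # (R + G + B) / 3
    return x / (mean_rgb + eps)                             # eps is an added safeguard

def max_gradient_map(x):
    """Eq. (15): maximum of |horizontal| and |vertical| gradients of c(x)."""
    c = channel_weighted_map(x)
    gx = torch.zeros_like(c)
    gy = torch.zeros_like(c)
    gx[..., :, 1:] = c[..., :, 1:] - c[..., :, :-1]         # forward differences
    gy[..., 1:, :] = c[..., 1:, :] - c[..., :-1, :]
    return torch.maximum(gx.abs(), gy.abs())

def hist_equalized(x, bins=256):
    """One simple per-channel histogram equalization h(x) (assumed variant)."""
    out = torch.empty_like(x)
    for b in range(x.shape[0]):
        for ch in range(x.shape[1]):
            v = x[b, ch].flatten()
            hist = torch.histc(v, bins=bins, min=0.0, max=1.0)
            cdf = hist.cumsum(0)
            cdf = cdf / cdf[-1].clamp(min=1.0)
            idx = (v * (bins - 1)).long().clamp(0, bins - 1)
            out[b, ch] = cdf[idx].view(x.shape[2], x.shape[3])
    return out

def center_encoder_input(x_l):
    """Eq. (16): concatenate x_L, h(x_L), c(x_L), g(x_L) into a 12-channel input."""
    return torch.cat([x_l, hist_equalized(x_l),
                      channel_weighted_map(x_l), max_gradient_map(x_l)], dim=1)
```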
Fig. 3. The architecture of the AnlightenDiff conditional diffusion noise predictor ε_θ and center encoder φ_e. The notation c, 2c and 4c after a block name denotes the channel size of that block w.r.t. c. "Conv Block", "Res Block", "Downsample" and "Upsample" denote the 2D-convolution block, residual block, downsampling layer and upsampling layer respectively.
Finally, the features are passed through a final convolutional block to generate the output φ.

The center output φ is used to compute the dynamically regulated mean vector m_t in Eq. (13). The mean vector m_t then allows calculation of the anchoring noise perturbation ε*_t and the input x_t using Eq. (12) and Eq. (11) respectively. The input x_t and the conditional information c := x_L are concatenated and passed through the conditional diffusion model's noise predictor. The noise predictor has a similar architecture to the center encoder described previously. It is trained to predict the anchoring noise perturbation ε*_t added to x_t, denoted as the predicted noise perturbation ε_θ,t.

E. Dynamical Regulated Diffusion Sampler (DRDS)

The diffusion model builds a link between the empirical data distribution and the simpler parametric distribution by progressively adding noise perturbations at each iteration of the forward process and progressively removing noise perturbations at each iteration of the reverse process. At each iteration, the diffusion model, based on ε_θ(x_t, t), samples the previous image x_{t−1} conditioned on the current image x_t. In the reverse process, the generated samples exhibit progressive improvements in quality, ultimately getting closer to the ground truth. As shown in Fig. 2, as more iterations are performed, the generated samples become progressively refined, achieving enhanced quality and thereby approaching the empirical data distribution.

Many properties of the diffusion model also apply to the proposed Dynamical Regulated Diffusion Sampler (DRDS). The DRDS introduces the non-zero mean vector φ to effectively incorporate prior knowledge and better match the geometry of the data distribution. We thus propose the reverse process in Eq. (17) to (19) and Appendix B accordingly:

x_{t-1} = N\!\left( \frac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, \mu_\theta^\star(x_t, t),\ \tilde{\beta}_t I \right)   (17)

\mu_\theta^\star(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \tilde{\phi}   (18)

\tilde{\phi} = \frac{1 - \bar{\alpha}_t + \sqrt{\bar{\alpha}_{t-1}}(\alpha_t - 1) + \sqrt{\alpha_t}(\bar{\alpha}_{t-1} - 1)}{1 - \bar{\alpha}_t}\, \phi   (19)

The inference phase utilizes the proposed equations to iteratively denoise the input image by incorporating prior knowledge through the non-zero mean vector φ, as illustrated in Fig. 2. At each timestep, the equations are applied to progressively refine the estimate. Figure 4 depicts the intermediate denoising results obtained using the proposed DRDS. Further details on the inference procedure and the reverse diffusion process can be found in Section IV-D and Algorithm 3 respectively. Notably, setting φ = 0 reduces the equations to the same form as DDPM [8].

Compared to the original diffusion model, the DRDS has two key benefits. Domain expertise can be incorporated to inform the generative process, providing guidance for enhanced model performance. For instance, in the context of image generation, the incorporation of domain knowledge such as segmentation maps enables the synthesis of perceptually realistic samples. By leveraging information that constrains the output space to semantically and structurally coherent images, the model is able to generate higher-fidelity samples that more …
Algorithm 1: Training From Scratch (With Pretrained φ_e)
Algorithm 2: Training of Center Encoder φ_e in Two-Step (TS) Training
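The listings of Algorithms 1 and 2 are not reproduced above. The sketch below illustrates the anchored training step they imply, combining Eqs. (12), (13) and (16) with the noise-prediction objective described earlier (the predictor is trained to recover the anchoring perturbation ε*_t). The function and argument names are placeholders rather than the authors' code, and t is assumed to be a batch of integer timesteps with t ≥ 1.

```python
import torch

def drda_training_step(eps_model, center_encoder, x0, x_l, feats,
                       alpha_bars, betas, t):
    """One anchored training step: predict the anchoring noise eps*_t of Eq. (12)."""
    b = x0.shape[0]
    a_bar_t = alpha_bars[t].view(b, 1, 1, 1)
    a_bar_prev = alpha_bars[t - 1].view(b, 1, 1, 1)    # assumes every t >= 1
    beta_t = betas[t].view(b, 1, 1, 1)

    # Eq. (16): phi = phi_e(x_L, h(x_L), c(x_L), g(x_L)); feats is that 12-channel input.
    phi = center_encoder(feats)

    # Eq. (13): anchored mean m_t and variance beta_tilde_t.
    m_t = (1 - a_bar_t.sqrt()) / (1 - a_bar_t).sqrt() * phi
    beta_tilde = (1 - a_bar_prev) / (1 - a_bar_t) * beta_t

    # Eq. (12): eps*_t ~ N(m_t, beta_tilde_t I), then x_t is built from x_0 and eps*_t.
    eps_star = m_t + beta_tilde.sqrt() * torch.randn_like(x0)
    x_t = a_bar_t.sqrt() * x0 + (1 - a_bar_t).sqrt() * eps_star

    # The noise predictor is conditioned on the low-light input c := x_L and
    # trained to recover the anchoring perturbation eps*_t.
    eps_pred = eps_model(x_t, t, x_l)
    return ((eps_star - eps_pred) ** 2).mean()
```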
Algorithm 3: Inference
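Likewise, the listing of Algorithm 3 is not reproduced; the sketch below shows one DRDS reverse step following Eqs. (17)–(19), again with placeholder names and an assumed scalar timestep t. Setting phi to zero recovers the standard DDPM update, consistent with the remark after Eq. (19).

```python
import torch

@torch.no_grad()
def drds_reverse_step(eps_model, x_t, t, x_l, phi, alphas, alpha_bars, betas):
    """One DRDS reverse step: x_{t-1} ~ N(mu*, beta_tilde_t I), Eqs. (17)-(19)."""
    alpha_t, a_bar_t, beta_t = alphas[t], alpha_bars[t], betas[t]
    a_bar_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)

    # Eq. (19): correction term phi_tilde built from the anchor phi.
    coef = (1 - a_bar_t + a_bar_prev.sqrt() * (alpha_t - 1)
            + alpha_t.sqrt() * (a_bar_prev - 1)) / (1 - a_bar_t)
    phi_tilde = coef * phi

    # Eq. (18): mu*_theta(x_t, t), with the predictor conditioned on c := x_L.
    t_batch = torch.full((x_t.shape[0],), t, dtype=torch.long, device=x_t.device)
    mu_star = (x_t - beta_t / (1 - a_bar_t).sqrt()
               * eps_model(x_t, t_batch, x_l)) / alpha_t.sqrt() + phi_tilde

    # Eq. (17): posterior mean combining x_t and mu*, then one sampling step.
    mean = (alpha_t.sqrt() * (1 - a_bar_prev) / (1 - a_bar_t) * x_t
            + a_bar_prev.sqrt() * beta_t / (1 - a_bar_t) * mu_star)
    beta_tilde = (1 - a_bar_prev) / (1 - a_bar_t) * beta_t
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + beta_tilde.sqrt() * noise
```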
TABLE I: Quantitative full-reference comparison on the LOL [5], VE-LOL [47] and LOLv2 [48] datasets in terms of PSNR, SSIM [50], and LPIPS [19]. ↑ (↓) denotes that larger (smaller) values lead to better quality. (Red: best; blue: the 2nd best; purple: the 3rd best.)
TABLE II: Quantitative non-reference comparison on the LIME [20], NPE [21] and VV datasets in terms of HyperIQA [52], NIMA [53] and TReS [54]. ↑ (↓) denotes that larger (smaller) values lead to better quality. (Bold represents the best.)
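For reference, the full-reference metrics reported in Table I can be computed with TorchMetrics roughly as follows. This is only an illustrative sketch: the SSIM variant and LPIPS backbone shown here are assumptions, and the paper's exact evaluation code is available in its repository.

```python
import torch
from torchmetrics.image import PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

psnr = PeakSignalNoiseRatio(data_range=1.0)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex")   # assumed backbone

def full_reference_scores(pred, target):
    """pred, target: (B, 3, H, W) tensors with values in [0, 1]."""
    return {
        "PSNR": psnr(pred, target).item(),                        # higher is better
        "SSIM": ssim(pred, target).item(),                        # higher is better
        # LPIPS expects inputs scaled to [-1, 1]; lower is better.
        "LPIPS": lpips(pred * 2 - 1, target * 2 - 1).item(),
    }
```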
… and TReS, and TorchMetrics [57] for the LPIPS, ensuring a fair and standardized comparison with state-of-the-art approaches. Notably, we compare our method with other generative low-light image enhancement models, including EnlightenGAN [36], which applies a Generative Adversarial Network (GAN) architecture, and GDP [18], which employs a diffusion model. Results of the comparison show that our approach is superior.

For the FR evaluation, we compare model performances on the LOL [5], VE-LOL (Real) [47], and LOLv2 (Real) [48] datasets (Table I), where VE-LOL and LOLv2 share the same testing dataset. Our method consistently achieves state-of-the-art performance across all datasets, surpassing both traditional and deep learning-based approaches. On the LOL dataset, our two-step training approach yields the best results in SSIM and LPIPS among all models, while maintaining highly competitive performance in PSNR, closely following the top-performing DLN [31]. Although the PSNR results of our method are slightly lower than those of one or two other approaches, the difference is expected, as PSNR depends strongly on luminance changes for which perception can vary subjectively between individuals. SSIM and LPIPS are more perceptually-driven metrics that better reflect visual quality perception. Our superior SSIM and LPIPS demonstrate compelling enhanced images with preserved details.

When compared to other generative models, our method significantly outperforms both EnlightenGAN [36], which employs a GAN-based architecture, and GDP [18], another diffusion-based model, by a substantial margin in all metrics. Similarly, on the VE-LOL/LOLv2 (Real) dataset, our two-step training approach demonstrates superior performance across almost all metrics, achieving the highest PSNR and
Fig. 6. Visual comparison of 55.png on LOL dataset [5], where FS and TS stand for “from scratch” and “two step” respectively.
Fig. 7. Visual comparison of 23.png on LOL dataset [5], where FS and TS stand for “from scratch” and “two step” respectively.
SSIM among all models, and a very competitive LPIPS score slightly behind KinD [26]. Compared to EnlightenGAN and GDP, our method showcases a significant improvement in all metrics, further validating the effectiveness of our diffusion-based approach in enhancing low-light images across different datasets. Moreover, even our from-scratch model surpasses both EnlightenGAN and GDP by a considerable margin, highlighting the robustness and generalizability of our method. These results demonstrate the state-of-the-art performance of our diffusion-based generative model in low-light image enhancement, showcasing its superiority over existing approaches, including both non-generative and generative models. The substantial improvements over other generative models, particularly GDP, which is also a diffusion-based model, underscore the effectiveness of our proposed work.

For the NR evaluation, we also make use of the most challenging datasets, including LIME [20], NPE [21], and VV (Table II). These datasets only provide low-light images without their normal-light counterparts, making it impossible to train a model directly on them. As a result, the evaluation on these datasets is inherently zero-shot, requiring the use of pre-trained models without any further fine-tuning. As shown in Table II, our approach consistently achieves the best results across all datasets, outperforming both traditional and deep learning-based approaches, as well as other generative models such as EnlightenGAN and GDP. Our model attains the highest scores in all three NR metrics (HyperIQA, NIMA, and TReS) on the LIME and NPE datasets, and the best performance in HyperIQA and NIMA on the VV dataset, while remaining competitive in TReS. These results highlight our model's ability to generate visually appealing enhanced images with better perceptual quality, aesthetics, and overall image quality in this challenging zero-shot setting, validating its strong generalization capability and effectiveness in producing high-quality enhanced images that align with human perception and aesthetic preferences.

B. Qualitative Results

This section presents a visual comparison of various low-light image enhancement methods on the LOL and VE-LOL/LOLv2 (Real) datasets. As observed in Fig. 6 to 11, our proposed method, AnlightenDiff, significantly enhances the brightness and details of the input low-light images while maintaining a natural appearance and preserving the original color scheme. In contrast, other methods suffer from various issues, such as insufficient brightness enhancement, loss of details, or unnatural color shifts.

Among the compared methods, KinD [26] and DLN [31] produce relatively better results, but they still introduce some color distortions and fail to restore some details. EnlightenGAN [36], a generative adversarial network-based method, improves the brightness but generates unnatural artifacts and color deviations. GDP [18], another diffusion-based generative model, enhances the overall brightness but introduces an unnatural yellowish tint and fails to restore fine details. Other methods, such as RUAS [28], SCI [29], Zero-DCE [35], RetinexNet [5], and SGZ [51], also exhibit various limitations in their enhanced results.
Fig. 8. Visual comparison of low00702.png on VE-LOL/LOLv2 (Real) dataset [47], [48], where FS and TS stand for “from scratch” and “two step”
respectively.
Fig. 9. Visual comparison of low00716.png on VE-LOL/LOLv2 (Real) dataset [47], [48], where FS and TS stand for “from scratch” and “two step”
respectively.
Fig. 10. Enlarged Visual comparison of 111.png on LOL dataset [5], where FS and TS stand for “from scratch” and “two step” respectively.
Fig. 11. Enlarged Visual comparison of low00706.png on VE-LOL/LOLv2 (Real) dataset [47], [48], where FS and TS stand for “from scratch” and “two
step” respectively.
… difficult to optimize. As a result, optimizing the residual is much easier than the original direct learning problem, allowing our residual learning approach to achieve superior performance.

B. Effect of Dynamical Regulated Diffusion Anchoring (DRDA) and Sampler (DRDS)

To validate the effectiveness of the diffusion modules (DRDA and DRDS) in our proposed model, we conducted an ablation study by removing each diffusion module separately and jointly.

Dynamical Regulated Diffusion Anchoring (DRDA): The model without DRDA (denoted as "Ours w/o DRDA" in Table IV) achieves a PSNR of 8.143 dB, SSIM of 0.289, and LPIPS of 0.609. By incorporating the proposed DRDA module (denoted as "Ours"), the model achieves significant performance gains, improving PSNR to 21.726 dB (an increase of 13.583 dB), SSIM to 0.814 (an increase of 0.525), and reducing LPIPS to 0.141 (a decrease of 0.468).

Dynamical Regulated Diffusion Sampler (DRDS): The model without DRDS (denoted as "Ours w/o DRDS" in Table IV) achieves a PSNR of 13.145 dB, SSIM of 0.411, and LPIPS of 0.434. By incorporating the DRDS module (denoted as "Ours"), the model gains significant improvements, with PSNR increasing to 21.726 dB (an improvement of 8.581 dB), SSIM increasing to 0.814 (an increase of 0.403), and LPIPS decreasing to 0.141 (a decrease of 0.293).

Joint Effect: When the model is trained without the DRDA and DRDS modules, it applies the forward and reverse diffusion processes of DDPM [8] without the support of the center feature. The absence of these modules (denoted as "Ours w/o DRDA & DRDS" in Table IV) results in a PSNR of 16.602 dB, an SSIM of 0.726, and an LPIPS of 0.254. In comparison, as illustrated in Fig. 12, the full proposed model achieves substantial performance improvements, with the PSNR increasing to 21.726 dB (a gain of 5.124 dB), the SSIM increasing to 0.814 (an improvement of 0.088), and the LPIPS decreasing to 0.141 (a reduction of 0.113).
TABLE IV: Ablation study for DRDA and DRDS.

Fig. 12. Comparison between our method without DRDA & DRDS and our proposed method with anchoring. (a) and (c) show x_T, the initial noise perturbation. (b) and (d) show x_H^pred, the enhanced image. With anchoring (DRDA) via x_T, our proposed method (right) demonstrates superior preservation of image details and color mapping, achieving significant improvement in lighting and detail. In contrast, the enhanced image without anchoring (left) produces a less detailed result with limited color information, tending towards a white filter effect.

TABLE V: Ablation study for DFPL.

These ablation studies clearly demonstrate the synergistic effects of the DRDA and DRDS modules, both individually and jointly. The proposed full model achieves significant performance gains over the model without these modules, affirming that the DRDA and DRDS modules have complementary advantages for denoising that are enhanced when used together.

C. Effectiveness of Diffusion Feature Perceptual Loss (DFPL)

We have evaluated the effectiveness of our proposed Diffusion Feature Perceptual Loss (DFPL) by comparing it against two common losses: ℓ1 and ℓ2. As shown in Table V, models trained with either the ℓ1 or ℓ2 loss obtain inferior performance compared to our model trained with the DFPL. Specifically, the DFPL leads to improvements of 2.565 dB and 3.176 dB in PSNR, 0.139 and 0.131 in SSIM, and 0.268 and 0.244 in LPIPS over the ℓ1 and ℓ2 losses respectively.

The considerable improvements validate the efficacy of DFPL for enhancing the perceptual quality and global consistency of reconstructed images. DFPL effectively preserves the image structural similarity and perceptual information, thus achieving superior performance compared to the baselines.

D. Effect of Illumination Invariant Features on the Center Encoder

An ablation study was conducted to evaluate the impact of illumination invariant features on the center encoder φ_e by individually removing the histogram equalized image h(x_L), the channel weighted mapped image c(x_L), and the maximum gradient map g(x_L). As shown in Fig. 13 and Table VI, removing any of these components leads to a noticeable degradation in the enhanced image quality and worse PSNR, SSIM, and LPIPS scores. The absence of h(x_L) results in a loss of contrast and brightness balance, removing c(x_L) causes color distortions and an unnatural appearance, and the lack of g(x_L) leads to a loss of fine details and textures. These findings emphasize the importance of each illumination-invariant feature in enabling the center encoder to extract a robust center that is invariant to changes in illumination, resulting in high-quality enhanced images with well-preserved details, natural colors, and balanced brightness.

TABLE VI: Ablation study for the center encoder φ_e.

VII. CONCLUSION

In conclusion, AnlightenDiff leverages Dynamical Regulated Diffusion Anchoring and Sampling to incorporate prior knowledge and to match the data distribution. The proposed Diffusion Feature Perceptual Loss further improves perceptual quality. Experimental results demonstrate state-of-the-art performance on perceptual metrics, producing enhanced images that align with human perception. AnlightenDiff shows the potential of anchoring diffusion models for low-light enhancement through high perceptual quality results matching human perception. This provides a promising direction for applying diffusion models to image enhancement. Future work will explore anchoring for other tasks such as super-resolution. Code is available at https://github.com/allanchan339/AnlightenDiff.

APPENDIX A
DERIVATION OF THE DRDA

Given x_0 and a mean vector φ, inductively we define two sequences

x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_t; \quad \epsilon_t \sim N(\mu_t, I)   (A.1)

\mu_t = \frac{1 - \sqrt{\alpha_t}}{\sqrt{1 - \alpha_t}}\, \phi   (A.2)
and via solving Eq. (A.1) we obtain the closed form

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sum_{j=1}^{t} \sqrt{\frac{\bar{\alpha}_t}{\bar{\alpha}_j}} \sqrt{1 - \alpha_j}\, \epsilon_j   (A.3)

where \epsilon_j \sim N(\mu_j, I) is a random perturbation. Taking the expectation conditional on x_0, we have

E[x_t \mid x_0] = \sqrt{\bar{\alpha}_t}\, x_0 + \sum_{j=1}^{t} \sqrt{\frac{\bar{\alpha}_t}{\bar{\alpha}_j}} \sqrt{1 - \alpha_j} \cdot \frac{1 - \sqrt{\alpha_j}}{\sqrt{1 - \alpha_j}}\, \phi
  = \sqrt{\bar{\alpha}_t}\, x_0 + \sum_{j=1}^{t} \left( \sqrt{\frac{\bar{\alpha}_t}{\bar{\alpha}_j}} - \sqrt{\frac{\bar{\alpha}_t}{\bar{\alpha}_{j-1}}} \right) \phi
  = \sqrt{\bar{\alpha}_t}\, x_0 + (1 - \sqrt{\bar{\alpha}_t})\, \phi \;\to\; \phi \quad \text{as } t \to +\infty

Moreover, by the law of total variance, we have

\mathrm{Var}(x_t \mid x_0) = \sum_{j=1}^{t} \left( \frac{\bar{\alpha}_t}{\bar{\alpha}_j} - \frac{\bar{\alpha}_t}{\bar{\alpha}_{j-1}} \right) I = (1 - \bar{\alpha}_t)\, I   (A.4)

Let us denote m_t := \frac{1 - \sqrt{\bar{\alpha}_t}}{\sqrt{1 - \bar{\alpha}_t}}\, \phi and define a sequence of random perturbations by

\epsilon_t^\star := \frac{x_t - \sqrt{\bar{\alpha}_t}\, x_0}{\sqrt{1 - \bar{\alpha}_t}}   (A.5)

From the above, we can see that \epsilon_t^\star is normally distributed, where

E[\epsilon_t^\star] = \frac{E[x_t \mid x_0] - \sqrt{\bar{\alpha}_t}\, x_0}{\sqrt{1 - \bar{\alpha}_t}} = \frac{1 - \sqrt{\bar{\alpha}_t}}{\sqrt{1 - \bar{\alpha}_t}}\, \phi = m_t

and

\mathrm{Var}(\epsilon_t^\star) = \frac{1}{1 - \bar{\alpha}_t} \mathrm{Var}(x_t \mid x_0) = I.

That is to say, \epsilon_t^\star \sim N(m_t, I).

APPENDIX B
DERIVATION OF THE DRDS

Now let us discuss the reverse process of our proposed AnlightenDiff, which we call DRDS. According to Bayes' theorem, the conditional distribution of x_{t-1} given x_t and x_0 is given by

p(x_{t-1} \mid x_t, x_0) = \frac{p(x_t \mid x_{t-1}, x_0)\, p(x_{t-1} \mid x_0)}{p(x_t \mid x_0)}

Since p(x_t \mid x_{t-1}, x_0) and p(x_{t-1} \mid x_0) are both density functions of Gaussian distributions, x_{t-1} \mid x_t, x_0 is also normally distributed. Thus, we can let \mu_t^\star := \mu_t^\star(x_t, x_0) and \tilde{\beta}_t := \tilde{\beta}_t(x_t, x_0) be functions such that x_{t-1} \mid x_t, x_0 \sim N\big(\mu_t^\star(x_t, x_0), \tilde{\beta}_t(x_t, x_0)\, I\big).

From p(x_t \mid x_{t-1}, x_0)\, p(x_{t-1} \mid x_0), we consider

\frac{1}{2(1 - \alpha_t)} \left\| x_t - \sqrt{\alpha_t}\, x_{t-1} - (1 - \sqrt{\alpha_t})\phi \right\|^2 + \frac{1}{2(1 - \bar{\alpha}_{t-1})} \left\| x_{t-1} - \sqrt{\bar{\alpha}_{t-1}}\, x_0 - (1 - \sqrt{\bar{\alpha}_{t-1}})\phi \right\|^2
  = \frac{\alpha_t(1 - \bar{\alpha}_{t-1}) + (1 - \alpha_t)}{2(1 - \alpha_t)(1 - \bar{\alpha}_{t-1})} \|x_{t-1}\|^2 - \left\langle \frac{\sqrt{\alpha_t}\, x_t - \sqrt{\alpha_t}(1 - \sqrt{\alpha_t})\phi}{1 - \alpha_t} + \frac{\sqrt{\bar{\alpha}_{t-1}}\, x_0 + (1 - \sqrt{\bar{\alpha}_{t-1}})\phi}{1 - \bar{\alpha}_{t-1}},\ x_{t-1} \right\rangle + \text{const.}

Since p(x_{t-1} \mid x_t, x_0) \propto p(x_t \mid x_{t-1}, x_0)\, p(x_{t-1} \mid x_0), we compare the above expression with

\frac{1}{2\tilde{\beta}_t} \left\| x_{t-1} - \mu_t^\star \right\|^2 = \frac{1}{2\tilde{\beta}_t} \|x_{t-1}\|^2 - \left\langle \frac{\mu_t^\star}{\tilde{\beta}_t},\ x_{t-1} \right\rangle + \text{const.}

Therefore, we obtain

\tilde{\beta}_t(x_t, x_0) = \frac{(1 - \alpha_t)(1 - \bar{\alpha}_{t-1})}{\alpha_t(1 - \bar{\alpha}_{t-1}) + (1 - \alpha_t)} = \frac{(1 - \alpha_t)(1 - \bar{\alpha}_{t-1})}{\alpha_t - \bar{\alpha}_t + 1 - \alpha_t} = \left( \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \right) \beta_t

as in Eq. (6), and

\mu_t^\star(x_t, x_0) = \left( \frac{\sqrt{\alpha_t}\, x_t - \sqrt{\alpha_t}(1 - \sqrt{\alpha_t})\phi}{1 - \alpha_t} + \frac{\sqrt{\bar{\alpha}_{t-1}}\, x_0 + (1 - \sqrt{\bar{\alpha}_{t-1}})\phi}{1 - \bar{\alpha}_{t-1}} \right) \tilde{\beta}_t
  = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t + \frac{(1 - \sqrt{\bar{\alpha}_{t-1}})(1 - \alpha_t) - \sqrt{\alpha_t}(1 - \sqrt{\alpha_t})(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, \phi
  = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t + \frac{1 - \bar{\alpha}_t + \sqrt{\bar{\alpha}_{t-1}}(\alpha_t - 1) + \sqrt{\alpha_t}(\bar{\alpha}_{t-1} - 1)}{1 - \bar{\alpha}_t}\, \phi   (B.1)

where the last step follows because the numerator of the coefficient of \phi factorizes as (1 - \sqrt{\alpha_t})(1 - \sqrt{\bar{\alpha}_{t-1}})(1 - \sqrt{\bar{\alpha}_t}).

By letting x_0 := \bar{\mu}_\theta(x_t, t) as in Eq. (7), we have

\mu_t^\star\big(x_t, \bar{\mu}_\theta(x_t, t)\big) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \frac{1 - \bar{\alpha}_t + \sqrt{\bar{\alpha}_{t-1}}(\alpha_t - 1) + \sqrt{\alpha_t}(\bar{\alpha}_{t-1} - 1)}{1 - \bar{\alpha}_t}\, \phi

as in Eq. (18) and (19).

REFERENCES

[1] K. Singh, R. Kapoor, and S. K. Sinha, "Enhancement of low exposure images via recursive histogram equalization algorithms," Optik, vol. 126, no. 20, pp. 2619–2625, Oct. 2015.
[2] Q. Wang and R. Ward, "Fast image/video contrast enhancement based on weighted thresholded histogram equalization," IEEE Trans. Consum. Electron., vol. 53, no. 2, pp. 757–764, May 2007.
[3] E. H. Land and J. J. McCann, "Lightness and retinex theory," J. Opt. Soc. Amer., vol. 61, no. 1, pp. 1–11, Jan. 1971, doi: 10.1364/JOSA.61.000001.
[4] Z. Rahman, D. J. Jobson, and G. A. Woodell, "Multi-scale retinex for color image enhancement," in Proc. 3rd IEEE Int. Conf. Image Process., vol. 3, Sep. 1996, pp. 1003–1006.
[5] C. Wei, W. Wang, W. Yang, and J. Liu, "Deep retinex decomposition for low-light enhancement," presented at the Brit. Mach. Vis. Conf., 2018.
[6] Q. Tang, J. Yang, X. He, W. Jia, Q. Zhang, and H. Liu, "Nighttime image dehazing based on Retinex and dark channel prior using Taylor series expansion," Comput. Vis. Image Understand., vol. 202, Jan. 2021, Art. no. 103086, doi: 10.1016/j.cviu.2020.103086.
[7] P. Dhariwal and A. Nichol, "Diffusion models beat GANs on image synthesis," in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 8780–8794.
[8] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 6840–6851.
[9] A. Q. Nichol and P. Dhariwal, "Improved denoising diffusion probabilistic models," in Proc. Int. Conf. Mach. Learn., 2021, pp. 8162–8171.
[10] C.-Y. Chan, W.-C. Siu, Y.-H. Chan, and H. A. Chan, "Generative strategy for low and normal light image pairs with enhanced statistical fidelity," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Jan. 2024, pp. 1–3, doi: 10.1109/ICCE59016.2024.10444437.
[11] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 10674–10685.
[12] C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, "Image super-resolution via iterative refinement," IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 4, pp. 4713–4726, Apr. 2023, doi: 10.1109/TPAMI.2022.3204461.
[13] C.-C. Hui, W.-C. Siu, N.-F. Law, and H. A. Chan, "Intelligent painter: New masking strategy and self-referencing with resampling," in Proc. 24th Int. Conf. Digit. Signal Process. (DSP), Jun. 2023, pp. 1–5, doi: 10.1109/DSP58604.2023.10167925.
[14] B. Xia et al., "DiffIR: Efficient diffusion model for image restoration," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2023, pp. 13049–13059, doi: 10.1109/ICCV51070.2023.01204.
[15] Y. Zhu et al., "Denoising diffusion models for plug-and-play image restoration," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2023, pp. 1219–1229, doi: 10.1109/cvprw59228.2023.00129.
[16] X. Yi, H. Xu, H. Zhang, L. Tang, and J. Ma, "Diff-Retinex: Rethinking low-light image enhancement with a generative diffusion model," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2023, pp. 12268–12277, doi: 10.1109/ICCV51070.2023.01130.
[17] J. Hou, Z. Zhu, J. Hou, H. Liu, H. Zeng, and H. Yuan, "Global structure-aware diffusion process for low-light image enhancement," in Proc. Adv. Neural Inf. Process. Syst., vol. 36, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds. Red Hook, NY, USA: Curran Associates, pp. 79734–79747.
[18] B. Fei et al., "Generative diffusion prior for unified image restoration and enhancement," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 9935–9946.
[19] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 586–595.
[20] X. Guo, Y. Li, and H. Ling, "LIME: Low-light image enhancement via illumination map estimation," IEEE Trans. Image Process., vol. 26, no. 2, pp. 982–993, Feb. 2017, doi: 10.1109/TIP.2016.2639450.
[21] S. Wang, J. Zheng, H.-M. Hu, and B. Li, "Naturalness preserved enhancement algorithm for non-uniform illumination images," IEEE Trans. Image Process., vol. 22, no. 9, pp. 3538–3548, Sep. 2013, doi: 10.1109/TIP.2013.2261309.
[22] K. G. Lore, A. Akintayo, and S. Sarkar, "LLNet: A deep autoencoder approach to natural low-light image enhancement," Pattern Recognit., vol. 61, pp. 650–662, Jan. 2017, doi: 10.1016/j.patcog.2016.06.008.
[23] F. Lv, F. Lu, J. Wu, and C. Lim, "MBLLEN: Low-light image/video enhancement using CNNs," in Proc. Brit. Mach. Vis. Conf., 2018, p. 4.
[24] W. Ren et al., "Low-light image enhancement via a deep hybrid network," IEEE Trans. Image Process., vol. 28, no. 9, pp. 4364–4375, Sep. 2019, doi: 10.1109/TIP.2019.2910412.
[25] L. Tao, C. Zhu, G. Xiang, Y. Li, H. Jia, and X. Xie, "LLCNN: A convolutional neural network for low-light image enhancement," in Proc. IEEE Vis. Commun. Image Process. (VCIP), Dec. 2017, pp. 1–4, doi: 10.1109/VCIP.2017.8305143.
[26] Y. Zhang, J. Zhang, and X. Guo, "Kindling the darkness: A practical low-light image enhancer," in Proc. 27th ACM Int. Conf. Multimedia, New York, NY, USA: Association for Computing Machinery, Oct. 2019, pp. 1632–1640, doi: 10.1145/3343031.3350926.
[27] Y. Zhang, X. Guo, J. Ma, W. Liu, and J. Zhang, "Beyond brightening low-light images," Int. J. Comput. Vis., vol. 129, no. 4, pp. 1013–1037, Apr. 2021, doi: 10.1007/s11263-020-01407-x.
[28] R. Liu, L. Ma, J. Zhang, X. Fan, and Z. Luo, "Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 10561–10570.
[29] L. Ma, T. Ma, R. Liu, X. Fan, and Z. Luo, "Toward fast, flexible, and robust low-light image enhancement," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 5637–5646.
[30] W. Yang, S. Wang, Y. Fang, Y. Wang, and J. Liu, "From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3060–3069.
[31] L.-W. Wang, Z.-S. Liu, W.-C. Siu, and D. P. K. Lun, "Lightening network for low-light image enhancement," IEEE Trans. Image Process., vol. 29, pp. 7984–7996, 2020.
[32] M. Haris, G. Shakhnarovich, and N. Ukita, "Deep back-projection networks for super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1664–1673.
[33] Z.-S. Liu, L.-W. Wang, C.-T. Li, and W.-C. Siu, "Hierarchical back projection network for image super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2019, pp. 2041–2050.
[34] Z.-S. Liu, L.-W. Wang, C.-T. Li, W.-C. Siu, and Y.-L. Chan, "Image super-resolution via attention based back projection networks," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshop (ICCVW), Oct. 2019, pp. 3517–3525, doi: 10.1109/ICCVW.2019.00436.
[35] C. Guo et al., "Zero-reference deep curve estimation for low-light image enhancement," in Proc. CVPR, Jun. 2020, pp. 1777–1786.
[36] Y. Jiang et al., "EnlightenGAN: Deep light enhancement without paired supervision," IEEE Trans. Image Process., vol. 30, pp. 2340–2349, 2021.
[37] J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Trans. Image Process., vol. 27, no. 4, pp. 2049–2062, Apr. 2018, doi: 10.1109/TIP.2018.2794218.
[38] R. Wang, Q. Zhang, C.-W. Fu, X. Shen, W.-S. Zheng, and J. Jia, "Underexposed photo enhancement using deep illumination estimation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 6842–6850, doi: 10.1109/CVPR.2019.00701.
[39] H. Li et al., "SRDiff: Single image super-resolution with diffusion probabilistic models," Neurocomputing, vol. 479, pp. 47–59, Mar. 2022.
[40] J. Song, C. Meng, and S. Ermon, "Denoising diffusion implicit models," 2020, arXiv:2010.02502.
[41] J. Ho and T. Salimans, "Classifier-free diffusion guidance," in Proc. NeurIPS Workshop Deep Generative Models Downstream Appl., Dec. 2021, pp. 1–8.
[42] D. Misra, "Mish: A self regularized non-monotonic activation function," in Proc. 31st Brit. Mach. Vis. Conf., 2020, pp. 1–14.
[43] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[44] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds., Cham, Switzerland: Springer, pp. 234–241.
[45] S. Lao et al., "Attentions help CNNs see better: Attention-based hybrid image quality assessment network," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2022, pp. 1139–1148, doi: 10.1109/CVPRW56347.2022.00123.
[46] K. Ding, K. Ma, S. Wang, and E. P. Simoncelli, "Image quality assessment: Unifying structure and texture similarity," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 5, pp. 2567–2581, May 2022, doi: 10.1109/TPAMI.2020.3045810.
[47] J. Liu, D. Xu, W. Yang, M. Fan, and H. Huang, "Benchmarking low-light image enhancement and beyond," Int. J. Comput. Vis., vol. 129, no. 4, pp. 1153–1184, Apr. 2021, doi: 10.1007/s11263-020-01418-8.
[48] W. Yang, W. Wang, H. Huang, S. Wang, and J. Liu, "Sparse gradient regularized deep retinex network for robust low-light image enhancement," IEEE Trans. Image Process., vol. 30, pp. 2072–2086, 2021, doi: 10.1109/TIP.2021.3050850.
[49] X. Chen et al., "Symbolic discovery of optimization algorithms," 2023, arXiv:2302.06675.
[50] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
[51] S. Zheng and G. Gupta, "Semantic-guided zero-shot learning for low-light image/video enhancement," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. Workshops (WACVW), Jan. 2022, pp. 581–590.
[52] S. Su et al., "Blindly assess image quality in the wild guided by a self-adaptive hyper network," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3664–3673.
[53] H. Talebi and P. Milanfar, "NIMA: Neural image assessment," IEEE Trans. Image Process., vol. 27, no. 8, pp. 3998–4011, Aug. 2018, doi: 10.1109/TIP.2018.2831899.
[54] S. A. Golestaneh, S. Dadsetan, and K. M. Kitani, "No-reference image quality assessment via transformers, relative ranking, and self-consistency," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2022, pp. 3209–3218.
[55] S. van der Walt et al., "Scikit-image: Image processing in Python," PeerJ, vol. 2, Jun. 2014, Art. no. e453.
[56] C. Chen and J. Mo, "IQA-PyTorch: PyTorch toolbox for image quality assessment," 2022. [Online]. Available: https://github.com/chaofengc/IQA-PyTorch
[57] N. Detlefsen et al., "TorchMetrics—Measuring reproducibility in PyTorch," J. Open Source Softw., vol. 7, no. 70, p. 4101, Feb. 2022, doi: 10.21105/joss.04101.

Yuk-Hee Chan (Member, IEEE) received the B.Sc. degree (Hons.) in electronics from The Chinese University of Hong Kong in 1987 and the Ph.D. degree in signal processing from The Hong Kong Polytechnic University in 1992. From 1987 to 1989, he was a Research and Development Engineer with Elec & Eltek Group, Hong Kong. He joined The Hong Kong Polytechnic University in 1992, where he is currently an Associate Professor with the Department of Electrical and Electronic Engineering. He has published over 165 research papers in various international journals and conferences. His research interests include image processing and deep learning. He was the Chair of the IEEE Hong Kong Section in 2015. He is the Treasurer of Asia–Pacific Signal and Information Processing Association (APSIPA) Headquarters.