WINNet_Wavelet-Inspired_Invertible_Network_for_Image_Denoising
Abstract— Image denoising aims to restore a clean image from an observed noisy one. Model-based image denoising approaches can achieve good generalization ability over different noise levels and are highly interpretable. Learning-based approaches are able to achieve better results, but usually with weaker generalization ability and interpretability. In this paper, we propose a wavelet-inspired invertible network (WINNet) to combine the merits of the wavelet-based approaches and learning-based approaches. The proposed WINNet consists of K scales of lifting-inspired invertible neural networks (LINNs) and sparsity-driven denoising networks together with a noise estimation network. The network architecture of LINNs is inspired by the lifting scheme in wavelets. LINNs are used to learn a non-linear redundant transform with perfect reconstruction property to facilitate noise removal. The denoising network implements a sparse coding process for denoising. The noise estimation network estimates the noise level from the input image, which is then used to adaptively adjust the soft-thresholds in LINNs. The forward transform of LINNs produces a redundant multi-scale representation for denoising. The denoised image is reconstructed using the inverse transform of LINNs with the denoised detail channels and the original coarse channel. The simulation results show that the proposed WINNet method is highly interpretable and has strong generalization ability to unseen noise levels. It also achieves competitive results in non-blind/blind image denoising and in image deblurring.

Index Terms— Image denoising, wavelet transform, invertible neural networks.

Manuscript received 12 September 2021; revised 3 February 2022, 8 May 2022, and 3 June 2022; accepted 7 June 2022. Date of publication 27 June 2022; date of current version 1 July 2022. This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/R032785/1. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ran He. (Corresponding author: Jun-Jie Huang.)
Jun-Jie Huang is with the College of Computer Science, National University of Defense Technology, Changsha 410073, China (e-mail: [email protected]).
Pier Luigi Dragotti is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K.
Digital Object Identifier 10.1109/TIP.2022.3184845
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
4378 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 31, 2022

I. INTRODUCTION

Image denoising is a classical and fundamental inverse problem in image processing and computer vision. Image denoising algorithms aim to restore a noiseless image from noisy observations obtained by digital cameras. Given that the observations are inevitably noisy due to the random nature of the photon emission and sensing process, and the imperfection of the signal conversion process [1], [2], image denoising is an essential step for further image processing and computer vision applications. With the plug-and-play and the unfolding technique [3]–[7], image denoising algorithms have become even more important as they can also be used to solve other image restoration problems by acting as a powerful prior.

In this paper, we assume the noise is additive, white and Gaussian. The observed noisy image $y$ is expressed as:

$$y = x + n, \tag{1}$$

where $x$ is the clean image and $n \sim \mathcal{N}(0, \sigma^2)$ represents the measurement noise with variance $\sigma^2$.

Image denoising has a rich literature. Depending on how priors have been exploited, image denoising algorithms can be generally classified into model-based methods, e.g., [8]–[20], and learning-based methods, e.g., [21]–[36].

The model-based methods [8]–[20] use optimization strategies based on well-defined image priors or noise statistics, which lead to algorithms with good interpretability and strong generalization ability. The typical priors used for image denoising include, for instance, image-domain smoothness, transform-domain sparsity [8]–[13], patch-domain non-local self-similarity [15], [16], [19] and low-rank [17], [18], [20].

The wavelet transform has been very effective in many imaging applications. It provides a versatile multi-resolution analysis with perfect reconstruction property and time-frequency localization property. Therefore, the wavelet transform has been applied to a wide range of image restoration problems, including image denoising [8], [10], [11], image deconvolution [37], [38] and image inpainting [39], [40]. The wavelet transform is a fixed transform and, in some cases, learning a transformation better adapted to the data at hand may lead to more effective solutions. Dictionary learning [12] tries to achieve that by learning a (redundant) sparsifying transform from training data. However, both the analytical transforms and the learned dictionaries are linear transformations. A suitable non-linear transform with perfect reconstruction property has the potential to achieve better performance.

Besides the nice properties of the wavelet transform, the noise adaptive non-linear operator also contributes to the effectiveness of wavelet-based methods. Donoho and Johnstone [8] proposed the soft-thresholding operator and applied it with the optimal “universal threshold” $T = \sqrt{2\sigma^2 \log N}$ (where $N$ is the number of samples) to the wavelet-domain coefficients to remove noise. Chang et al. [10] proposed a BayesShrink threshold $T = \hat{\sigma}^2/\hat{\sigma}_X$ (where $\hat{\sigma}$ and $\hat{\sigma}_X$ are the estimated standard deviation of noise and signal, respectively) for soft-thresholding with a Bayesian framework for wavelet-domain image denoising. Both the “universal threshold” and the BayesShrink threshold are adaptively adjusted for different noise levels.

The learning-based methods [21]–[36] construct a denoising model by learning from noisy-clean image pairs. The flexibility of deep network structure design and the availability of large training datasets and high computational resources boost the performance, while the learned models have more restricted generalization ability compared to the model-based methods and are usually treated as black-box systems. Therefore, researchers mainly aim to explore learning-based methods with more efficient and effective architectures as well as strategies to improve the generalization ability of the learned denoising model.

In this paper, we aim to combine the merits of the learning-based and the model-based image restoration approaches. The overall network structure follows the principles of wavelet thresholding and consists of a multi-scale forward transform, denoising of the detail coefficients, and inverse transform. However, instead of using a fixed and linear transform, we propose to learn a non-linear transform based on the lifting scheme. In our design we want our non-linear transform to inherit the sparsifying ability, perfect reconstruction property as well as the multi-scale property of the wavelet transform.

Here we propose a novel wavelet-inspired invertible network (WINNet) with redundant invertible sparsifying transforms by leveraging the invertible neural network framework. By following a strategy similar to wavelet domain thresholding, we aim to enhance the generalization ability and interpretability of the learning-based image denoising method. We propose to learn a non-linear wavelet-like transform with perfect reconstruction (PR) property using invertible neural networks with a structure inspired by the lifting scheme [41], [42] rather than learning features with unconstrained CNNs. We call these lifting inspired networks LINNs. The proposed WINNet is made of several LINNs, one per scale. With the PR property, each learned LINN can serve as a versatile transform which can transform the input image to sparse transform coefficients using its forward pass and then inversely transform the denoised coefficients back to the image domain using its backward pass. For denoising of transform coefficients, a sparsity-driven denoising network is applied to remove the noise. Moreover, to achieve good generalization ability, all the soft-thresholds in WINNet are set to be noise adaptive and can be adjusted according to the estimated noise level. In this way, the proposed denoising network achieves good generalization ability even to unseen noise. The noise level is estimated using a model-inspired noise estimation network which exploits low-rank patches on the input noisy image and estimates noise levels as the minimum singular value of the weighted patches.

The contribution of this paper is three-fold:

• We propose an invertible thresholding network for image denoising. It is designed based on the principles of wavelet-based methods and therefore leads to a model with high interpretability. The LINNs at various scales produce a non-linear redundant transform with perfect reconstruction property, and the denoising network implements a basis pursuit denoising process.
• The proposed method is able to achieve blind image denoising. The proposed model-inspired noise estimation network estimates noise from the input image. To adapt the learned WINNet to the test noise level, the soft-thresholds in the network are adaptively adjusted with respect to the estimated noise level, leading to strong generalization ability to unseen noise levels.
• The proposed WINNet has high interpretability and strong generalization ability. It achieves performance comparable to that of state-of-the-art algorithms on non-blind/blind image denoising and image deblurring despite its simplicity and the small number of parameters to train.

The rest of the paper is organized as follows: Section II discusses related works in image denoising and invertible neural networks, and Section III provides an overview of WINNet. Section IV describes the architecture of LINNs, which are the building blocks of WINNet, whereas Section V focuses on the denoising network, the noise estimation strategy and the training strategy. Section VI shows the numerical results, compares the proposed method with other image denoising methods and demonstrates the application of WINNet to image deblurring. Finally, Section VII concludes the paper.

II. RELATED WORK

A. Deep Neural Networks for Image Denoising

Image denoising methods based on deep neural networks construct a differentiable non-linear model, and learn model parameters from training samples. Schuler et al. [43] proposed to train a multi-layer perceptron (MLP) for image denoising, and the MLP method achieves image denoising performance on par with that of the classical BM3D method [16]. The denoising convolutional neural networks (DnCNN) method [25] learns a deep CNN model with batch normalization layers and a skip connection. The fast and flexible denoising network (FFDNet) [26] takes the noise level map and the noisy image as input to the model, and therefore can handle spatially varying noise. The convolutional blind denoising network (CBDNet) [27] consists of two convolution subnetworks: a noise estimation subnetwork which learns to infer a noise level map from the noisy image and a non-blind denoising subnetwork which has a U-Net like structure. Mohan et al. [28] proposed to remove all bias terms in the denoising CNN to eliminate the model bias to training noise levels. The bias-free CNN (BF-CNN) has a scale homogeneity property and shows consistently better generalization ability than its counterpart with bias terms. Helou and Süsstrunk [29] proposed a blind universal image fusion denoiser (BUIFD) whose structure is derived from a Bayesian framework and consists of a noise level CNN, a prior CNN and a fusion network. By explicitly incorporating the noise level, BUIFD shows stronger generalization ability than that of conventional denoising CNNs. The self-guided network (SGN) [34] adopts a top-down self-guidance strategy to exploit the multi-scale information for image denoising, which leads to a highly memory and runtime efficient model. Ren et al. [36] propose
HUANG AND DRAGOTTI: WINNet: WAVELET-INSPIRED INVERTIBLE NETWORK FOR IMAGE DENOISING 4379
Fig. 2. Overview of the proposed wavelet-inspired invertible network (WINNet). It consists of K levels of lifting inspired invertible neural networks (LINNs) and denoising networks. The forward transform of a LINN non-linearly converts the input noisy image into a coarse part (green) and detail parts (black). The denoising network denoises the detail part, while the coarse part is decomposed again at the next level; the decomposition and denoising steps are repeated K times. The backward transform of the LINN reconstructs the denoised image from the denoised detail parts and the original coarse part. The noise level estimated by the noise estimation network is used to adjust the thresholds of the soft-thresholding non-linearity so that WINNet adapts well to the current noise level.
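The noise-adaptive soft-thresholding non-linearity mentioned in the caption can be illustrated with a minimal sketch. It assumes scalar coefficients and uses the classical universal threshold $T = \sqrt{2\sigma^2 \log N}$ and the BayesShrink threshold $T = \hat{\sigma}^2/\hat{\sigma}_X$ as examples of thresholds that scale with the noise level; the function names are illustrative and are not taken from the WINNet implementation, whose thresholds are learned.

```python
import math

def soft_threshold(x, t):
    """Shrink x toward zero by t; any value with |x| <= t becomes zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def universal_threshold(sigma, n_samples):
    """Universal threshold T = sqrt(2 * sigma^2 * log(N))."""
    return math.sqrt(2.0 * sigma ** 2 * math.log(n_samples))

def bayes_shrink_threshold(sigma_noise, sigma_signal):
    """BayesShrink threshold T = sigma_noise^2 / sigma_signal."""
    return sigma_noise ** 2 / sigma_signal

# Toy detail coefficients (hypothetical values) shrunk with a fixed threshold.
coeffs = [-3.0, -0.4, 0.0, 0.2, 1.5]
shrunk = [soft_threshold(c, 1.0) for c in coeffs]

# A larger noise level gives a larger threshold, hence stronger shrinkage.
t_low = universal_threshold(sigma=15.0, n_samples=1024)
t_high = universal_threshold(sigma=50.0, n_samples=1024)
```

Both thresholds grow with the noise standard deviation, which is the property WINNet relies on when it rescales its soft-thresholds with the estimated noise level.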
TABLE I
THE NOTATIONS USED IN THIS PAPER
for scalars and bold capital letters for matrices (or tensors). The notation $(\cdot)_m^k$ has been used to specify the $m$-th features or network at the $k$-th scale.

A. Splitting/Merging Operator

The splitting operator is used to separate the input image into two parts in the forward pass of LINNs, and the merging operator performs the inverse of the splitting operator in the backward pass of LINNs. The splitting/merging operators in current methods [44]–[50], [52], [57] act as a non-redundant transform and keep the same number of input and output coefficients.

In this paper, we propose to learn LINNs with redundant representations in analogy with the redundant wavelet transform, which provides better performance compared to the decimated wavelet transform in image restoration tasks [11], [58]. Therefore, the splitting operator will lead to a redundant representation, and the merging operator recovers the input image from the redundant representation.

For an input image $Y^k \in \mathbb{R}^{1 \times N_1 \times N_2}$ at the $k$-th scale, the splitting operator $\mathcal{S}$ is parameterized by a convolutional kernel $K_S \in \mathbb{R}^{c \times 1 \times p \times p}$, where the subscript $S$ represents splitting, $c$ denotes the number of channels and $p$ denotes the spatial filter size. A convolution is performed to obtain multi-channel features $F^k \in \mathbb{R}^{c \times N_1 \times N_2}$:

$$F^k = K_S \otimes Y^k, \tag{2}$$

where $\otimes$ denotes the convolution operator.

The split operation then assigns the first $h$ channels of $F^k$ to the coarse part $C^k \in \mathbb{R}^{h \times N_1 \times N_2}$, and the remaining $c - h$ channels to the detail part $D^k \in \mathbb{R}^{(c-h) \times N_1 \times N_2}$. In this paper, we denote the splitting operator as:

$$\mathcal{S}_{K_S}(Y^k) = (C^k, D^k). \tag{3}$$

The merging operator $\mathcal{M}$ will be used in the backward pass of LINNs and represents the inverse of the splitting operator, and is parameterized by a convolutional kernel $K_M \in \mathbb{R}^{1 \times c \times p \times p}$. It first concatenates $C^k$ and $D^k$, and this results in $\hat{F}^k$. A convolution is performed to recover $\hat{Y}^k$:

$$\hat{Y}^k = K_M \otimes [C^k, D^k], \tag{4}$$

where $[\cdot]$ denotes the concatenation operation. The merging operator can then be denoted as:

$$\mathcal{M}_{K_M}([C^k, D^k]) = \hat{Y}^k. \tag{5}$$

To achieve invertibility, we can pick $K_S$ so that it can be reshaped to a tight frame with $c \geq p^2$. The merging convolutional kernel then simply corresponds to the transpose of the splitting convolutional kernel. This will ensure the splitting and merging operators are invertible. We can use orthogonal transforms of size $c \times c$ such as orthogonal wavelet transforms, Discrete Cosine Transforms (DCT) and other analytical transforms as the splitting and merging operators. The tight frame is then obtained by picking $p^2$ columns of the original orthogonal matrix.

Besides the analytical transforms, it is also possible to learn an orthogonal matrix with proper parameterization methods. For example, the Cayley transform [59] of a skew-symmetric matrix $\Theta - \Theta^T$ is an orthogonal matrix:

$$K = \left(I - (\Theta - \Theta^T)\right)\left(I + (\Theta - \Theta^T)\right)^{-1}, \tag{6}$$

where $I$ is the identity matrix. The corresponding tight frame $K_S$ can therefore be a sub-matrix (i.e., extracting $p^2$ columns) of the learned orthogonal matrix $K$ parameterized by learnable parameters $\Theta \in \mathbb{R}^{c \times c}$.

B. Network Architecture of LINN

At each scale, the Predict/Update networks are shared among the forward and inverse transform of LINN, but are connected with different signs and directions. Fig. 3(a) and Fig. 3(b) show the schematics of the forward transform and the inverse transform of the $k$-th scale LINN, respectively. In the forward transform, $C^k$, $D^k$ will be non-linearly transformed to a representation which is more suitable for denoising. After denoising, the denoised detail part and the original coarse part will be transformed back to the original domain using the inverse transform of LINN.

In the forward transform (shown in Fig. 3(a)), the Predict network conditioned on the coarse part aims to predict the detail part to make the resultant residual of the detail part sparse. The Update network conditioned on the detail part is used to adjust the coarse part to make it a smoother version of the input image. There are $M$ pairs of Predict and Update networks. The Predict and Update networks are applied alternately to update the coarse and detail parts. Let us denote $(C_0^k, D_0^k) = (C^k, D^k)$. The $m$-th pair ($m \in [1, M]$) of predict and update operations can be expressed as:

$$\begin{cases} D_m^k = D_{m-1}^k - P_m^k\left(C_{m-1}^k\right), \\ C_m^k = C_{m-1}^k + U_m^k\left(D_m^k\right), \end{cases} \tag{7}$$

where $D_m^k$ and $C_m^k$ denote the updated detail part and coarse part using the $m$-th Predict network $P_m^k(\cdot)$ and Update network $U_m^k(\cdot)$, respectively.

The inverse transform of the $k$-th scale LINN (shown in Fig. 3(b)) is constructed using the same set of $M$ pairs of Predict and Update networks $\{P_m^k(\cdot), U_m^k(\cdot)\}_{m=1}^{M}$ used in the forward transform. The representation $(D_{m-1}^k, C_{m-1}^k)$ for $m \in [1, \cdots, M]$ can be recovered from $D_m^k$, $C_m^k$, $P_m^k(\cdot)$ and $U_m^k(\cdot)$ as follows:

$$\begin{cases} C_{m-1}^k = C_m^k - U_m^k\left(D_m^k\right), \\ D_{m-1}^k = D_m^k + P_m^k\left(C_{m-1}^k\right). \end{cases} \tag{8}$$

Eqn. (7) and Eqn. (8) imply that when no lossy operation is applied on $(C_M^k, D_M^k)$, the inverse transform of LINN can, by construction, perfectly recover the inputs of the forward transform with the Predict and Update networks.

C. Predict/Update Networks

The Predict and Update networks can be any functions and their properties will not affect the invertibility of LINN. In this paper, we use the same structure for each Predict/Update
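The perfect reconstruction property implied by Eqn. (7) and Eqn. (8) can be checked with a minimal sketch. It assumes 1-D lists in place of feature tensors and uses arbitrary, hypothetical non-linear functions (`toy_predict`, `toy_update`) in place of the learned Predict/Update networks; the point is only that the inverse undoes the forward pass whatever these functions are.

```python
import math

def toy_predict(coarse):
    # Hypothetical non-linear Predict function; any function would do.
    return [math.tanh(c) for c in coarse]

def toy_update(detail):
    # Hypothetical non-linear Update function; any function would do.
    return [0.5 * math.sin(d) for d in detail]

def lifting_forward(coarse, detail, pairs):
    """Apply the M predict/update steps of Eqn. (7) in order."""
    for predict, update in pairs:
        detail = [d - p for d, p in zip(detail, predict(coarse))]
        coarse = [c + u for c, u in zip(coarse, update(detail))]
    return coarse, detail

def lifting_inverse(coarse, detail, pairs):
    """Undo the steps in reverse order with flipped signs, as in Eqn. (8)."""
    for predict, update in reversed(pairs):
        coarse = [c - u for c, u in zip(coarse, update(detail))]
        detail = [d + p for d, p in zip(detail, predict(coarse))]
    return coarse, detail

pairs = [(toy_predict, toy_update)] * 3          # M = 3 pairs
coarse0 = [0.3, -1.2, 2.0, 0.7]
detail0 = [1.1, 0.4, -0.9, 2.5]

coarse_m, detail_m = lifting_forward(coarse0, detail0, pairs)
coarse_rec, detail_rec = lifting_inverse(coarse_m, detail_m, pairs)

max_error = max(abs(a - b) for a, b in
                zip(coarse0 + detail0, coarse_rec + detail_rec))
```

The reconstruction error stays at the level of floating-point rounding because each inverse step subtracts exactly the quantity that the matching forward step added, evaluated on the same argument.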
Fig. 6. The CLISTA denoising network. At the $t$-th layer, the sparse feature $G_t$ is estimated using a soft-thresholding operator. The denoised detail channel is estimated from $G_T$ with the synthesis convolutional layer.
To ensure that the thresholds $\lambda$ are non-negative, a Softplus function¹ is applied to each learned threshold.

3) Multi-Scale Property: Multi-resolution signal decomposition is an essential property of the wavelet transform. For an input image, the wavelet transform provides a multi-scale analysis which captures the information at different scales. In order to mimic the wavelet transform, we iterate the LINN-based decomposition on the coarse part. Moreover, we apply à trous convolution with dilation rate $2^{k-1}$ for the $k$-th scale LINN even though the redundancy factor in our setting is not two. As a result, the larger scale LINN will have a bigger receptive field. Fig. 5 shows examples of a dilated $3 \times 3$ filter at two different levels.

where $G = [G_1, \ldots, G_N]$ are the sparse features, $D = [D_1, \ldots, D_N]$ is the over-parameterized convolutional dictionary, and $\lambda = [\lambda_1, \ldots, \lambda_N]$ are the regularization parameters.

The above $\ell_1$-norm minimization problem can be solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA) [61]:

$$G_t = \mathcal{T}_{\lambda/\mu}\left(G_{t-1} + \frac{1}{\mu} D^T \otimes \left(D_M^k - D \otimes G_{t-1}\right)\right), \tag{11}$$

where $\mu$ is the step size.

The LISTA [62] algorithm parameterizes the unknown dictionaries in ISTA as learnable parameters. We apply a convolutional LISTA (CLISTA) as our sparsity-driven denoising network. The schematic of the CLISTA denoising network is shown in Fig. 6. There are $T$ layers of soft-thresholding

¹ $\mathrm{Softplus}(x) = \frac{1}{\beta} \cdot \log(1 + \exp(\beta \cdot x))$.
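The ISTA iteration in Eqn. (11) can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses a small dense dictionary and matrix-vector products instead of the convolutional dictionary of the paper, a single scalar regularization parameter `lam`, and it sets $\mu$ to the squared Frobenius norm of the dictionary, which upper-bounds the largest eigenvalue of $D^T D$ and therefore keeps the iteration stable. All names are hypothetical.

```python
import math

def soft(vec, t):
    """Element-wise soft-thresholding operator T_t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in vec]

def matvec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]

def objective(dictionary, y, g, lam):
    """0.5 * ||y - D g||_2^2 + lam * ||g||_1."""
    residual = [yi - ri for yi, ri in zip(y, matvec(dictionary, g))]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in g)

def ista(dictionary, y, lam, num_iters):
    """Run num_iters ISTA steps starting from the all-zero code."""
    mu = sum(a * a for row in dictionary for a in row)   # >= max eig of D^T D
    dict_t = [list(col) for col in zip(*dictionary)]     # D^T
    g = [0.0] * len(dictionary[0])
    for _ in range(num_iters):
        residual = [yi - ri for yi, ri in zip(y, matvec(dictionary, g))]
        step = matvec(dict_t, residual)                  # D^T (y - D g)
        g = soft([gi + si / mu for gi, si in zip(g, step)], lam / mu)
    return g

# With an identity dictionary the minimizer is soft-thresholding of y itself.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, -0.5, 1.2]
g_identity = ista(identity, y, lam=1.0, num_iters=200)

# An overcomplete toy dictionary (2 measurements, 3 atoms).
overcomplete = [[1.0, 0.0, 0.7], [0.0, 1.0, 0.7]]
g_sparse = ista(overcomplete, [1.0, 1.0], lam=0.1, num_iters=200)
```

LISTA and the CLISTA network replace the fixed products with $D$ and $D^T$ by learned (convolutional) layers and unroll a small, fixed number of such iterations.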
Fig. 8. Visualization results of the basis functions learned by LINNs. The amplitude of the non-zero value $v_{nz}$ is shown in the upper left corner of each figure. For better visualization, the maximum absolute value in each visualized basis function has been rescaled by a factor which is shown in the lower left corner of each figure.
a global averaging pooling and a Sigmoid function to map the weight in the range of [0, 1]. The parameters of SENet can be learned using backpropagation. For the noise estimation network, the MSE noise loss is used:

$$\mathcal{L}_n = \frac{1}{2N} \sum_{i=1}^{N} \left(\sigma_i - \hat{\sigma}_i\right)^2, \tag{16}$$

where $\sigma_i$ and $\hat{\sigma}_i$ denote the ground-truth noise level and the estimated noise level, respectively.

D. Training Details

WINNet can learn from noisy-clean image pairs of a single noise level or a range of noise levels as in [25]–[27]. The average mean squared error (MSE) between the restored image and the clean image is used as the reconstruction loss:

$$\mathcal{L}_r = \frac{1}{2N} \sum_{i=1}^{N} \left\| X_i - \hat{X}_i \right\|_2^2, \tag{17}$$

where $X_i$ and $\hat{X}_i$ denote the input clean image and the denoised image, respectively.

The overall training objective of our WINNet combines the reconstruction loss $\mathcal{L}_r$, the spectral norm loss $\mathcal{L}_s$ and the orthogonal loss $\mathcal{L}_o$:

$$\mathcal{L}_{all} = \mathcal{L}_r + \lambda_1 \mathcal{L}_s + \lambda_2 \mathcal{L}_o, \tag{18}$$

where $\lambda_1$ and $\lambda_2$ are regularization parameters.

VI. EXPERIMENTAL RESULTS

In this section, we perform experiments to show the properties and validate the effectiveness of the proposed WINNet. We will first introduce the experimental settings, visualize the learned transform and the sparsity-driven denoising network results, perform ablation studies on the key components and finally show comparisons with other methods.

A. Implementation Details

We follow the training and evaluation settings in [25] to perform experiments. Before training and testing, the clean images are normalized to [0, 255]. The noisy images are obtained by adding additive white Gaussian noise to the clean image according to Eq. (1) with variance $\sigma^2$.

1) WINNet Configuration: In the default setting, the convolution kernel $K_S \in \mathbb{R}^{c \times 1 \times p \times p}$ of the splitting operator is reshaped from a DCT transform matrix $T \in \mathbb{R}^{p^2 \times p^2}$ (i.e. the $i$-th row of $T$ is the reshaped $i$-th $p \times p$ filter in $K_S$). We set $c = p^2$ and $p = 4$. The convolution kernel for the merging operator can be constructed in a similar manner. Since by default $p = 4$, there will be 1 coarse channel and 15 detail channels.

The number of update and predict network pairs is set to $M = 4$, and the number of residual blocks in each PUNet is set to $J = 4$. The number of feature channels in PUNet is set to 32 and the spatial filter size in SepConv layers is set to $q = 5$. For the CLISTA denoising network, there are $T = 3$ layers with 64 channels. The support of the spatial filter for $W_a$ and $W_s$ is set to $r = 3$.
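The default DCT-based splitting operator described in the configuration ($p = 4$, $c = p^2 = 16$, 1 coarse channel and 15 detail channels) can be illustrated with a minimal sketch. It assumes the orthonormal DCT-II and, for brevity, applies the transform to a single flattened $4 \times 4$ patch rather than as a convolution over a whole image; the function names are hypothetical.

```python
import math

def dct_matrix_1d(p):
    """Orthonormal 1-D DCT-II matrix of size p x p (rows are basis vectors)."""
    rows = []
    for u in range(p):
        scale = math.sqrt((1.0 if u == 0 else 2.0) / p)
        rows.append([scale * math.cos(math.pi * (2 * n + 1) * u / (2 * p))
                     for n in range(p)])
    return rows

def dct_splitting_matrix(p):
    """Matrix T of size p^2 x p^2; row i is the flattened i-th p x p filter."""
    c1 = dct_matrix_1d(p)
    return [[c1[u][a] * c1[v][b] for a in range(p) for b in range(p)]
            for u in range(p) for v in range(p)]

def split(t_mat, patch_vec, h=1):
    """Analysis step: first h coefficients are coarse, the rest are detail."""
    coeffs = [sum(t * x for t, x in zip(row, patch_vec)) for row in t_mat]
    return coeffs[:h], coeffs[h:]

def merge(t_mat, coarse, detail):
    """Synthesis step with the transpose of T, which inverts an orthogonal T."""
    coeffs = coarse + detail
    size = len(t_mat)
    return [sum(t_mat[i][j] * coeffs[i] for i in range(size))
            for j in range(size)]

p = 4
T = dct_splitting_matrix(p)
patch = [float((3 * i) % 7) for i in range(p * p)]   # toy flattened 4 x 4 patch
coarse, detail = split(T, patch)
recon = merge(T, coarse, detail)
recon_error = max(abs(a - b) for a, b in zip(patch, recon))
```

The first row of `T` is the constant (DC) filter, which is why it plays the role of the coarse channel, and the transpose acts as the merging operator because `T` is orthogonal.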
Fig. 9. Visualization of the coarse and detail channels. (a) and (f) show the coarse component at two different scales. (b) - (e) and (g) - (j) show the detail channels before and after denoising by the CLISTA denoising network in WINNet. Each sub-figure shows the feature map of a detail channel and the corresponding histogram. ($D_M^k(j)$ denotes the $j$-th channel of the detail part.)
2) Training and Testing Settings: The 400 training images from the Berkeley segmentation dataset (BSD) [68] of size 180 × 180 are used for training. The training patch size for the non-blind and blind denoising settings is 40 × 40 and 50 × 50, respectively.

For training with a single noise level, three noise levels are considered, i.e., $\sigma_N$ = 15, 25 and 50. For the blind image denoising scenario, the training noise level $\sigma_N$ is uniformly drawn from [0, 55]. The regularization parameters for the spectral norm loss and the orthogonal loss are set to $\lambda_1 = 0.1$ and $\lambda_2 = 10$, respectively. The spectral norm loss is evaluated once every 10 iterations. As discussed in Section V-B, when training blind image denoising models, all the soft-thresholds will be rescaled with respect to the training noise level.

The weights of the convolution layers are initialized using the Kaiming initialization method [69]. Stochastic gradient descent with the Adam optimizer [70] is used for training with initial learning rate $lr = 1 \times 10^{-3}$ and $\beta = (0.9, 0.999)$. The total number of training epochs is set to 50 and the learning rate decays from $1 \times 10^{-3}$ to $1 \times 10^{-4}$ at the 30-th epoch. The batch size $N$ is set to 32.

The testing images include the 12 images from Set12 [25], and the 68 natural images from the BSD68 [68]. PSNR is used as the evaluation metric.

B. Visualizing the Intermediate Results

In this section, we will visualize the output produced by components of a WINNet with $K = 2$ levels which is trained on data with noise level $\sigma_N = 25$.

1) Elementary Building Blocks Learned by WINNet: The proposed invertible network is inspired by the lifting scheme [41], [42] in wavelets; therefore, it is meaningful to visualize the learned elementary atoms.

The basis functions or elementary atoms of the wavelet transform can be visualized by setting to zero all the wavelet coefficients with the exception of one coefficient and then by reconstructing the corresponding signal with the synthesis filter bank. In analogy with the wavelet case, we set all the feature maps in $(C_M^k, \hat{D}_M^k)$ to zero with the exception of the center pixel on one of the feature maps, and then reconstruct the image using the backward pass of LINN. As the PUNets represent highly non-linear functions and may have different responses for signals with varying amplitudes, the elementary atoms are visualized using different non-zero values. Since our transform is highly non-linear, it is incorrect to call the reconstructed functions “basis functions” or elementary atoms. However, to keep the intuition that what we are visualizing is a variation of the basis functions related to linear transforms, with a slight abuse of terminology, we keep calling them “elementary atoms”.

Fig. 8 shows the atoms corresponding to different channels at level 1 and level 2 in WINNet. For better visualization, the maximum absolute value in each visualized basis function has been rescaled by a factor shown in the lower left corner of each figure. Fig. 8 (a) - (d) show the elementary atoms of 4 representative channels from the level 1 LINN. We can see that the basis functions have compact support. For the coarse channel, the basis function gradually changes to a delta function when the amplitude increases from 0.01 to 1.25. For the detail channels, the shape of the elementary atoms only slightly changes when we increase the amplitude of the input pixel. Fig. 8 (e) - (h) show the atoms of 4 representative channels from the level 2 LINN. Different from what we observed at level 1, the basis functions in level 2 have larger support, and the shapes change from concentrated to spread.

Both level 1 and level 2 functions have non-linear responses to the input amplitude, while the atoms of level 2 have much larger support compared to those of level 1. The different support size of the functions at level 1 and level 2 is possibly due to their different functionality in WINNet, and is also a consequence of the dilation of the filter support discussed before and depicted in Fig. 5.

2) Noisy and Denoised Feature Maps: The CLISTA denoising network is the only non-invertible component in WINNet, and is responsible for removing the feature components corresponding to noise. By visualizing the feature maps before and after the denoising network, we can gain further insight into the workings of the proposed WINNet.
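The evaluation protocol described in the settings above (additive white Gaussian noise on a [0, 255] intensity scale, PSNR as the metric) can be sketched as follows. This is a minimal illustration with a synthetic image stored as nested lists; the helper names are hypothetical, and no clipping or quantization of the noisy image is assumed.

```python
import math
import random

def add_awgn(image, sigma, rng):
    """Return image + n with n ~ N(0, sigma^2), independently per pixel."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]

def psnr(clean, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB for images on a [0, peak] scale."""
    count = sum(len(row) for row in clean)
    mse = sum((a - b) ** 2
              for row_c, row_r in zip(clean, restored)
              for a, b in zip(row_c, row_r)) / count
    return float("inf") if mse == 0.0 else 10.0 * math.log10(peak ** 2 / mse)

rng = random.Random(0)                                   # fixed seed
clean = [[float((i * j) % 256) for j in range(64)] for i in range(64)]
noisy = add_awgn(clean, sigma=25.0, rng=rng)
noisy_psnr = psnr(clean, noisy)
```

For noise level 25 the mean squared error of the noisy input is close to 625, so its PSNR is about $10 \log_{10}(255^2/625) \approx 20.2$ dB; a denoiser is judged by how far it raises the PSNR above this baseline.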
TABLE II
ABLATION STUDY: THE EFFECT OF USING DIFFERENT SPLITTING/MERGING OPERATORS EVALUATED ON Set12, INCLUDING HAAR TRANSFORM WITH DECIMATION (d) AND UN-DECIMATION (ud), DCTc (DCT TRANSFORM WITH SIZE c × c) AND LEARNc (LEARNED OPERATOR WITH SIZE c × c)
TABLE III
ABLATION STUDY: THE EFFECT OF USING DIFFERENT NUMBER OF COARSE CHANNELS h WHEN USING DCT16 AS SPLITTING/MERGING OPERATOR
Fig. 10. Visualization of the selected patches for noise level estimation and the generalization ability of the NENet.
Fig. 11. Ablation Study: the influence of the number of layers T in CLISTA network.
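The principle behind the noise estimation network, namely that for low-rank patches the smallest eigenvalue of the patch covariance equals the noise variance, can be illustrated with a toy sketch. It is not the NENet itself: it assumes two-pixel "patches" whose clean pixels are identical (a rank-one signal), so that the covariance is a 2 × 2 matrix with a closed-form smallest eigenvalue. All names and the synthetic data are hypothetical.

```python
import math
import random

def smallest_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b ** 2)

def estimate_sigma(pairs):
    """Estimate the noise std as sqrt of the smallest covariance eigenvalue."""
    n = len(pairs)
    mean0 = sum(p[0] for p in pairs) / n
    mean1 = sum(p[1] for p in pairs) / n
    var0 = sum((p[0] - mean0) ** 2 for p in pairs) / n
    var1 = sum((p[1] - mean1) ** 2 for p in pairs) / n
    cov01 = sum((p[0] - mean0) * (p[1] - mean1) for p in pairs) / n
    return math.sqrt(max(smallest_eigenvalue_2x2(var0, cov01, var1), 0.0))

rng = random.Random(0)                       # fixed seed
true_sigma = 25.0
pairs = []
for _ in range(20000):
    intensity = rng.uniform(0.0, 255.0)      # both pixels share one clean value
    pairs.append((intensity + rng.gauss(0.0, true_sigma),
                  intensity + rng.gauss(0.0, true_sigma)))

sigma_hat = estimate_sigma(pairs)
```

The clean signal only contributes variance along the direction (1, 1), so the eigenvalue along the orthogonal direction is the noise variance alone. Textured patches break this low-rank assumption, which is consistent with the patch selection step favoring smooth regions.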
In Fig. 9, we show the feature maps of an exemplar Butterfly image from Set12. Each sub-figure shows the feature map of a channel and the corresponding histogram. Fig. 9 (a) (b) (d) show the feature maps of 3 exemplar channels output from the level 1 LINN in WINNet. We can see that the coarse channel feature map looks like a natural image but still contains artifacts, while the detail channel feature maps before denoising contain both noise and some image content and their histograms are spread. Fig. 9 (c) - (e) show the corresponding detail channels after denoising. We can see that the noise has been significantly reduced, the edges are sharper and the histograms become more concentrated around the origin. This indicates that noise has been effectively removed by the CLISTA denoising network.

In Fig. 9 (f), the coarse channel feature map contains high-frequency enhanced features. For the detail channels, there are no significant differences between the feature maps before and after the CLISTA denoising network, since the 1-st level coarse channel feature $C_M^1$ (shown in Fig. 9(a)) mainly contains minor artifacts. Therefore, we only show the feature maps before denoising in Fig. 9 (g) (i), and also include the difference of the feature maps before and after denoising in Fig. 9 (h) (j). We can observe that the 2-nd level CLISTA makes minor modifications to the feature maps.

3) Noise Estimation Network: The proposed noise estimation network is based on the idea that the Gaussian noise level can be estimated as the smallest eigenvalue of the low-rank patches. Therefore, NENet would have highly interpretable results and strong generalization ability.

Fig. 10 (a) shows an exemplar clean image, and (b) - (e) shows the region of the selected low-rank patches by the selection network (SENet) for different noise levels, respectively. We can see that SENet tends to select different regions for noise estimation. For low noise levels, SENet carefully selects smooth regions, while for high noise levels SENet avoids highly textured regions and selects more patches. The results are consistent with those in [67], which uses a heuristic and iterative algorithm to select low-rank patches. Fig. 10 (c) further shows the noise estimation results for images with noise level $\sigma_T \in [0, 100]$. We can see that the proposed NENet is able to provide highly accurate noise estimation for images with $\sigma_T \in [0, 100]$ even though the training data only covers noise levels $\sigma_N \in [0, 55]$.

C. Ablation Studies

In this subsection, we conduct ablation studies to analyze different components of the proposed WINNet. The training and testing noise level is set to 50, and the analysed WINNet has 1 scale with $M = 2$ and $J = 2$.

Splitting/merging operator determines the initial coarse and detail part of LINNs in terms of signal characteristics and redundancy, and is therefore essential in the overall framework. In Table II, we show the performance of WINNet using different splitting/merging operators, including the Haar transform with decimation (d) and without decimation (ud), DCT transforms of different sizes, and learned operators of different sizes using the Cayley transform. The number of coarse channels is set to 1. We can see that a redundant Haar transform (i.e. Haar (ud)) can lead to significantly better performance than a non-redundant Haar transform (i.e. Haar (d)). This result implies that it is important to use a redundant representation. We therefore compared the performance of using DCT transforms with different sizes. We can see that the average PSNR
HUANG AND DRAGOTTI: WINNet: WAVELET-INSPIRED INVERTIBLE NETWORK FOR IMAGE DENOISING 4387
TABLE IV
The Model Size and Average PSNR (dB) of Different Non-Blind Image Denoising Methods Tested on BSD68 and Set12 Datasets With Noise Level σ = 15, 25, 50. (The Best Result in Each Column Is in Bold and the Results Within 0.05 dB Difference to the Best Result Are With Underline)
Fig. 14. Image denoising results of different methods on image “test044” from BSD68 dataset with noise level 50.
on noise level σ_N = 25, and tested on testing noise levels σ_T ∈ [5, 200]. We can see that the performance of WINNet w/o orthogonal loss would deteriorate in the low noise region. When using the proposed orthogonal loss, WINNet is able to achieve robust estimation for images with unseen low noise levels.

D. Comparison With Other Methods
We compare the proposed WINNet with several state-of-the-art image denoising algorithms including the model-based methods: BM3D [16], WNNM [17], EPLL [22], and the learning-based methods: DnCNN [25], FFDNet [26], BUIFD [29], BF-CNN [28], DRUNet [31], SGN [34], DeamNet [36], and DnINN [35]. All the methods based on deep neural networks were trained using the BSD400 training dataset, and the training patch size for non-blind and blind image denoising is set to 40 × 40 and 50 × 50, respectively.
1) Non-Blind Image Denoising: Table IV shows the comparison results of different non-blind image denoising methods evaluated on three noise levels (i.e., σ = 15, 25, 50). All the learning-based methods learn from training samples with the correct noise level. From the table, we can see that the proposed WINNets achieve better performance than the model-based methods, i.e., BM3D [16], WNNM [17], and EPLL [22], by a large margin. The proposed WINNets also outperform the deep learning based methods TNRD [24], DnCNN-S [25], BF-CNN [28], FFDNet [26], DnINN [35] and SGN [34], and achieve comparable performance with DeamNet [36] and DRUNet [31]. With one more level of decomposition, WINNet (2-scale) further improves on WINNet (1-scale).
Fig. 13 further shows the performance and number of parameters of the comparison methods evaluated on Set12 with noise levels σ = 15, 25, 50. In general, the proposed WINNet (2-scale) achieves similar performance to the state-of-the-art methods, including DeamNet [36] and DRUNet [31]. We also note that the model size of WINNet (2-scale) is only around 63%, 10%, 16%, and 1% of the model size of DnCNN, SGN, DeamNet, and DRUNet, respectively. Compared to DnINN [35], WINNet (1-scale) and WINNet (2-scale) achieve around 0.15 dB and 0.21 dB higher PSNR and have a model size which is around 1.2 and 2.5 times larger than DnINN’s, respectively. This validates the effectiveness of the improved WINNet architecture design. DeamNet [36] achieves the best performance in Fig. 13 and is a deep unfolded network with multiple iteration stages. Though DeamNet achieves only around 0.05 dB higher PSNR than the proposed WINNet (2-scale), it requires more than 6 times as many learnable parameters as WINNet. Fig. 14 shows the image denoising results of different methods on image “test044” from the BSD68 dataset with noise level 50. It can be seen that WINNet (2-scale) is better than the comparison methods at recovering edges.
WINNet has strong generalization ability to images with unseen noise levels as well. Although WINNet only sees training image pairs with a fixed noise level σ_N, its parameters can be adjusted to adapt to unseen noise levels. When the testing noise level σ_T ≥ σ_N, all the soft-thresholds in PUNets and in CLISTA denoising networks are rescaled by a factor σ_T/σ_N. When σ_T < σ_N, only the soft-thresholds in CLISTA denoising networks are rescaled by a factor σ_T/σ_N.
Fig. 12 shows the performance of the proposed WINNet, BF-CNN and DnCNN-S which are trained on noise level
TABLE V
The Average PSNR (dB) of Different Blind Image Denoising Methods Trained on BSD400 Dataset With Noise Level σ ∈ [0, 55] and Tested on BSD68 and Set12 Datasets With Noise Level σ ∈ [5, 145]. (The Best and the Second Best Result in Each Column Is in Bold and With Underline, Respectively)
σ_N = 25 and are tested on testing noise levels σ_T ∈ [5, 195]. We can see that the performance of DnCNN remains similar when σ_T ≤ σ_N and quickly deteriorates otherwise. BF-CNN and WINNet generalize well to testing images with noise levels σ_T ≠ σ_N. When σ_T ≥ σ_N, WINNet achieves improved gain compared to BF-CNN. When σ_T = 15, the performance of WINNet is inferior to BF-CNN, but when σ_T = 5, WINNet achieves around 0.8 dB higher PSNR than BF-CNN.
Algorithm 1 Plug-and-Play Image Deblurring With Blind WINNet
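The soft-threshold rescaling rule used for this generalization experiment (all thresholds scaled by σ_T/σ_N when σ_T ≥ σ_N, only the CLISTA thresholds scaled when σ_T < σ_N) can be illustrated with a minimal sketch. The function names `soft_threshold` and `rescale_thresholds`, and the representation of the learned thresholds as flat lists of floats, are assumptions made for illustration; they are not taken from the WINNet implementation, where the thresholds would be learnable network parameters.

```python
import numpy as np


def soft_threshold(x, theta):
    """Element-wise soft-thresholding with a non-negative threshold theta."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)


def rescale_thresholds(punet_thetas, clista_thetas, sigma_train, sigma_test):
    """Adapt thresholds learned at sigma_train to a test noise level sigma_test.

    Hypothetical helper: the CLISTA thresholds are always scaled by
    sigma_test / sigma_train, while the PUNet thresholds are scaled only
    when sigma_test >= sigma_train, following the rule stated in the text.
    """
    scale = sigma_test / sigma_train
    clista = [t * scale for t in clista_thetas]
    if sigma_test >= sigma_train:
        punet = [t * scale for t in punet_thetas]
    else:
        punet = list(punet_thetas)  # left unchanged for lower noise levels
    return punet, clista


if __name__ == "__main__":
    # Illustrative threshold values only; they are not learned values.
    punet, clista = rescale_thresholds([0.1, 0.2], [0.5],
                                       sigma_train=25.0, sigma_test=50.0)
    coeffs = np.array([-3.0, -0.5, 0.5, 3.0])
    print(punet, clista, soft_threshold(coeffs, clista[0]))
```

Because soft-thresholding shrinks every coefficient by the threshold, scaling the threshold in proportion to the noise standard deviation keeps the shrinkage matched to the noise magnitude without retraining.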
2) Blind Image Denoising: For blind image denoising, the image denoising method does not take the noise level of the noisy image as an input and directly recovers the denoised image from the input noisy image. The comparison methods include DnCNN-B [25], BUIFD [29], and BF-CNN [28]. The depth of DnCNN-B and BF-CNN is increased from 17 to 20 and their number of parameters is around 660 × 10^3. For WINNet, we use the 1-scale model for comparison which has around 173 × 10^3 parameters. The number of parameters for NENet is only around 6 × 10^3. The training data for these methods is BSD400 with AWGN σ_N ∈ [0, 55].
Table V shows the testing results of different methods evaluated with images with noise level σ ∈ [5, 145]. We can see that the performance of DnCNN-B [25] is highly competitive when the testing noise level is within the range of the training noise level, but quickly deteriorates otherwise. BUIFD [29] consists of a noise level CNN, a prior CNN and a fusion network which are all based on the DnCNN architecture. Its total number of parameters is around 119 × 10^4. With explicit noise level learning, BUIFD shows stronger generalization ability towards unseen noise levels compared to DnCNN-B. By removing all bias terms in DnCNN, BF-CNN [28] is able to generalize well beyond the training noise levels; however, it is slightly less effective when σ ∈ [0, 55] compared to DnCNN. With the exception of σ = 25, WINNet consistently outperforms all the other methods.
Fig. 15 further shows the noisy image with different noise levels and the blind image denoising results by the proposed WINNet. We can see that the proposed WINNet is able to achieve robust denoising not only within the training noise level range (marked in green) but also beyond the training noise levels.

E. Application on Image Deblurring
With the plug-and-play technique, image denoisers can be applied to solve general image restoration problems [23]–[27], for example, image deblurring. In this case, the goal is to recover a sharp image x from the blurred and noisy observation y = k ⊗ x + n where k is the blurring kernel and n ∼ N(0, σ^2) represents the measurement noise with variance σ^2. The image deblurring task can be formulated as the following optimization problem:

    \hat{x} = \arg\min_{x} \frac{1}{2\sigma^2} \| y - k \otimes x \|_2^2 + \lambda \Phi(x), (19)

where Φ(·) is a prior term and λ is the regularization parameter. With half-quadratic splitting [6], the image deblurring problem can be solved by iteratively optimizing two sub-problems:

    x_k = \arg\min_{x} \| y - k \otimes x \|_2^2 + \frac{\lambda\sigma^2}{\beta^2} \| x - z_{k-1} \|_2^2, (20)

    z_k = \arg\min_{z} \frac{1}{2\beta^2} \| z - x_k \|_2^2 + \Phi(z), (21)

where β = √(λ/μ) is a hyper-parameter and can be interpreted as the noise level if the z sub-problem is treated as Gaussian denoising on x_k.
In [6], a CNN-based Gaussian denoiser is used to solve the z sub-problem. The hyper-parameter λ is set to be fixed during iterations, while μ is set to exponentially decay from a large value to the given noise level σ with a fixed iteration number. Since the proposed NENet is an effective noise level estimator and WINNet can denoise images with noise beyond training noise levels, we propose to use WINNet for image deblurring. At the k-th iteration, the x sub-problem can be solved with closed-form solution with the estimated noise
4390 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 31, 2022
Fig. 15. Exemplar noisy images and blind image denoising results of blind WINNet trained with noise levels σ ∈ [0, 55]. The first row shows the noisy images with different noise levels, and the second row shows the denoising results. The images within the training noise levels are outlined in green.
Fig. 16. The input blurred image y with the blurring kernel k, the convergence curve and the visualization results of x k and z k at different iterations. The
blurring kernel is the first kernel of size 19 × 19 from [71] and the noise level is 2.55 (1%).
TABLE VI
The Average PSNR (dB) of Different Image Deblurring Methods Evaluated on Set12 With Kernels From [71] and Noise Level 2.55 (1%)
level β_k (β_k denotes the hyper-parameter at the k-th iteration) where β_k is estimated with NENet. The z sub-problem can be solved using the proposed blind WINNet with noise level 2β_k (denoising with a stronger strength to ensure convergence). With the proposed robust NENet and WINNet, we can achieve image deblurring without access to the noise level and without pre-defined regularization parameters; λ is the only free parameter and is set to 0.23 as in [6]. Algorithm 1 illustrates the plug-and-play image deblurring algorithm with the proposed WINNet.
Table VI shows the average PSNR (dB) of EPLL [22], IRCNN [6], [31] and the proposed method evaluated on Set12 with 8 different kernels from [71] and noise level 2.55. We can see that the proposed method is able to achieve highly competitive performance. Fig. 16 shows an exemplar image deblurring process of the proposed method on image Cameraman which is blurred using the first kernel from [71]. We can see that the proposed method converges after 8 iterations and the PSNR of x_k and z_k consistently improves and finally reaches a similar result.

VII. CONCLUSION
In this paper, we have proposed a wavelet-inspired invertible network (WINNet). It consists of K levels of lifting inspired invertible neural networks (LINNs) and sparsity-driven denoising networks. LINNs are designed to mimic the nice properties of the wavelet transform and are used as a non-linear redundant transform with perfect reconstruction property. For the image denoising task, the sparsity-driven denoising network is used to remove the noise in the detail parts of the transform
coefficients and the denoising network can be adjusted to adapt to unseen noise levels. Together with a model-inspired noise estimation network, the proposed blind WINNet can achieve robust blind image denoising results beyond the training noise levels. The flexibility of WINNet has also been demonstrated on the image deblurring task.
In the future, WINNet as a learnable transform-based image restoration method can be exploited as a building block in other deep neural network based image processing tasks to enhance the model interpretability and impose a reconstruction constraint on the solutions. It would also be interesting to investigate the non-linear image approximation properties of WINNet. Another direction is to improve the training memory and runtime consumption of WINNet.

REFERENCES
[1] T. Brooks, B. Mildenhall, T. Xue, J. Chen, D. Sharlet, and J. T. Barron, “Unprocessing images for learned raw denoising,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 11036–11045.
[2] Y. Wang, H. Huang, Q. Xu, J. Liu, Y. Liu, and J. Wang, “Practical deep raw image denoising on mobile devices,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Aug. 2020, pp. 1–16.
[3] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Proc. IEEE Global Conf. Signal Inf. Process., Dec. 2013, pp. 945–948.
[4] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “BM3D-AMP: A new image recovery algorithm based on BM3D denoising,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 3116–3120.
[5] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM J. Imag. Sci., vol. 10, no. 4, pp. 1804–1844, 2017.
[6] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3929–3938.
[7] T. Meinhardt, M. Moeller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1781–1790.
[8] D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.
[9] D. L. Donoho and I. M. Johnstone, “Adapting to unknown smoothness via wavelet shrinkage,” J. Amer. Stat. Assoc., vol. 90, no. 432, pp. 1200–1224, Dec. 1995.
[10] S. G. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Trans. Image Process., vol. 9, no. 9, pp. 1532–1546, Sep. 2000.
[11] T. Blu and F. Luisier, “The SURE-LET approach to image denoising,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2778–2786, Nov. 2007.
[12] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[13] W. Dong, X. Li, L. Zhang, and G. Shi, “Sparsity-based image denoising via dictionary learning and structural clustering,” in Proc. CVPR, Jun. 2011, pp. 457–464.
[14] A. Buades, B. Coll, and J. M. Morel, “A non-local algorithm for image denoising,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 2, Jul. 2005, pp. 60–65.
[15] M. Mahmoudi and G. Sapiro, “Fast image and video denoising via nonlocal means of similar neighborhoods,” IEEE Signal Process. Lett., vol. 12, no. 12, pp. 839–842, Dec. 2005.
[16] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.
[17] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2862–2869.
[18] J. Xu, L. Zhang, D. Zhang, and X. Feng, “Multi-channel weighted nuclear norm minimization for real color image denoising,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1105–1113.
[19] Z. Zha, X. Zhang, Q. Wang, Y. Bai, L. Tang, and X. Yuan, “Group sparsity residual with non-local samples for image denoising,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2018, pp. 1353–1357.
[20] Z. Zha, B. Wen, X. Yuan, J. T. Zhou, J. Zhou, and C. Zhu, “Triply complementary priors for image restoration,” IEEE Trans. Image Process., vol. 30, pp. 5819–5834, 2021.
[21] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[22] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in Proc. Int. Conf. Comput. Vis., Nov. 2011, pp. 479–486.
[23] H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 2392–2399.
[24] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1256–1272, Jun. 2016.
[25] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[26] K. Zhang, W. Zuo, and L. Zhang, “FFDNet: Toward a fast and flexible solution for CNN-based image denoising,” IEEE Trans. Image Process., vol. 27, no. 9, pp. 4608–4622, Sep. 2018.
[27] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, “Toward convolutional blind denoising of real photographs,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 1712–1722.
[28] S. Mohan, Z. Kadkhodaie, E. P. Simoncelli, and C. Fernandez-Granda, “Robust and interpretable blind image denoising via bias-free convolutional neural networks,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2020, pp. 1–22.
[29] M. E. Helou and S. Susstrunk, “Blind universal Bayesian image denoising with Gaussian noise level learning,” IEEE Trans. Image Process., vol. 29, pp. 4885–4897, 2020.
[30] S. Anwar and N. Barnes, “Real image denoising with feature attention,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 3155–3164.
[31] K. Zhang, Y. Li, W. Zuo, L. Zhang, L. Van Gool, and R. Timofte, “Plug-and-play image restoration with deep denoiser prior,” IEEE Trans. Pattern Anal. Mach. Intell., early access, Jun. 14, 2021, doi: 10.1109/TPAMI.2021.3088914.
[32] A. Krull, T.-O. Buchholz, and F. Jug, “Noise2Void–learning denoising from single noisy images,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2129–2137.
[33] T. Plötz and S. Roth, “Neural nearest neighbors networks,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2018, pp. 1–12.
[34] S. Gu, Y. Li, L. Van Gool, and R. Timofte, “Self-guided network for fast image denoising,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 2511–2520.
[35] J.-J. Huang and P. L. Dragotti, “LINN: Lifting inspired invertible neural network for image denoising,” in Proc. 29th Eur. Signal Process. Conf. (EUSIPCO), Aug. 2021, pp. 23–27.
[36] C. Ren, X. He, C. Wang, and Z. Zhao, “Adaptive consistency prior based deep network for image denoising,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 8596–8606.
[37] D. L. Donoho and M. E. Raimondo, “A fast wavelet algorithm for image deblurring,” ANZIAM J., vol. 46, pp. C29–C46, Mar. 2004.
[38] N. Pustelnik, A. Benazza-Benhayia, Y. Zheng, and J.-C. Pesquet, “Wavelet-based image deconvolution and reconstruction,” in Wiley Encyclopedia of Electrical and Electronics Engineering. Hoboken, NJ, USA: Wiley, 2016, pp. 1–34.
[39] B. Dong, H. Ji, J. Li, Z. Shen, and Y. Xu, “Wavelet frame based blind image inpainting,” Appl. Comput. Harmon. Anal., vol. 32, no. 2, pp. 268–279, Mar. 2012.
[40] L. He and Y. Wang, “Iterative support detection-based split Bregman method for wavelet frame-based image inpainting,” IEEE Trans. Image Process., vol. 23, no. 12, pp. 5470–5485, Dec. 2014.
[41] W. Sweldens, “The lifting scheme: A construction of second generation wavelets,” SIAM J. Math. Anal., vol. 29, no. 2, pp. 511–546, Jan. 1998.
[42] I. Daubechies and W. Sweldens, “Factoring wavelet transforms into lifting steps,” J. Fourier Anal. Appl., vol. 4, no. 3, pp. 247–269, 1998.
[43] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Scholkopf, “A machine learning approach for non-blind image deconvolution,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1067–1074.
[44] L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear independent components estimation,” 2014, arXiv:1410.8516.
[45] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using real NVP,” 2016, arXiv:1605.08803.
[46] A. N. Gomez, M. Ren, R. Urtasun, and R. B. Grosse, “The reversible residual network: Backpropagation without storing activations,” 2017, arXiv:1707.04585.
[47] J.-H. Jacobsen, A. W. Smeulders, and E. Oyallon, “I-RevNet: Deep invertible networks,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2018, pp. 1–11.
[48] M. Xiao et al., “Invertible image rescaling,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Aug. 2020, pp. 126–144.
[49] C. Etmann, R. Ke, and C.-B. Schonlieb, “IUNets: Learnable invertible up- and downsampling for large-scale inverse problems,” in Proc. IEEE 30th Int. Workshop Mach. Learn. Signal Process. (MLSP), Sep. 2020, pp. 1–6.
[50] L. Ardizzone, C. Lüth, J. Kruse, C. Rother, and U. Köthe, “Guided image generation with conditional invertible neural networks,” 2019, arXiv:1907.02392.
[51] R. Zhao, T. Liu, J. Xiao, D. P. K. Lun, and K.-M. Lam, “Invertible image decolorization,” IEEE Trans. Image Process., vol. 30, pp. 6081–6095, 2021.
[52] H. Ma, D. Liu, N. Yan, H. Li, and F. Wu, “End-to-end optimized versatile image compression with wavelet-like transform,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 3, pp. 1247–1263, Mar. 2022.
[53] H. Ma, D. Liu, R. Xiong, and F. Wu, “IWave: CNN-based wavelet-like transform for image compression,” IEEE Trans. Multimedia, vol. 22, no. 7, pp. 1667–1679, Jul. 2020.
[54] S. Li, Z. Zheng, W. Dai, J. Zou, and H. Xiong, “REV-AE: A learned frame set for image reconstruction,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020, pp. 1823–1827.
[55] Y. Liu et al., “Invertible denoising network: A light solution for real noise removal,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13365–13374.
[56] M. Shensa, “The discrete wavelet transform: Wedding the a trous and Mallat algorithms,” IEEE Trans. Signal Process., vol. 40, no. 10, pp. 2464–2482, Oct. 1992.
[57] M. X. B. Rodriguez et al., “Deep adaptive wavelet network,” in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2020, pp. 3100–3108.
[58] J. L. Starck, J. Fadili, and F. Murtagh, “The undecimated wavelet decomposition and its reconstruction,” IEEE Trans. Image Process., vol. 16, no. 2, pp. 297–309, Feb. 2007.
[59] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton, NJ, USA: Princeton Univ. Press, 2009.
[60] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1251–1258.
[61] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, Aug. 2004.
[62] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. 27th Int. Conf. Mach. Learn. (ICML), Jun. 2010, pp. 399–406.
[63] J.-J. Huang and P. L. Dragotti, “Learning deep analysis dictionaries for image super-resolution,” IEEE Trans. Signal Process., vol. 68, pp. 6633–6648, 2020.
[64] J. Xue, Y. Zhao, S. Huang, W. Liao, J. C.-W. Chan, and S. G. Kong, “Multilayer sparsity-based tensor decomposition for low-rank tensor completion,” IEEE Trans. Neural Netw. Learn. Syst., early access, Jun. 18, 2021, doi: 10.1109/TNNLS.2021.3083931.
[65] Y. Bu et al., “Hyperspectral and multispectral image fusion via graph Laplacian-guided coupled tensor decomposition,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 648–662, Jan. 2021.
[66] J. Xue, Y. Zhao, W. Liao, J. C. Chan, and S. G. Kong, “Enhanced sparsity prior model for low-rank tensor completion,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 11, pp. 4567–4581, Nov. 2020.
[67] X. Liu, M. Tanaka, and M. Okutomi, “Single-image noise level estimation for blind denoising,” IEEE Trans. Image Process., vol. 22, no. 12, pp. 5226–5237, Dec. 2013.
[68] S. Roth and M. J. Black, “Fields of experts,” Int. J. Comput. Vis., vol. 82, no. 2, pp. 205–229, 2009.
[69] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[70] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[71] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1964–1971.

Jun-Jie Huang (Member, IEEE) received the B.Eng. degree (Hons.) in electronic engineering and the M.Phil. degree in electronic and information engineering from The Hong Kong Polytechnic University, Hong Kong, China, in 2013 and 2015, respectively, and the Ph.D. degree from Imperial College London (ICL), London, U.K., in 2019.
From 2019 to 2021, he held a postdoctoral position with the Communications and Signal Processing (CSP) Group, Department of Electrical and Electronic Engineering, ICL. He is currently a Lecturer with the College of Computer Science, National University of Defense Technology (NUDT), Changsha, China. His research interests include the areas of computer vision, signal processing, and deep learning.

Pier Luigi Dragotti (Fellow, IEEE) received the Laurea degree (summa cum laude) in electronic engineering from the University of Naples Federico II, Naples, Italy, in 1997, and the master’s degree in communications systems and the Ph.D. degree from the Swiss Federal Institute of Technology of Lausanne (EPFL), Switzerland, in 1998 and in April 2002, respectively.
He has held several visiting positions, in particular, he was a Visiting Student at Stanford University, Stanford, CA, USA, in 1996; a Summer Researcher in mathematics with the Department of Communications, Bell Labs, Lucent Technologies, Murray Hill, NJ, USA, in 2000; a Visiting Scientist with the Massachusetts Institute of Technology (MIT) in 2011; and a Visiting Scholar at Trinity College, Cambridge, in 2020. Before joining Imperial College London in November 2002, he was a Senior Researcher at EPFL working on distributed signal processing for the Swiss National Competence Center in Research on Mobile Information and Communication Systems. He is currently a Professor of signal processing with the Department of Electrical and Electronic Engineering, Imperial College London. His research interests include sampling theory, wavelet theory and its applications, computational imaging, and sparsity-driven signal processing.
Dr. Dragotti was an Elected Member of the IEEE Image, Video and Multidimensional Signal Processing Technical Committee as well as an Elected Member of the IEEE Signal Processing Theory and Methods Technical Committee and the IEEE Computational Imaging Technical Committee. In 2011, he was awarded the Prestigious ERC Starting Investigator Award (consolidator stream), and he is currently an IEEE SPS Distinguished Lecturer. He was the Editor-in-Chief of the IEEE TRANSACTIONS ON SIGNAL PROCESSING (2018–2020), the Technical Co-Chair of the European Signal Processing Conference in 2012, and an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING from 2006 to 2009.