Towards Applicable Unsupervised Signal Denoising via Subsequence Splitting and Blind Spot Network
In recent years, DL-based methods have gradually emerged and become an important research direction in the field of denoising [6], [7], [8]. DL-based denoising methods can automatically learn complex signal features and noise distributions to achieve efficient denoising without explicit assumptions about the signal type. At present, research on deep denoising methods mainly focuses on image and speech signals [9], [10], [11], and there is still little research on array signal denoising. In addition, a considerable number of deep denoising methods are supervised [12], [13], [14] and require clean signals during training, which is difficult to achieve in some signal processing problems, such as DOA and spatial spectrum estimation. Therefore, a denoising method that learns only from the received signals is necessary.

From the perspective of denoising objects, denoising algorithms can be divided into single-sensor signal denoising and array signal denoising [15]. Existing array denoising methods can be divided into two major categories. One is to directly use a single-sensor denoising algorithm on each element in the array, which is called the sensor-by-sensor method. The other is often an extension of a single-sensor denoising method [15], [16]. By designing appropriate extension methods, coherence between array sensors can be included in the denoising process, thereby helping to improve the performance of array signal denoising algorithms. However, there are certain limitations, such as requiring the signal to be known a priori or only being able to process Gaussian noise. In addition, due to the need for engineering implementation [16], these extension methods need to be approximated by methods that are easy to implement, which inevitably introduces information loss and degrades algorithm performance. Therefore, there are few models that can process single-sensor and array signal denoising effectively and practically, and it is imperative to propose a denoising method that is highly effective and applicable to both single-sensor and array signals simultaneously.

This paper introduces an unsupervised denoising method that addresses the following challenges: 1) the lack of a highly effective model that can be used for single-sensor and array signal denoising problems simultaneously; 2) the information loss caused by manually designed features and tedious hyperparameter tuning during denoising; 3) the lack of effective unsupervised denoising methods applicable to multiple array forms and non-Gaussian noise; and 4) supervised learning denoising methods that require clean signals, which cannot be obtained in some scenarios. Our method denoises signals without imposing constraints on the signal distribution, relying solely on the prerequisite that the noise has zero mean, and it does not need access to the clean signal. In our proposed method, the model learns features directly from noisy signals. Mathematically, it can be strictly proved that this learning procedure approximates learning directly from clean signals when the number of samples is sufficient. In addition, since the proposed method only requires the mean value of the noise and does not constrain the form of the signal, this mathematical guarantee generalizes to array signal processing scenarios. Moreover, since the input of the model is the entire array signal receiving matrix, no additional preprocessing is required (e.g., flattening, chunking, and transformation [16], [17]), so the information in the original signal and the correlation information between array elements are completely preserved, thereby avoiding information loss as much as possible. The contributions of this paper can be summarized as follows:

1) We propose an unsupervised deep-learning signal denoising method based on constructing split subsequences, and a blind-spot network is designed to further boost the denoising performance. Extensive experiments show that our method outperforms previous traditional and ML-based methods on single/array signal denoising tasks qualitatively and quantitatively.
2) Our proposed method can handle denoising problems for single-sensor signals and array signals under multiple array forms, such as the Uniform Linear Array (ULA), Uniform Rectangular Array (URA), Uniform Circular Array (UCA), and Coprime Array (CA). It is also effective for non-Gaussian noise denoising problems, demonstrating the high practicability of our method.
3) Several downstream applications are tested to prove the effectiveness and efficiency of our method in the application pipeline.
4) Generalization experiments show that the proposed model has satisfactory generalization ability and can effectively handle unseen situations.

The rest of the paper is organized as follows: in Sect. II, we present related work on signal denoising. In Sect. III, we give the mathematical principles and knowledge used in this paper. In Sect. IV, we introduce the theoretical background and component principles of the models used in this paper. In Sect. V, we present the experimental results of the proposed method. In Sect. VI, we summarize and highlight the work.

II. RELATED WORKS

A. Traditional Methods

Traditional single-sensor denoising methods encompass techniques such as filtering and wavelet decomposition, which are non-learning methods. Filtering methods require designing diverse filters to achieve denoising goals, such as median filters, mean filters [18], and Wiener filters [19]. These methods can work well when designed properly, but in practice it is often difficult to obtain enough a priori information to accurately design the filter. For instance, implementing a Wiener filter necessitates knowledge of the covariance matrix of the clean signals, which is frequently challenging to obtain in practical scenarios.

Wavelet decomposition methods are grounded in wavelet transforms, known for their robust data decorrelation and signal energy compression capabilities. The concept of wavelet denoising was first proposed by Donoho et al. [20], in which a wavelet denoising method with a hard threshold (HTWT) was given. Donoho et al. [21] proposed a soft threshold-based wavelet transform (STWT) to solve the single-sensor signal denoising problem, which produces smoother denoising results
by using a continuous threshold function. Zhao et al. [22] introduced a compromise threshold-based wavelet transform (CTWT) denoising method, overcoming the discontinuity in hard-threshold denoising and reducing the permanent bias in soft-threshold denoising. The wavelet-based empirical Wiener filtering (WWEF) method was proposed by Sandeep et al. [23]. This approach employs two different wavelet transforms to smooth the denoising outcome of the first wavelet transform, extending it to encompass a broader coefficient range. Choi et al. [24] analyzed the WWEF method and proposed an iterative method based on a multi-wavelet basis to improve the performance of WWEF.

Transitioning to array signal denoising, the earlier single-sensor signal denoising methods can be readily extended. Specifically, array signal denoising can be achieved by using a single-sensor signal denoising method on each sensor, which is called a "sensor-by-sensor" method. However, this direct approach ignores the inter-sensor correlation and fails to preserve the inter-sensor correlation information. Based on this, a series of improved extension methods have been proposed for array denoising to diminish information loss. One classic array signal denoising strategy arises from the Temporal Wavelet Array Denoising (TWAD) method proposed by Rao et al. [16]. Before applying the wavelet denoising method, this method makes full use of the additional information provided by the array measurement, flattens the array received signal matrix into a vector, and performs time decorrelation and spatial decorrelation processing, respectively. Another array signal denoising method (SWAD) was proposed by R. Sathish et al. [25], which has the advantage of significantly reducing computational complexity at the expense of slightly reducing the SNR gain. Recently, Naveed et al. [15] proposed an efficient multivariate denoising technique using the multivariate goodness-of-fit test (MGWD), which projects multichannel data into a single-dimensional space using the squared Mahalanobis distance measure, and then performs a goodness-of-fit test on multiple input data scales derived from the discrete wavelet transform, so as to achieve denoising.

While the aforementioned extension methods can enhance denoising performance to a certain extent compared to the sensor-by-sensor approach, they do have certain limitations. The TWAD method necessitates accurate prior knowledge of the spatial signal statistics for practical implementation, which is not attainable in some real-world scenarios. Despite the use of the Discrete Fourier Transform (DFT) as an approximation to the original spatial decorrelation operation under unknown signal statistics, this approximation remains applicable only to uniform linear arrays (ULA), thereby limiting its scope of applicability. The SWAD method does not need precise prior information about the signal, but it depends on the appropriate selection of the wavelet basis and threshold function, and cannot deal with the complex and changeable scenes encountered in practice. The MGWD method needs to project high-dimensional data into a low-dimensional space, resulting in a certain loss of information. Furthermore, the method has not been proven to work in the case of non-Gaussian noise. Although there are methods designed to tackle non-Gaussian noise [26], [27], they are not suitable for direct application to the array denoising problem.

B. Learning-Based Methods

With the advancement of machine learning and deep learning techniques, there has been a notable surge in methods based on the machine-learning (ML) and deep-learning (DL) paradigms. The ML-based approaches primarily encompass optimization techniques [28], [29], dictionary-based strategies [30], [31], as well as singular decomposition methodologies [32], [33]. Notably, principal component analysis (PCA), a widely adopted dimensionality reduction method, finds utility in tasks such as signal denoising [5], [34]. Sun et al. [4] introduced a method that employs the least-squares support vector machine (SVM) to address lidar signal denoising. Similarly, Rojo et al. [35] leveraged SVM techniques for the denoising of heart rate turbulence data.

DL-based methods, encompassing both supervised and unsupervised approaches, have exhibited remarkable efficacy in signal denoising. Within the realm of supervised techniques, a common approach involves amassing sets of noisy-clean pairs or synthetic-noisy-clean triplets, followed by end-to-end network training. Arsene et al. [8] investigated the performance of CNN and LSTM models for denoising electrocardiogram signals, noting that the CNN outperformed the LSTM on the RMSE metric. In the domain of seismic signal denoising, a UNet architecture [36] has been effective. To address the challenge of high-frequency loss during image denoising, DnCNN [6] adopted a strategy wherein the network predicts the residual discrepancies between clean and noisy images. In another vein, [12] created a library of core image patterns from the noisy input data and subsequently harnessed these foundational patterns to reconstruct images within this pattern-defined space, achieving successful denoising outcomes.

Although supervised methods can produce satisfactory performance, in some circumstances collecting many noisy-clean pairs is expensive. Therefore, many researchers have turned to unsupervised methods. N2N [7] introduced unsupervised learning in image denoising using paired noisy data. N2V [37] directly learns a consistency loss with a blind-spot network to avoid identity mapping. Laine et al. [38] take the blind-spot mechanism into the design of the network. Recent works focus on the invertibility of neural networks. The added invertibility constraint has proven effective for tasks such as blind source separation [39], [40] and image denoising [41], [42]. Since these invertibility properties are often integrated directly into the network, they introduce a stronger inductive bias compared to blind-spot networks. Moreover, blind-spot networks and invertible networks share the same term, "trivial solutions". To avoid confusion, we discuss the different meanings of trivial solutions in the two kinds of networks, as well as the relationship between them, in the Materials Sect. IX, available in the supplementary material. Some advanced works concentrated on alternative training and inference: DIP [43] exploited the prior of the CNNs' learning process and
used it to denoise but suffered from an uncertain stopping timestep and poor performance. Self2Self (S2S) [9] leveraged dropout to train the network and inferred the output by averaging multiple runs.

III. PRELIMINARY

A. N2N Model

N2N [7] provides a new perspective to denoise a given noisy signal without using any clean signal. Consider a clean signal $z$ and a noisy signal (or observation) $x = z + n$, where $n$ is the noise. The joint distribution can be written as

$p(z, n) = p(z)p(n|z).$   (1)

The distribution $p(z)$ can be an arbitrary distribution satisfying

$p(z_i | z_j) \neq p(z_i),$   (2)

which means two elements $z_i$ and $z_j$ are not statistically independent. The noise $n$ is assumed to follow a conditional distribution $p(n|z) = \prod_i p(n_i | z_i)$. Therefore, the noise elements are conditionally independent given the clean signal. Furthermore, and empirically, the noise is often assumed to have zero mean,

$\mathbb{E}[n_i] = 0,$   (3)

which implies

$\mathbb{E}[x_i] = z_i.$   (4)

By utilizing this conclusion, N2N acquires multiple different noisy signals of the same clean signal and trains the network to minimize the N2N loss,

$\mathcal{L}_{N2N} = \|f_\theta(y) - x\|_2^2,$   (5)

where $x$, $y$ are two different noisy signals sharing the same $z$. In this way, the denoised output $f_\theta(y)$ can approximate the clean signal $z$.

IV. METHOD

A. Theoretical Background

Considering the N2N background, training N2N does not need to involve the ground truth. N2N trains the network with a self-consistency loss,

$\arg\min_\theta \mathbb{E}_{x,y,z} \|f_\theta(y) - x\|_2^2,$   (6)

where $x$, $y$ are two independent noisy observations that share a clean unobserved signal $z$, and $f$ is the training network with its parameters $\theta$. The main drawback of N2N is the need for two paired noisy observations, which is usually unaffordable in some signal processing scenarios. The other disadvantage is that the N2N self-consistency loss may encounter the noise gap, which is described as follows.

Theorem IV.1: Consider two independent noisy observations $x$, $y$ with the unobserved clean signal $z$; the noisy observations $x$, $y$ are independent conditioned on $z$: $\mathbb{E}_{x,y|z} = \mathbb{E}_{x|z}\mathbb{E}_{y|z}$, and a gap $\epsilon := \mathbb{E}_{y|z}(y) - \mathbb{E}_{x|z}(x) \neq 0$ exists, with $\mathbb{E}_{x|z}(x) = z$ and $\mathbb{E}_{y|z}(y) = z + \epsilon$. The variance of $y$ is $\sigma^2$, such that

$\mathbb{E}_{x,z}\|f_\theta(x) - z\|_2^2 = \mathbb{E}_{x,y,z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\epsilon\,\mathbb{E}_{x,z}(f_\theta(x) - z).$   (7)

The proof can be found in the Appendix. From Theorem IV.1, we can see that the self-consistency loss of N2N cannot be zero if the gap $\epsilon \neq 0$, which means training with this loss cannot yield the same results as training in a supervised manner (i.e., with clean signals). But if the gap is sufficiently small ($\epsilon \to 0$), the network can be regarded as training with the supervised loss. Specifically, when $\epsilon \to 0$, Eq. (7) becomes:

$\mathbb{E}_{x,z}\|f_\theta(x) - z\|_2^2 = \mathbb{E}_{x,y,z}\|f_\theta(x) - y\|_2^2 - \sigma^2,$   (8)

where the left term represents the supervised loss, and the first term on the right represents the unsupervised loss. The difference between the two losses is a constant $\sigma^2$. The role of the loss function is to find the optimal network parameters $\theta^*$, and the constant $\sigma^2$ does not affect the result of $\theta^*$. In other words, minimizing the left and right sides of Eq. (8) leads to the same optimal solution $\theta^*$. Therefore, training with the unsupervised loss can be considered equivalent to training with the supervised loss in this scenario.

To mitigate the need for noise-noise pairs, we can construct several sub-noisy signal pairs with similar constructors $D_i$ that satisfy the above condition. Taking one sub-noisy signal pair as an example, $D = \{D_1(x), D_2(x)\}$. Assume the two subsampled noisy signals are quite similar. We can then reuse the N2N self-consistency loss,

$\arg\min_\theta \mathbb{E}_{x,z}\|f_\theta(D_1(x)) - D_2(x)\|_2^2.$   (9)

However, the gap between the two subsampled noisy signals is not zero in practice, i.e., $\mathbb{E}_{x|z}D_2(x) - \mathbb{E}_{x|z}D_1(x) \neq 0$. Directly reusing the loss may therefore lead to suboptimal results. To solve this problem, we introduce a regularization term.

Proposition IV.1: Given a trained and optimal denoising network $f_{\theta^*}$, which has the optimal denoising results $f_{\theta^*}(x) = z$ and $f_{\theta^*}(D_i(x)) = D_i(z)$, the following holds,

$\mathbb{E}_{x|z}\{f_{\theta^*}(D_1(x)) - D_2(x) - (D_1(f_{\theta^*}(x)) - D_2(f_{\theta^*}(x)))\} = D_1(z) - \mathbb{E}_{x|z}D_2(x) - (D_1(z) - D_2(z)) = D_2(z) - \mathbb{E}_{x|z}D_2(x) = 0.$   (10)

Eq. (10) constrains the output to be optimal. Therefore, a regularized loss is used,

$\arg\min_\theta \mathbb{E}_{x,z}\|f_\theta(D_1(x)) - D_2(x)\|_2^2 + \alpha\,\mathbb{E}_{x,z}\|f_\theta(D_1(x)) - D_2(x) - (D_1(f_\theta(x)) - D_2(f_\theta(x)))\|_2^2.$   (11)

Fig. 1(a) illustrates how to train an unsupervised denoiser utilizing the aforementioned proposition.
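To make the training objective concrete, the following is a minimal PyTorch sketch of the regularized loss in Eq. (11) for one subsequence pair. It is an illustration written for this article, not the authors' released code; the reduction (mean versus sum) and whether to stop gradients through the full-signal pass $f_\theta(x)$ are our own choices, since the paper does not fix them.

import torch

def regularized_subsequence_loss(f, x, split, i, j, alpha=1.0):
    """Eq. (11) for one pair of subsequences (D_i(x), D_j(x)).

    f     : blind-spot denoising network (e.g., the dual-branch CNN of Fig. 1(c))
    x     : full noisy signal, shape (batch, channels, L)
    split : callable returning [D_1(x), ..., D_I(x)]; it must reuse the same,
            already sampled permutation every time it is called here.
    """
    d = split(x)
    out_i = f(d[i])                              # f_theta(D_i(x))
    base = ((out_i - d[j]) ** 2).mean()          # self-consistency term
    d_full = split(f(x))                         # D_k(f_theta(x)) on the full signal
    reg = ((out_i - d[j] - (d_full[i] - d_full[j])) ** 2).mean()
    return base + alpha * reg

# One optimization step in the spirit of Algorithm 1 (AdamW as in Sect. V-D):
# opt = torch.optim.AdamW(f.parameters(), lr=1e-3)
# loss = sum(regularized_subsequence_loss(f, x, split, i, (i + 1) % I)
#            for i in range(I))
# opt.zero_grad(); loss.backward(); opt.step()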
Fig. 1. Training and inference stages of the proposed framework. (a) Training stage of our framework: first, the noisy signals are randomly split as
subsequence pairs. Second, one sub-noisy signal is fed into the denoising network to produce the sub-denoised signal. Finally, an unsupervised loss is used
to train the denoising network without using any ground truth (i.e., clean signal). The regularization term in Eq. (11) is omitted for clearer illustration. (b)
Blind-spot convolution. The middle element of the convolution kernel is masked to avoid information leakage. (c) Our proposed dual branch blind-spot CNN.
K, S, P, and D denote convolution kernel size, stride, padding, and dilation, respectively. (d) Inference stage of our framework. The trained denoising network
is exploited to process the noisy signals into denoised (or near-clean) signals with large SNR gain, and the denoised signals can be used for downstream
applications such as DoA estimation and Estimated Number of Sources.
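Fig. 1(b) shows the blind-spot convolution used to avoid identity mapping: the center tap of each 1-D kernel is masked so that the output at time t never sees the noisy input at time t. The PyTorch sketch below is our own minimal illustration of such a masked convolution; the class name and initialization are assumptions, not the paper's released layer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BlindSpotConv1d(nn.Module):
    """1-D convolution whose center kernel tap is masked to zero (cf. Fig. 1(b))."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.dilation = dilation
        self.padding = dilation * (kernel_size // 2)   # keep the sequence length
        mask = torch.ones(1, 1, kernel_size)
        mask[..., kernel_size // 2] = 0.0              # the blind spot
        self.register_buffer("mask", mask)

    def forward(self, x):
        # The output at time t is computed only from neighboring samples,
        # never from the (noisy) sample at t itself.
        return F.conv1d(x, self.weight * self.mask, self.bias,
                        padding=self.padding, dilation=self.dilation)

x = torch.randn(4, 16, 1024)          # e.g., real/imag parts of an 8-sensor array
y = BlindSpotConv1d(16, 64)(x)
print(y.shape)                        # torch.Size([4, 64, 1024])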
B. Subsequence Constructor
As defined in Eq. (11), we need I subsequence constructors
D to construct the noisy subsequence {D1 (x), · · · , DI (x)}.
A naive approach is randomly sampling the signal, but the
problem is that it cannot preserve the structure of the original
signal, in other words, D1 (x) may differ a lot from D2 (x). To
ensure the subsequences are similar, we propose a structure-
keeping subsequence construction technique.
To be specific, consider a signal $x \in \mathbb{R}^{M \times L}$. Our structure-keeping subsequence construction technique is composed of three operations: 1) Unfold, 2) Shuffle, and 3) Index. We first unfold the signal on its last dimension into the unfolded signal $\hat{x} \in \mathbb{R}^{M \times I \times (L/I)}$. Then, to preserve the structure of the input signal, we uniformly sample the unfolded signal along the $I$ dimension. Finally, the $i$-th channel is indexed to form the subsequence $D_i(x)$, which can be formulated as follows,

$D_i = \mathrm{Index}_i \circ \mathrm{Shuffle} \circ \mathrm{Unfold},$   (12)

$\mathrm{Shuffle}(\hat{x}) = \{\hat{x}[:, :, j]\}, \quad j \in \pi(\{1, \cdots, L/I\}),$   (13)

$\mathrm{Index}_i(\hat{x}) = \hat{x}[:, i, :],$   (14)
where $\pi \in \Pi$ denotes a random permutation. An intuitive illustration of the subsequence constructor is shown in Fig. 2. Now, we begin by defining the structure-keeping property and then introduce a proposition to demonstrate that the proposed subsequence constructor has this property.

Fig. 2. Illustration of subsequence construction. ①, ②, ③ indicate the Unfolding, Shuffling, and Indexing operations. M = 1 is shown as a simple and clear example.

Definition IV.1 (Structure-keeping property): Structure-keeping of a signal sequence includes maintaining the basic statistical properties of the signal (e.g., expectation) and the signal structure (e.g., autocorrelation function).

Proposition IV.2 (Structure-keeping property of the subsequence constructor): Let $X = [x_1, x_2, \cdots, x_L]$ be a signal sequence divided into $I$ blocks, each of length $L/I$ (assuming $L$ is divisible by $I$): $X_i = [x_{(i-1)L/I+1}, \ldots, x_{iL/I}]$, $i = 1, 2, \cdots, I$. After shuffling these blocks and indexing certain elements within each block, each of the constructed subsequences $S_i \in \{(\Pi X)_i, i = 1, 2, \cdots, I\}$ maintains the statistical characteristics of $X$, particularly in terms of mean and autocorrelation, where the subscript $i$ denotes the $\mathrm{Index}_i$ operation and $\Pi := \{\pi\}$ is the concatenation of the applied random permutations.
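For illustration, the NumPy sketch below implements one plausible reading of the Unfold-Shuffle-Index constructor of Eqs. (12)-(14) and then empirically checks two properties that the loss in Eq. (11) relies on: the mean is preserved and paired subsequences stay closely aligned. The grouping convention (every I-th sample goes to D_i, with one random permutation of the group order shared by all subsequences) and all names are our assumptions, not the authors' reference implementation.

import numpy as np

def make_splitter(L, I, rng):
    """Return split(x) = [D_1(x), ..., D_I(x)] built from one shared permutation."""
    assert L % I == 0, "L must be divisible by I"
    perm = rng.permutation(L // I)                    # Shuffle (Eq. (13)), drawn once
    def split(x):                                     # x has shape (M, L)
        x_hat = x.reshape(x.shape[0], L // I, I).transpose(0, 2, 1)   # Unfold
        return [x_hat[:, i, perm] for i in range(I)]  # Shuffle + Index (Eq. (14))
    return split

rng = np.random.default_rng(0)
M, L, I = 1, 4096, 8
t = np.arange(L)
x = np.sin(0.05 * t)[None, :] + 0.3 * rng.standard_normal((M, L))    # noisy sine
d = make_splitter(L, I, rng)(x)

# Mean is approximately preserved, and D_1(x), D_2(x) track each other closely.
print(x.mean(), d[0].mean(), d[1].mean())
print(np.corrcoef(d[0].ravel(), d[1].ravel())[0, 1])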
Algorithm 1: Training Algorithm
input: Noisy signal dataset $X$, blind-spot network $f_\theta$, number of subsequences $I$.
output: Trained network $f_{\theta^*}$.
1:  $\theta \leftarrow \mathrm{Init}(\theta)$                                  // Network initialization.
2:  for $x \leftarrow X$ do
3:      $D \leftarrow \mathrm{SubsequenceConstruct}(I)$                        // Eqs. (12)-(14).
4:      $\mathcal{L} \leftarrow 0$
5:      for $i \leftarrow \mathrm{range}(I)$ do
6:          $j \leftarrow \mathrm{mod}(i + 1, I)$                              // Avoid index out of range.
7:          $x_i, x_j \leftarrow D_i(x), D_j(x)$
8:          $\mathcal{L}_i \leftarrow \|f_\theta(x_i) - x_j\|_2^2 + \alpha\|f_\theta(x_i) - x_j - (D_i(f_\theta(x)) - D_j(f_\theta(x)))\|_2^2$   // Compute loss, see Eq. (11).
9:          $\mathcal{L} \leftarrow \mathcal{L} + \mathcal{L}_i$
10:     end
        $\theta \leftarrow \arg\min_\theta \mathcal{L}$                         // Gradient descent and parameter update.
11: end
12: $\theta^* \leftarrow \theta$
13: return $\theta^*$

Algorithm 2: Efficient Inference
input: Noisy signal $x$, trained blind-spot network $f_{\theta^*}$, number of iterative smoothing steps $N$.
output: Denoised signal $s$.
1: $s \leftarrow x$
2: for $n \leftarrow \mathrm{range}(N)$ do                                      // Iterative smoothing.
3:     $s \leftarrow f_{\theta^*}(s)$
4: end
5: return $s$

Running the signal through the network multiple times may result in a slight decrease in output SNR, but the resulting reduction in glitches is worth it.

V. EXPERIMENTS

A. Datasets

1) α-Stable Distribution: The α-stable distribution, also known as the stable distribution, is frequently employed for characterizing random variables or processes that exhibit heavy-tailed behavior. Within a stable distribution, the sum of independently and identically distributed random variables retains the same distribution. The probability density of the stable distribution is

$f(x; \alpha, \beta, c, \mu) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \varphi(t; \alpha, \beta, c, \mu)\, e^{-itx}\, dt,$   (20)

where

$\varphi(t; \alpha, \beta, c, \mu) = e^{\,it\mu - |ct|^{\alpha}\left(1 - i\beta\,\mathrm{sign}(t)\,\Phi(\alpha, t)\right)}$   (21)

is the characteristic function of the stable distribution, with

$\Phi(\alpha, t) = \begin{cases} \tan\frac{\pi\alpha}{2}, & \alpha \neq 1, \\ -\frac{2}{\pi}\log|t|, & \alpha = 1. \end{cases}$   (22)

Here, α, β, c, and μ are the characteristic index, dispersion parameter, scale parameter, and location parameter, respectively. When β = 0, the distribution is symmetric, and the stable distribution degenerates into a symmetric α-stable distribution. According to [17], impulsive noise in radar can be modeled as a symmetric stable distribution. This paper adopts this configuration and addresses the impulsive-noise denoising problem described by a symmetric stable distribution in the subsequent sections.

2) Data Generation: We train and test the proposed model using simulated data, which is generated with five common types of classic signals: sine, chirp, square, blocks, and spike. Each of them is defined as below,

$z_{\mathrm{sine}}(t) = \sin(f_0 t + \psi_0),$   (23)

where $f_0$ is the central frequency and $\psi_0$ is the initial phase.

$z_{\mathrm{chirp}}(t) = \sin(f_0 t + (f_1 - f_0)t^2 + \psi_0),$   (24)

where $f_1 - f_0$ controls the speed of the frequency change.

$z_{\mathrm{square}}(t) = \sum_{n=-\infty}^{+\infty} A_0\left(u(t - nT_0 + t_0) - u(t - nT_0 - t_0)\right),$   (25)

where $A_0$, $T_0$, and $t_0$ are the amplitude, period, and time slot, respectively.

$z_{\mathrm{blocks}}(t) = \sum_{n=-\infty}^{+\infty}\sum_{i=1}^{L} A_i\left(u(t - nT_i + t_i) - u(t - nT_i - t_i)\right),$   (26)

where $A_i$ represents the amplitude of the $i$-th subsignal, and $L$ represents the number of different squares. The spike signal $z_{\mathrm{spike}}(t)$ is generated by an α-stable distribution (see Sect. V-A1).

For the single-signal scenario, we generate 4000 samples for training and testing for each type of signal. For the array signal, we use the Uniform Linear Array (ULA); the number of sensors is $M = 8$, the number of sources is $K = 3$, the arrival direction of each signal source is randomly selected from −60 degrees to 60 degrees, the number of snapshots is $N = 1024$, and the inter-element spacing is $d = \lambda/2$, where $\lambda$ is the wavelength of the signal. For each type of signal above, we generate 500 samples. In the discussion experiments, we generated data under the array forms of URA, UCA, and CA, as well as array data with different degrees of signal correlation.

B. Benchmarking

For the single-sensor signal denoising task, we implement several methods for comprehensive comparisons. These methods contain the soft threshold-based wavelet transform (STWT) [21], compromise threshold-based wavelet transform (CTWT) [22], hard threshold-based wavelet transform (HTWT) [20], wavelet-based empirical Wiener filter (WWEF) [23], principal
component analysis (PCA) [34], median filter (MF) [18], multivariate goodness-of-fit test (MGWD) [15], and Self2Self (S2S) [9]. For the array signal denoising task, we compare the array-extended versions of the above methods except for MF, and newly add the TWAD [16] and JBRI [26] methods for Gaussian white noise denoising and impulsive noise denoising, respectively. In addition, since the original S2S model is designed for image denoising, we extend it to an array denoising form in this paper. The traditional methods are tested with meticulous parameter tuning, while the ML-based and DL-based methods are trained to full convergence.

C. Metrics

We choose two metrics, Gain of SNR (GSNR) and Gain of MSE (GMSE), to describe the performance of the denoising methods. For a received array signal containing $K$ sources, we have

$X = AZ + N,$   (27)

where $A \in \mathbb{C}^{M \times K}$ represents the array manifold matrix, $Z \in \mathbb{C}^{K \times N}$ represents the spatial signal matrix, and $N \in \mathbb{C}^{M \times N}$ represents the spatial noise matrix. The signals are denoised by the network $f_\theta(\cdot)$, and the denoised signals are defined as $f_\theta(X)$.

The GSNR is defined as

$GSNR = SNR_{out} - SNR_{in},$   (28)

where

$SNR_{in} = 10\log_{10}\frac{\|AZ\|_F^2}{\|N\|_F^2},$   (29)

$SNR_{out} = 10\log_{10}\frac{\|AZ\|_F^2}{\|f_\theta(X) - AZ\|_F^2}.$   (30)

$\|\cdot\|_F$ is the Frobenius norm. The ideal value of GSNR is infinite, and a larger GSNR indicates better denoising performance.

The GMSE is defined as

$GMSE = MSE_{in} - MSE_{out},$   (31)

where

$MSE_{in} = \|X - AZ\|_F^2,$   (32)

$MSE_{out} = \|f_\theta(X) - AZ\|_F^2.$   (33)

The ideal GMSE is finite and equals $MSE_{in}$. GSNR and GMSE measure the degree of noise removal and signal distortion, respectively.

D. Implementation Details

We implement our methodology on a workstation featuring an Intel i9 CPU, coupled with two NVIDIA 4090 GPUs. In the context of single-sensor signals, both the input and output channels are configured at 1. However, in scenarios involving array signals, we synthesize an 8-sensor dataset and concatenate the real and imaginary parts, thereby necessitating a configuration of 16 channels for both input and output. The network's inner channel is chosen as 64, and the dual branch is equipped with 12 residual blocks. Our optimization approach involves the AdamW [49] optimizer, and we train the network for 200 epochs. The learning rate is initialized at $10^{-3}$ and subsequently adjusted to $10^{-4}$ after the first 50 epochs. Across all experiments, we consistently employ 8 subsequences (i.e., $I$), and a comprehensive analysis of the impact of $I$ is presented in the Materials Sect. XI, available in the supplementary material.

E. Main Results

1) Single-Sensor Signal Denoising Results: In this section, we explore the denoising performance of our proposed model on single-sensor signals under Gaussian white noise. We select five types of signals, sine, chirp, square, blocks, and spike, as input, and set three sub-experiments for each type of signal. The input SNRs are −5 dB, 0 dB, and 5 dB, corresponding to input data under different noise conditions. For each SNR, we also calculate the corresponding MSE index to fully describe the effect of the denoising algorithm.

The experimental results are shown in Table I, where we compare a total of nine methods including the proposed method. It can be seen that for sinusoidal signals, our proposed method outperforms the other methods in most cases; the S2S and MGWD methods also remain competitive. At 5 dB for the sinusoidal signal, the performance of the MGWD method is on par with our method. The superior performance of our proposed method comes from the self-reasoning ability of the model, which avoids the information loss caused by artificially designed criteria or features. Our model also achieves satisfactory runtimes.

For the chirp signal, the GSNR of our proposed method is lower than that for the sinusoidal signal. This may be because the chirp signal has a wider frequency spectrum than the sinusoidal signal and contains richer frequency-domain information, which increases the difficulty of denoising. However, our proposed method still outperforms the other methods in all three cases.

For the square signal, the WWEF method slightly outperforms our proposed method at −5 dB, and the PCA method also shows decent performance. In the other cases, the GSNR of our proposed method far outperforms the remaining methods. It is worth noting that the performance of the PCA method largely depends on the selection of the principal component dimension, and the performance of the WWEF method depends on the selection of the two wavelet bases and the threshold functions, whereas our proposed method does not need manually set parameters; the model relies on its reasoning ability to learn deep features in noisy signals.

Our method demonstrates clear superiority in handling block and spike signals, which are complex forms of communication signals. The S2S and MGWD methods also show their advantages. The MF method often proves inadequate when tackling denoising problems involving complex signal forms. Similarly, PCA is prone to information loss during its projection operations. They even have negative gains when processing spike signals, which may be attributed to the fact that each pulse of the spike signal lasts for a short time and is mixed in the noise, making
TABLE I
COMPARISONS WITH PREVIOUS WAVELET-BASED, ML-BASED, AND TIME-DOMAIN FILTER METHODS ON SINGLE-SENSOR SIGNAL OF DIFFERENT
SIGNAL TYPES UNDER GAUSSIAN WHITE NOISE. GAINS OF SNR AND MSE ARE REPORTED. THE BEST RESULTS ARE IN RED AND THE SECOND BEST
RESULTS ARE IN BLUE. * DENOTES THE TESTING RUNTIME ON THE NVIDIA 4090 GPU PLATFORM AND ** MEANS THE TESTING RUNTIME ON THE
INTEL 12TH I9 CPU PLATFORM
Signal type | Input SNR(dB)/MSE | GSNR(dB)/GMSE per method:
STWT [21] | CTWT [22] | HTWT [20] | WWEF [23] | PCA [5] | MF [18] | MGWD [15] | S2S [9] | Proposed
-5.00/1.520 8.97/1.320 8.91/1.323 8.27/1.311 11.84/1.410 9.26/1.342 8.59/1.312 14.82/1.468 15.92/1.476 16.50/1.507
Sine 0.00/0.511 8.92/0.452 8.82/0.441 6.32/0.387 12.17/0.478 9.28/0.447 8.55/0.436 14.86/0.490 14.72/0.479 15.04/0.489
5.00/0.173 8.93/0.151 8.89/0.149 5.36/0.132 12.08/0.158 9.33/0.149 11.40/0.156 14.60/0.161 14.08/0.159 14.60/0.161
-5.00/1.500 6.95/1.211 7.12/1.201 4.94/1.021 7.29/1.221 8.05/1.267 5.22/1.050 7.16/1.214 7.88/1.278 9.88/1.326
Chirp 0.00/0.502 5.40/0.351 5.51/0.350 3.34/0.261 4.86/0.342 9.72/0.447 4.84/0.336 6.67/0.392 8.01/0.419 11.15/0.452
5.00/0.169 4.25/0.112 4.47/0.111 3.47/0.091 2.88/0.079 8.31/0.142 3.97/0.100 5.94/0.124 7.92/0.133 11.04/0.151
-5.00/2.972 8.78/2.571 8.50/2.551 8.00/2.191 19.70/2.970 18.10/2.960 11.33/2.784 11.12/2.805 15.29/2.834 18.88/2.960
Square 0.00/1.001 8.47/0.861 8.39/0.849 9.46/0.890 16.80/0.801 18.08/0.985 9.63/0.891 9.85/0.891 15.20/0.947 19.92/0.996
5.00/0.333 7.61/0.269 7.25/0.272 10.74/0.241 13.91/0.317 17.14/0.326 7.97/0.279 7.05/0.270 14.88/0.302 19.73/0.328
-5.00/16.457 8.73/14.341 8.72/14.241 11.38/12.890 8.17/15.101 8.55/14.318 8.42/14.248 12.31/15.530 12.33/15.572 15.00/15.885
Blocks 0.00/5.103 8.38/4.288 8.29/4.361 9.65/3.543 6.10/4.158 8.38/4.692 9.90/4.933 10.18/4.561 10.83/4.681 13.88/5.091
5.00/1.583 7.43/1.312 7.30/1.289 6.95/1.311 7.54/1.297 4.93/1.141 7.87/1.320 7.02/1.041 7.92/1.196 14.02/1.514
-5.00/0.015 5.29/0.011 5.37/0.011 6.38/0.012 5.72/0.011 4.77/0.010 4.78/0.010 10.32/0.014 10.45/0.014 12.04/0.014
Spike 0.00/0.0050 3.31/0.0027 3.43/0.0029 4.32/0.0031 2.93/0.0024 0.003/0.0000 -0.03/0.0000 9.33/0.0044 9.12/0.0044 11.06/0.0046
5.00/0.0017 1.21/0.0004 1.09/0.0005 2.26/0.0006 0.78/0.0003 -0.31/-0.0001 -0.43/-0.0002 7.88/0.0014 7.96/0.0014 10.46/0.0015
Runtime(s)/sample 0.003 0.003 0.003 0.005 0.0005 0.002 10.480 0.900 0.00005* /0.0074**
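For reference, the SNR and MSE gains reported in Table I (and in the later tables) follow Eqs. (28)-(33). The snippet below is a small NumPy sketch of that bookkeeping, written for this article; the array names clean, noisy, and denoised are our own and stand for AZ, X, and f_theta(X).

import numpy as np

def gains(clean, noisy, denoised):
    """GSNR and GMSE of Eqs. (28)-(33); all inputs share the shape of AZ."""
    p_sig = np.linalg.norm(clean) ** 2                       # ||AZ||_F^2
    snr_in = 10 * np.log10(p_sig / np.linalg.norm(noisy - clean) ** 2)
    snr_out = 10 * np.log10(p_sig / np.linalg.norm(denoised - clean) ** 2)
    mse_in = np.linalg.norm(noisy - clean) ** 2               # Eq. (32)
    mse_out = np.linalg.norm(denoised - clean) ** 2           # Eq. (33)
    return snr_out - snr_in, mse_in - mse_out                 # GSNR (dB), GMSE

clean = np.random.randn(8, 1024)
noisy = clean + 0.5 * np.random.randn(8, 1024)
denoised = clean + 0.1 * np.random.randn(8, 1024)             # stand-in for f(X)
print(gains(clean, noisy, denoised))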
Fig. 5. Single-sensor signal denoising task with “block” signal type and 5dB input SNR. Comparisons with other methods are clearly illustrated. Blue lines
denote the signals and the dashed lines denote the clean signal (i.e., without any noise).
TABLE II
COMPARISONS WITH PREVIOUS WAVELET-BASED AND ML-BASED METHODS ON ARRAY SIGNAL OF DIFFERENT SIGNAL TYPES UNDER GAUSSIAN WHITE
NOISE. GAINS OF SNR AND MSE ARE REPORTED. THE BEST RESULTS ARE IN RED AND THE SECOND BEST RESULTS ARE IN BLUE. THE PINK CELL
COLOR REPRESENTS THE ‘‘SENSOR-BY-SENSOR’’ METHODS, THE ORANGE CELL COLOR REPRESENTS THE EXTENSION OF SINGLE-SENSOR METHODS AND THE
GREEN CELL COLOR REPRESENTS THE DL METHODS. * DENOTES THE TESTING RUNTIME ON THE NVIDIA 4090 GPU PLATFORM AND ** MEANS THE
TESTING RUNTIME ON THE INTEL 12TH I9 CPU PLATFORM
Signal Type | Input SNR(dB)/MSE | GSNR(dB)/GMSE per method:
STWT [21] | CTWT [22] | HTWT [20] | WWEF [23] | TWAD [16] | MGWD [15] | PCA [5] | S2S [9] | Proposed
-5.00/2.283 6.91/1.420 7.24/1.443 9.82/1.668 10.32/1.991 11.89/2.121 15.59/2.217 10.26/1.962 15.40/2.208 15.63/2.222
Sine 0.00/0.751 6.12/0.449 7.33/0.465 8.13/0.531 7.29/0.458 10.24/0.664 14.58/0.734 9.88/0.590 14.89/0.745 15.02/0.756
5.00/0.251 4.31/0.166 5.34/0.178 7.24/0.196 7.33/0.199 8.22/0.214 13.40/0.244 7.02/0.189 13.22/0.240 13.26/0.241
-5.00/2.256 4.58/1.459 4.87/1.522 4.68/1.476 4.98/1.529 5.01/1.531 5.99/1.835 5.09/1.562 6.64/1.883 8.74/1.952
Chirp 0.00/0.750 0.92/0.143 1.95/0.268 2.05/0.281 4.77/0.532 3.22/0.341 4.38/0.507 3.20/0.329 5.48/0.549 7.82/0.621
5.00/0.249 0.86/0.040 1.02/0.058 0.79/0.037 1.81/0.072 4.38/0.152 4.47/0.157 3.99/0.110 5.69/0.188 8.45/0.238
-5.00/4.508 6.06/3.299 6.28/3.368 6.17/3.337 7.01/3.498 7.12/3.660 8.27/4.036 7.11/3.592 10.26/3.890 12.43/4.392
Square 0.00/1.502 2.02/0.520 3.07/0.736 3.23/0.765 5.11/1.003 4.85/0.945 6.02/1.219 4.70/0.918 8.29/1.193 10.48/1.360
5.00/0.501 0.53/0.104 0.92/0.075 1.33/0.116 1.71/0.189 1.87/0.197 2.89/0.241 2.71/0.228 6.22/0.329 9.04/0.432
-5.00/22.899 12.45/21.597 12.40/21.582 12.17/21.511 13.12/22.012 12.42/21.592 12.98/21.974 12.20/21.096 13.52/22.072 14.22/22.237
Blocks 0.00/8.007 9.92/7.225 10.07/7.249 9.87/7.214 10.43/7.401 10.53/7.494 10.48/7.446 9.98/7.288 11.89/7.668 13.06/7.981
5.00/2.506 6.54/1.964 7.12/2.031 7.10/2.029 7.51/2.164 7.32/2.064 7.52/2.165 7.33/1.142 9.82/2.098 11.25/2.482
Spike -5.00/0.046 5.03/0.037 5.36/0.038 6.14/0.039 2.39/0.020 4.64/0.036 8.98/0.042 2.31/0.024 9.39/0.042 11.03/0.045
Runtime(s)/sample 0.035 0.031 0.036 0.058 0.048 14.820 0.0006 2.300 0.0002* /0.0138**
TABLE III
COMPARISONS WITH PREVIOUS WAVELET-BASED AND ML-BASED METHODS ON ARRAY SIGNAL OF DIFFERENT SIGNAL TYPES UNDER IMPULSIVE NOISE.
GAINS OF SNR AND MSE ARE REPORTED. THE BEST RESULTS ARE IN RED AND THE SECOND BEST RESULTS ARE IN BLUE. THE PINK CELL COLOR
REPRESENTS THE ‘‘SENSOR-BY-SENSOR’’ METHODS, THE ORANGE CELL COLOR REPRESENTS THE EXTENSION OF SINGLE-SENSOR METHODS AND THE GREEN
CELL COLOR REPRESENTS THE DL METHODS
Signal Type | Input SNR(dB)/MSE | GSNR(dB)/GMSE per method:
STWT [21] | CTWT [22] | HTWT [20] | WWEF [23] | JBRI [26] | MGWD [15] | PCA [5] | S2S [9] | Proposed
-5.00/3.892 5.29/1.626 3.54/1.219 2.52/0.895 2.95/1.163 8.32/1.893 9.73/2.093 8.49/1.974 14.36/2.920 19.24/3.664
Sine 0.00/1.204 4.87/0.585 3.42/0.439 2.59/0.324 2.72/0.447 7.64/0.701 8.93/0.89 7.98/0.721 12.29/0.998 17.19/1.182
5.00/0.257 4.53/0.144 2.69/0.090 2.31/0.080 2.88/0.104 6.03/0.166 7.73/0.182 6.19/0.170 10.21/0.203 14.679/0.241
-5.00/3.851 2.61/1.466 2.02/1.101 1.36/0.734 2.91/1.305 5.62/2.231 6.94/2.413 2.90/1.803 9.98/2.873 14.89/3.639
Chirp 0.00/1.203 1.10/0.308 1.38/0.268 1.08/0.180 2.61/0.350 5.68/0.670 5.80/0.683 2.57/0.585 8.63/0.831 12.21/1.182
5.00/0.318 -0.03/0.033 0.68/0.053 0.81/0.043 1.92/0.095 4.22/0.149 4.47/0.157 2.00/0.127 5.23/0.219 8.49/0.308
-5.00/7.830 3.32/3.354 2.46/2.504 1.69/1.724 3.65/3.755 5.13/5.232 5.09/5.011 4.35/4.663 12.82/6.401 17.18/7.655
Square 0.00/2.693 1.56/0.745 1.37/0.595 0.91/0.391 2.67/1.257 4.34/1.623 4.31/1.612 3.81/1.409 10.35/2.003 14.65/2.562
5.00/0.649 -0.57/0.049 0.92/0.075 -0.39/0.010 2.24/0.172 3.35/0.281 3.19/0.269 2.98/0.231 6.97/0.559 10.14/0.634
-5.00/47.293 5.27/17.204 3.53/12.946 2.36/9.485 2.79/10.522 6.89/18.192 12.98/21.974 5.93/20.039 14.59/39.790 18.07/45.694
Blocks 0.00/15.027 4.78/6.635 3.32/4.905 2.34/3.508 2.67/3.962 6.80/7.036 10.48/7.446 5.19/6.993 13.23/12.193 16.24/13.326
5.00/2.593 3.50/0.829 2.53/0.640 1.75/0.213 2.36/0.559 5.98/1.982 7.52/2.165 3.88/1.205 9.93/2.231 11.25/2.482
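The impulsive noise used for Table III follows the symmetric α-stable model of Eqs. (20)-(22) with β = 0. As a hedged illustration, SciPy's levy_stable distribution can generate such noise; the parameter values below (α = 1.5, the scale, and the sample size) are our own example choices, not the exact settings of the experiments.

import numpy as np
from scipy.stats import levy_stable

alpha, beta = 1.5, 0.0            # beta = 0 gives the symmetric alpha-stable case
noise = levy_stable.rvs(alpha, beta, loc=0.0, scale=0.1, size=(8, 1024))
# Heavy tails show up as occasional very large samples (impulses):
print(np.abs(noise).max(), np.median(np.abs(noise)))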
the preservation of information between channels by these extension methods. Our method still shows the best performance, which is consistent with the above analysis. Since our model can be accelerated by the GPU, it requires minimal runtime when dealing with array signal denoising problems.

3) Impulsive Noise Array Denoising Results: In this section, we explore the denoising performance of our proposed model on array signals under impulsive noise. The selection of experimental signals is the same as in the previous group of experiments. Three groups of sub-experiments are set for each type of signal, corresponding to average SNRs of the received signal on each sensor of −5 dB, 0 dB, and 5 dB, and the MSE is also given to describe the denoising performance comprehensively.

The experimental results are shown in Table III. A comparison is carried out among nine techniques, encompassing the proposed approach. Differing from the array denoising experiments under Gaussian noise, in this case the TWAD method encounters difficulties in handling impulse noise. Thus, we substitute the TWAD method with the JBRI method and extend it to the array denoising problem. Observing the overall results, the performance of non-DL-based methods tends to degrade when subjected to heightened impulsive noise under the same SNR. In contrast, the performance of DL-based methods, including our proposed method, improves under impulsive noise. This can be attributed to the fact that deep convolutional neural networks are sensitive to low-frequency information but not to high-frequency information. This sensitivity enables them to exhibit strong robustness in the presence of impulse noise, a feature that non-deep-learning-based methods lack.

For the wavelet denoising methods, on the one hand, the impulse noise leads to drastic changes in some wavelet coefficients, so wavelet-based denoising cannot accurately identify and deal with the impulsive noise. On the other hand, for Gaussian noise, thanks to its statistical properties, wavelet denoising can select an appropriate threshold relatively easily. For impulse noise, due to its irregular characteristics, the selection of the threshold may be more challenging, and more complex methods may be required to determine an appropriate threshold, which to some extent degrades the performance of the wavelet methods. The JBRI method, purpose-built for exclusive handling of impulse noise, showcases state-of-the-art performance among sensor-by-sensor methods due to its effective inverse-gamma structure modeling. Nonetheless, this effectiveness hinges on empirical factors and is sensitive to the specific form of the impulsive noise, rendering it less robust in dynamic and changeable real-world scenarios.

Our method effectively addresses the aforementioned challenges, sidestepping explicit feature engineering and criteria design. Instead, it leverages the inferential capability of the deep model to extract profound signal information, resulting in robust denoising outcomes. The noisy signal and denoised result plots can be observed in Figs. 14(b) and 16 in the supplementary material for visual verification, respectively.

F. Downstream Application

1) DOA: In order to further verify the practicability of our proposed model, we add the proposed denoising method to specific array signal processing scenarios, including DOA estimation, estimation of the number of sources, and spatial spectrum estimation. We compare the performance changes of the algorithms before and after denoising. We still use the ULA array form when generating the data, the signal form is sinusoidal, and we fix the DOAs to be 10 degrees, 15 degrees, and 40 degrees, respectively.

We apply the classic MUltiple SIgnal Classification (MUSIC) method to conduct 100 DOA estimation experiments, and the results are shown in Fig. 6. Fig. 6(a) shows the estimated spectrum of DOA under the application of the noisy signal (−5.00
Fig. 6. (a) MUSIC spectrum. (b) normalized eigenvalues of the DOA estimation task. Comparisons among the noisy signal (−5.00 dB), clean signal, and
denoised signal (10.63 dB) are clearly illustrated.
dB), clean signal, and denoised signal (10.63 dB), respectively. It can be seen that the introduction of the denoising algorithm improves the DOA estimation performance due to the improved SNR. DOA estimation using the original noisy signal cannot distinguish the two directions of arrival at 10 degrees and 15 degrees, but these two directions can be successfully distinguished after denoising, which verifies the effectiveness of our method for DOA estimation. Fig. 6(b) shows the normalized eigenvalue distribution of the covariance matrix of the noisy signal (−5.00 dB), clean signal, and denoised signal (10.63 dB). It can be seen that the differences among the eigenvalues of the noisy signal are not obvious, which makes the MUSIC algorithm unable to accurately determine the number of sources. After denoising, the differences in the eigenvalues increase significantly, and the first three eigenvalues are significantly larger than the last five, which shows that our denoising algorithm effectively suppresses noise and improves resolution.

2) Estimated Number of Sources: The estimation of the number of sources is a common problem in signal processing. In this section, we explore the effectiveness of the proposed denoising framework for the problem of source number estimation. We choose the two most classic source number estimation methods, the Akaike Information Criterion (AIC) [50] and Minimum Description Length (MDL) [51], as the comparative methods for the experiments in this section. The signals used in the experiments are sinusoidal signals under a uniform array. We randomly selected 200 samples for source number estimation and calculated the estimation accuracy. The experimental results are shown in Table IV, which indicates that the denoised signals exhibit a significant improvement in SNR, resulting in a clearer distinction between the signal subspace and the noise subspace. Consequently, both the AIC and MDL methods exhibit superior performance, consistent with the analysis presented in Section V-F. It is noteworthy that after denoising, the performance of the MDL method surpasses that of the AIC method. This is because the MDL method provides consistent estimates, while the estimates provided by the AIC method tend to be overestimated.

TABLE IV
ACCURACY COMPARISON OF THE NUMBER OF SOURCES ESTIMATION METHODS BEFORE (0.00 DB) AND AFTER DENOISING (15.02 DB)

Methods | Accuracy(%) on Noisy Signal | Accuracy(%) on Denoised Signal
AIC | 64.0 | 80.0
MDL | 59.5 | 96.0

3) Spatial Spectrum Estimation: This section aims to validate the performance enhancement introduced by the proposed denoising method when applied to a spatial spectrum estimation algorithm. For this purpose, we assume the signal directions to be 10 degrees, 15 degrees, and 40 degrees. As revealed in Fig. 7, the experimental results distinctly illustrate the great impact of applying the denoising algorithm, which notably enhances the resolution, effectively generating narrow lobes in the directions of the signals and significantly suppressing power levels in other directions. After the denoising process, the previously ambiguous 10-degree and 15-degree directions exhibit a distinct demarcation, highlighting the efficacy of our proposed denoising method in the context of spatial spectrum estimation.

By applying the denoising method to DOA estimation, estimation of the number of sources, and spatial spectrum estimation, we have validated the practicality of the proposed denoising model in certain array processing tasks. In fact, the denoising method, serving as a versatile technique, can be extended to various downstream tasks (such as Synthetic Aperture Radar, array optimization, and remote sensing) to enhance the performance of different algorithms.

G. Ablation Study

1) Iterative Denoising: For certain radar systems, sensitivity to signal glitches is a significant concern. As a solution, we introduce iterative smoothing. In simple terms, we repeatedly input the signal into the trained network for denoising. Through
TABLE V
DISCUSSION ON COHERENT SIGNAL SCENARIO
UNDER DIFFERENT CORRELATION COEFFICIENTS ρ.
THE INPUT SIGNAL IS ‘‘CHIRP’’ UNDER 5.00 DB
Fig. 8. Illustration of iterative smoothing on a single-sensor signal of the "blocks" type (−5 dB). Notice that the spikes within the red boxes have been eliminated.
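Iterative smoothing is simply the repeated application of the trained blind-spot network (Algorithm 2). The following is a minimal PyTorch-style sketch; the function name and the default number of passes are our own choices.

import torch

@torch.no_grad()
def iterative_smoothing(f, x, n_passes=2):
    """Algorithm 2: feed the signal through the trained denoiser N times.

    Each extra pass removes more residual glitches (cf. Fig. 8) at the cost of a
    possible slight drop in output SNR and one more forward pass of latency.
    """
    s = x
    for _ in range(n_passes):
        s = f(s)
    return s

# Usage (assuming `model` is the trained blind-spot network):
# denoised = iterative_smoothing(model, noisy_signal, n_passes=2)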
TABLE VII
FLOPS AND LATENCY FROM 1 TO 3 FORWARD PASSES. THE LATENCY IS TESTED ON AN NVIDIA RTX 4090 GPU

N Forward Passes | FLOPs (G) | Latency (s)
1 | 1.78 | 0.00005
2 | 3.56 | 0.00011
3 | 5.34 | 0.00014

hits, GPU caching bottlenecks, and memory access efficiency. We provide the FLOPs for a single forward pass of the network as 1.78 G. This is a reasonably small number of FLOPs, allowing relatively fast execution on any type of GPU or CPU. Regarding multiple forward passes during inference, since inference does not require maintaining a computational graph or backpropagation, performing the forward pass $n$ times multiplies the network's FLOPs by $n$. We have also included the FLOPs and latency for 1 to 3 forward passes in Table VII.

4) Limitations: In this section, we discuss the limitations of the proposed model. Since we assume that the noise is independent, the biggest limitation of the proposed model is that it cannot handle structured noise, i.e., non-independent noise. In the blind-spot network, since the center point is occluded, the model can only infer the information of the center point from the surrounding points. If the noise samples are independent, then the surrounding points carry no information about the noise at the center point, and the model will tend to output the mean of the various possible cases (which is 0 under our assumption). The signal samples are not independent, so the model can estimate the signal value at the center point from the surrounding points. The blind-spot network essentially performs the denoising task through this simple statistical fact. Therefore, when dealing with structured noise, the blind-spot network may infer part of the noise at the center point from the surrounding points, resulting in poor performance. In future work, we will seek effective methods for structured noise removal.

I. Generalization Study

Since situations in the real world may be more varied, this section discusses the generalization ability of the proposed model. We set up two sets of generalization experiments: generalization on signal type and generalization on correlation.

1) Generalization on Signal Type: In this section, we investigate the generalization ability of our model across different signal types. We still select five signal types (single-sensor, 5 dB): sine, chirp, square, blocks, and spike. Our model is trained exclusively on the sine signal and then tested directly on the other signal types using the trained model. The results are shown in Table VIII. The numbers in brackets represent the difference between the generalized performance and the original performance (see Table I). As can be seen, our method achieves good performance even when trained on only one type of signal and tested on the other types of signal. The performance degradation due to generalization is minimal. For example, the generalized GMSE on the chirp signal is only 0.60 dB lower than the GMSE obtained when testing after training on the chirp signal, which demonstrates that our model has a strong generalization capability across different signal types. In addition, to more intuitively demonstrate the effect of generalization, we give the waveforms of each type of signal before and after denoising in the generalization experiments (see Figs. 9 and 10).

2) Generalization on Correlation: In this section, we investigate the generalization capability of the proposed model with respect to correlation. Specifically, we trained the model using array chirp signals at 5 dB with ρ = 0.5, and then tested the trained model directly on three sets of chirp signals with ρ = 0.4, 0.8, and 1.0. The experimental results are shown in Table IX. As observed, compared to the performance obtained by testing after training on each ρ value separately (see Table V), the generalization performance shows only a slight decrease. This effectively demonstrates the adaptability of our proposed model to varying levels of correlation.
Fig. 9. Illustration of the noisy signals (5.00 dB) used in the generalization experiments.

Fig. 10. Illustration of the denoised signals in the generalization study. "Test" represents the results of training on each type of signal and testing on the same type. "Generalized" represents the results of training only on the sine signal and testing on each type of signal.

TABLE IX
GENERALIZATION STUDY ON CORRELATION. OUR MODEL IS TRAINED ON THE ARRAY CHIRP (5.00 DB) SIGNAL WITH CORRELATION COEFFICIENT ρ = 0.5 AND TESTED ON ARRAY CHIRP (5.00 DB) SIGNALS WITH CORRELATION COEFFICIENTS ρ = 0.4, 0.8, 1.0, RESPECTIVELY. THE NUMBERS IN BRACKETS REPRESENT THE DIFFERENCE BETWEEN THE GENERALIZED PERFORMANCE AND THE ORIGINAL PERFORMANCE (TABLE V)

Training ρ = 0.5 | Generalized ρ = 0.4 | Generalized ρ = 0.8 | Generalized ρ = 1.0
MSE in | 0.249 | 0.251 | 0.243
GSNR | 8.62 (−0.02) | 8.28 (−0.25) | 7.85 (−0.17)
GMSE | 0.249 (0.000) | 0.238 (−0.002) | 0.213 (−0.002)

VI. CONCLUSION

In this paper, we propose a robust and applicable unsupervised denoising method that demonstrates versatility across noise types (Gaussian noise and impulse noise), multiple array configurations (ULA, URA, UCA, and CA), and diverse degrees of signal correlation, which proves the robustness of our method. Furthermore, to underscore the practicality of our proposed method, we delve into its application within three distinct array-related contexts: DOA estimation, estimation of the number of sources, and spatial spectrum estimation. Simulation results show that the proposed method can effectively improve the performance of DOA estimation, estimation of the number of sources, and spatial spectrum estimation.

APPENDIX

Theorem A.1: Consider two independent noisy observations $x$, $y$ with the unobserved clean signal $z$; the noisy observations $x$, $y$ are independent conditioned on $z$: $\mathbb{E}_{x,y|z} = \mathbb{E}_{x|z}\mathbb{E}_{y|z}$, and a gap $\epsilon := \mathbb{E}_{y|z}(y) - \mathbb{E}_{x|z}(x) \neq 0$ exists, with $\mathbb{E}_{x|z}(x) = z$ and $\mathbb{E}_{y|z}(y) = z + \epsilon$. The variance of $y$ is $\sigma^2$, such that

$\mathbb{E}_{x,z}\|f_\theta(x) - z\|_2^2 = \mathbb{E}_{x,y,z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\epsilon\,\mathbb{E}_{x,z}(f_\theta(x) - z).$   (35)

Proof: We expand the left-hand side,

$\mathbb{E}_{x|z}\|f_\theta(x) - z\|_2^2 = \mathbb{E}_{x,y|z}\|(f_\theta(x) - y) + (y - z)\|_2^2$
$= \mathbb{E}_{x,y|z}\|f_\theta(x) - y\|_2^2 + \mathbb{E}_{y|z}\|y - z\|_2^2 + 2\,\mathbb{E}_{x,y|z}(f_\theta(x) - y)^\top(y - z)$
$= \mathbb{E}_{x,y|z}\|f_\theta(x) - y\|_2^2 + \sigma^2 + 2\,\mathbb{E}_{x,y|z}(f_\theta(x) - z + z - y)^\top(y - z)$
$= \mathbb{E}_{x,y|z}\|f_\theta(x) - y\|_2^2 + \sigma^2 + 2\,\mathbb{E}_{x,y|z}(f_\theta(x) - z)^\top(y - z) + 2\,\mathbb{E}_{y|z}(z - y)^\top(y - z)$
$= \mathbb{E}_{x,y|z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\,\mathbb{E}_{x,y|z}(f_\theta(x) - z)^\top(y - z).$   (36)

Since $x$ and $y$ are independent conditioned on $z$, the following holds,

$\mathbb{E}_{x|z}\|f_\theta(x) - z\|_2^2 = \mathbb{E}_{x,y|z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\,\mathbb{E}_{x|z}(f_\theta(x) - z)^\top\,\mathbb{E}_{y|z}(y - z) = \mathbb{E}_{x,y|z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\epsilon\,\mathbb{E}_{x|z}(f_\theta(x) - z).$   (37)

We have $\mathbb{E}_{x,z} = \mathbb{E}_z\mathbb{E}_{x|z}$, then

$\mathbb{E}_{x,z}\|f_\theta(x) - z\|_2^2 = \mathbb{E}_{x,y,z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\epsilon\,\mathbb{E}_{x,z}(f_\theta(x) - z).$   (38)

ACKNOWLEDGMENT

The authors would like to thank Professor Xuejing Zhang for his paper revision. Thanks to Professor George Atia and two professional reviewers for their valuable contributions to this paper, and thanks to Yan Cheng for her encouragement and companionship.

REFERENCES
[1] H. Krim and M. Viberg, "Two decades of array signal processing research: The parametric approach," IEEE Signal Process. Mag., vol. 13, no. 4, pp. 67–94, Jul. 1996.
[2] S. V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction. Hoboken, NJ, USA: Wiley, 2008.