J2024-Towards Applicable Unsupervised Signal Denoising via Subsequence Splitting and Blind Spot Network

This paper presents an unsupervised signal denoising method that utilizes subsequence splitting and a blind spot network to effectively learn signal characteristics without requiring clean signals for training. The proposed approach demonstrates strong performance in denoising both single-sensor and array signals under various noise conditions, including Gaussian and impulsive noise. Experimental results indicate that this method outperforms traditional and machine learning-based techniques, showcasing its practical applicability and generalization capabilities across different signal processing scenarios.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 72, 2024, p. 4967

Towards Applicable Unsupervised Signal Denoising via Subsequence Splitting and Blind Spot Network

Ziqi Wang, Zihan Cao, Julan Xie, Member, IEEE, Huiyong Li, Member, IEEE, and Zishu He, Member, IEEE

Abstract: Denoising is a significant preprocessing step that has garnered substantial attention across various signal-processing domains. Many traditional denoising methods assume that the signal is stationary and that the noise follows a Gaussian distribution, which limits their practical applicability. Despite significant advancements in machine learning and deep learning methods, machine learning-based (ML-based) approaches still require manual feature engineering and intricate parameter tuning, and deep learning-based (DL-based) methods remain largely constrained by supervised denoising techniques. In this paper, we propose an unsupervised denoising approach that addresses the shortcomings of previous methods. Our proposed method uses subsequence splitting and a blind spot network to adaptively learn the signal characteristics in different scenarios, so as to achieve the purpose of denoising. The experimental results show that our method performs satisfactorily on both single-sensor and array signal denoising problems under Gaussian white noise and impulsive noise. Moreover, our method is also verified to be effective on some array signal processing problems: Direction of Arrival (DOA) estimation, Estimated Number of Sources, and Spatial Spectrum estimation. Finally, in the discussion experiments and generalization experiments, we demonstrate that our method performs well across a wide variety of array forms and degrees of signal correlation, and has good generalization. Our code will be released after possible acceptance.

Index Terms: Signal denoising, unsupervised inference, array forms, Gaussian noise, and impulsive noise.

Received 5 October 2023; revised 15 June 2024 and 8 October 2024; accepted 15 October 2024. Date of publication 18 October 2024; date of current version 8 November 2024. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62031007 and Grant 62231006. The associate editor coordinating the review of this article and approving it for publication was Prof. George Atia. (Ziqi Wang and Zihan Cao contributed equally to this work.) (Corresponding author: Julan Xie.)

Ziqi Wang, Julan Xie, Huiyong Li, and Zishu He are with the School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 610000, China (e-mail: ziqi-[email protected]; [email protected]).

Zihan Cao is with the Department of Mathematics, University of Electronic Science and Technology of China, Chengdu 610000, China (e-mail: [email protected]).

This article has supplementary downloadable material available at https://doi.org/10.1109/10.1109/TSP.2024.3483453, provided by the authors.

Digital Object Identifier 10.1109/TSP.2024.3483453

I. INTRODUCTION

In various signal processing problems, the existence of noise cannot be ignored, and it has a significant impact on the performance of signal processing algorithms [1]. The presence of noise makes signal processing more complex and challenging in many cases, as it can obscure the true characteristics of the signal, degrade the quality of the signal, and lead to erroneous analysis results. Therefore, many scholars are committed to exploring effective methods of signal denoising [2] to restore useful (clean) signals from noisy signals to a certain extent. Denoising algorithms provide a cleaner input to subsequent signal processing steps by reducing the noise level, thereby helping the downstream algorithm function correctly and produce reliable results.

From the perspective of denoising methods, the prevailing denoising algorithms can be categorized into three major groups: traditional methods, ML-based methods, and DL-based methods. Traditional methods mainly rely on mathematical statistics and signal processing techniques, such as the mean filter and the wavelet transform [3]. Although these methods have achieved certain results in some scenarios, they are often limited by manually selected parameters and assumptions about signal and noise, which make their performance unsatisfactory in complex and variable practical applications, e.g., DOA estimation and spatial spectrum estimation.

Compared with traditional methods, ML-based methods partially address the above problems. ML-based methods use ML algorithms, such as support vector machines (SVM) [4] and principal component analysis (PCA) [5], to learn the difference between signal and noise in order to perform noise suppression. Because ML-based methods are learning-based, they are less sensitive to data and more robust than traditional methods. Specifically, traditional methods, such as wavelet methods, are usually more sensitive to data, i.e., their generalization ability is relatively poor. In different scenarios (for example, when the signal type and noise type are different), different wavelet bases may need to be selected and different thresholds may need to be set for denoising, which undoubtedly increases the difficulty of application. ML-based methods are data-driven methods that can learn the denoising process through data, thus having strong robustness and generalization. However, ML-based methods usually require manual design and selection of features, which may be ineffective when the characteristics of the received signal are unknown.

1053-587X © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: University of Electronic Science and Tech of China. Downloaded on December 26,2024 at 05:20:54 UTC from IEEE Xplore. Restrictions apply.

In recent years, DL-based methods have gradually emerged and become an important research direction in the field of denoising [6], [7], [8]. DL-based denoising methods can automatically learn complex signal features and noise distributions to achieve efficient denoising without explicit assumptions about signal types. At present, the research on deep denoising methods mainly focuses on image and speech signals [9], [10], [11], and there is still little research on array signal denoising. In addition, a considerable number of deep denoising methods are supervised methods [12], [13], [14], which require clean signals during training; such signals are difficult to obtain in some signal processing problems, such as DOA and spatial spectrum estimation. Therefore, a denoising method that learns only from received signals is necessary.

From the perspective of denoising objects, denoising algorithms can be divided into single-sensor signal denoising and array signal denoising [15]. Existing array denoising methods can be divided into two major categories. One is to directly use a single-sensor denoising algorithm on each element in the array, which is called the sensor-by-sensor method. The other is often an extension of the single-sensor denoising method [15], [16]. By designing appropriate extension methods, coherence between array sensors can be included in the denoising process, thereby helping to improve the performance of array signal denoising algorithms. However, there are certain limitations, such as requiring the signal to be known a priori or only being able to process Gaussian noise. In addition, due to the need for engineering implementation [16], these extension methods need to be approximately replaced by methods that are easy to implement, which undoubtedly introduces information loss, resulting in a decrease in algorithm performance. Therefore, there are few models that can process single-sensor signal and array signal denoising effectively and practically, and it is urgent to propose a denoising method that is highly effective and applicable to both single-sensor and array signals simultaneously.

This paper introduces an innovative unsupervised denoising method that can address the following challenges: 1) Lack of a highly effective model that can be used for single-sensor signal and array signal denoising problems simultaneously. 2) Information loss caused by manually designed features and tedious hyperparameter tuning during denoising. 3) Lack of effective unsupervised denoising methods applicable to multiple array forms and non-Gaussian noise. 4) Supervised learning denoising methods that require clean signals, which cannot be obtained in some scenarios. A distinctive feature of our method is that it denoises signals without imposing constraints on the signal distribution, relying solely on the prerequisite that the noise has zero mean, and it does not need access to the clean signal. In our proposed method, the model learns features directly from noisy signals. Mathematically, it can be strictly proved that this learning method is approximately learning directly from clean signals under the condition that the number of samples is sufficient. In addition, since the method we propose only requires the mean value of the noise and does not constrain the form of the signal, this mathematical guarantee can be generalized to array signal processing scenarios, and since the input of the model is the entire array signal receiving matrix, no additional preprocessing work is required (e.g., flattening, chunking, and transformation [16], [17]), so the information in the original signal and the correlation information between arrays are completely preserved, thereby avoiding information loss as much as possible. The contributions of this paper can be summarized in the following four points:

1) We propose an unsupervised deep-learning signal denoising method based on constructing split subsequences, and design a blind-spot network to further boost the denoising performance. Extensive experiments show that our method can outperform previous traditional and ML-based methods on single/array signal denoising tasks qualitatively and quantitatively.
2) Our proposed method can handle denoising problems in single-sensor signals and array signals under multiple array forms, such as Uniform Linear Array (ULA), Uniform Rectangular Array (URA), Uniform Circular Array (UCA), and Coprime Array (CA). It is also effective for non-Gaussian noise denoising problems, demonstrating the high practicability of our method.
3) Some downstream applications are tested to prove the effectiveness and efficiency of our method in the application pipeline.
4) Generalization experiments show that the proposed model has satisfactory generalization ability and can effectively handle unseen situations.

The rest of the paper is organized as follows: in Sect. II, we present related work on signal denoising. In Sect. III, we give the mathematical principles and knowledge used in this paper. In Sect. IV, we introduce the theoretical background and component principles of the models used in this paper. In Sect. V, we present the experimental results of the proposed method. In Sect. VI, we summarize and highlight the work.

II. RELATED WORKS

A. Traditional Methods

Traditional single-sensor denoising methods encompass techniques such as filtering and wavelet decomposition, which are non-learning methods. Filtering methods require designing diverse filters to achieve denoising goals, such as median filters, mean filters [18], and Wiener filters [19]. These methods can work well when designed properly, but in practice, it is often difficult to obtain enough a priori information to accurately design the filter. For instance, implementing a Wiener filter necessitates knowledge of the covariance matrix of clean signals, which is frequently challenging to obtain in practical scenarios.

Wavelet decomposition methods are grounded in wavelet transforms, known for their robust data decorrelation and signal energy compression capabilities. The concept of wavelet denoising was first proposed by Donoho et al. [20], in which a wavelet denoising method with a hard threshold (HTWT) was given. Donoho et al. [21] proposed a soft threshold-based wavelet transform (STWT) to solve the single-sensor signal denoising problem, which yields smoother denoising results

by using a continuous threshold function. Zhao et al. [22] introduced a compromise threshold-based wavelet transform (CTWT) denoising method, overcoming the discontinuity in hard-threshold denoising and reducing the permanent bias in soft-threshold denoising. The wavelet-based empirical Wiener filtering (WWEF) method was proposed by Sandeep et al. [23]. This approach employs two different wavelet transforms to smooth the denoising outcome of the first wavelet transform, extending it to encompass a broader coefficient range. Choi et al. [24] analyzed the WWEF method and proposed an iterative method based on the multi-wavelet basis to improve the performance of WWEF.

Transitioning to array signal denoising, the earlier single-sensor signal denoising methods can be readily extended. Specifically, array signal denoising can be achieved by using a single-sensor signal denoising method on each sensor, which is called a "sensor-by-sensor" method. However, this direct approach ignores the inter-sensor correlation and fails to preserve the inter-sensor correlation information. Based on this, a series of improved extension methods have been proposed for array denoising to diminish information loss. One classic array signal denoising strategy arises from the Temporal Wavelet Array Denoising (TWAD) method proposed by Rao et al. [16]. Before applying the wavelet denoising method, this method makes full use of the additional information provided by the array measurement, flattens the array received signal matrix into a vector, and performs temporal decorrelation and spatial decorrelation, respectively. Another array signal denoising method (SWAD) was proposed by Sathish et al. [25], which has the advantage of significantly reducing computational complexity at the expense of slightly reducing SNR gain. Recently, Naveed et al. [15] proposed an efficient multivariate denoising technique using the multivariate goodness-of-fit test (MGWD), which projects multichannel data into a single-dimensional space using the squared Mahalanobis distance measure, and then performs a goodness-of-fit test on multiple input data scales derived from the discrete wavelet transform, so as to achieve the purpose of denoising.

While the aforementioned extension methods can enhance denoising performance to a certain extent compared to the sensor-by-sensor approach, they do have certain limitations. The TWAD method necessitates accurate prior knowledge of spatial signal statistics for practical implementation, which is not attainable in some real-world scenarios. Although the Discrete Fourier Transform (DFT) can be used as an approximation to the original spatial decorrelation operation under unknown signal statistics, this approximation remains exclusively applicable to uniform linear arrays (ULA), thereby limiting the scope of applicability. The SWAD method does not need to know the precise prior information of the signal, but it depends on the appropriate selection of the wavelet basis and threshold function, and cannot deal with complex and changeable scenes in practice. The MGWD method needs to project high-dimensional data to a low-dimensional space, resulting in a certain loss of information. Furthermore, the method has not been proven to work in the case of non-Gaussian noise. Although there are methods designed to tackle non-Gaussian noise [26], [27], they are not suitable for direct application to the array denoising problem.

B. Learning-Based Methods

With the advancement of machine learning and deep learning techniques, there has been a notable surge in the proposal of methods based on machine-learning (ML) and deep-learning (DL) paradigms. ML-based approaches primarily encompass optimization techniques [28], [29], dictionary-based strategies [30], [31], and singular value decomposition methodologies [32], [33]. Notably, principal component analysis (PCA), a widely adopted dimensionality reduction method, finds utility in tasks such as signal denoising [5], [34]. Sun et al. [4] introduced a method that employs the least-squares support vector machine (SVM) to address lidar signal denoising. Similarly, Rojo et al. [35] leveraged SVM techniques for the denoising of heart rate turbulence data.

DL-based methods, encompassing both supervised and unsupervised approaches, have exhibited remarkable efficacy in signal denoising. Among supervised techniques, a common approach involves amassing sets of noisy-clean pairs or synthetic-noisy-clean triplets, followed by end-to-end network training. Arsene et al. [8] investigated the performance of CNN and LSTM models for denoising electrocardiogram signals, noting that the CNN outperformed the LSTM based on the RMSE metric. In the domain of seismic signal denoising, the utilization of a UNet architecture [36] has been effective. To address the challenge of high-frequency loss during image denoising, DnCNN [6] adopted a strategy wherein the network predicted the residual discrepancies between clean and noisy images. In another vein, [12] created a library of core image patterns from the noisy input data and subsequently used these foundational patterns to reconstruct images within this pattern-defined space, achieving successful denoising outcomes in the process.

Although supervised methods can produce satisfactory performance, in some circumstances collecting many noisy-clean pairs is expensive. Therefore, many researchers have turned to unsupervised methods. N2N [7] introduces unsupervised learning in image denoising using paired noisy data. N2V [37] directly learns the consistency loss with a blind-spot network to avoid identity mapping. Laine et al. [38] build the blind-spot mechanism into the design of the network. Recent works focus on the invertibility of neural networks. The added invertibility constraint has proven effective for tasks such as blind source separation [39], [40] and image denoising [41], [42]. Since these invertibility properties are often integrated directly into the network, they introduce a stronger inductive bias compared to blind spot networks. Moreover, blind spot networks and invertible networks share the term "trivial solutions". To avoid confusion, we discuss the different meanings of trivial solutions in the two kinds of networks, as well as the relationship between them, in the Materials Sect. IX, available in the supplementary material. Some advanced works concentrated on alternative training and inference: DIP [43] exploited the prior of the CNNs' learning process and


used it to denoise but suffered from an uncertain stopping step and poor performance. Self2Self (S2S) [9] leveraged dropout to train the network and inferred the output by averaging multiple runs.

III. PRELIMINARY

A. N2N Model

N2N [7] provides a new perspective for denoising a given noisy signal without using any clean signal. Consider a clean signal $z$ and a noisy signal (or observation) $x = z + n$, where $n$ is the noise. The joint distribution can be written as

$$p(z, n) = p(z)\,p(n \mid z). \tag{1}$$

The distribution $p(z)$ can be an arbitrary distribution satisfying

$$p(z_i \mid z_j) \neq p(z_i), \tag{2}$$

which means two elements $z_i$ and $z_j$ are not statistically independent. The noise $n$ is assumed to follow a conditional distribution $p(n \mid z) = \prod_i p(n_i \mid z_i)$. Therefore, the noise is conditionally independent of the clean signal. Furthermore, and empirically, the noise is often assumed to be zero-mean,

$$\mathbb{E}[n_i] = 0, \tag{3}$$

which implies that

$$\mathbb{E}[x_i] = z_i. \tag{4}$$

By utilizing this conclusion, N2N acquires multiple different noisy signals under the same clean signal and trains the network to minimize the N2N loss,

$$\mathcal{L}_{N2N} = \| f_\theta(y) - x \|_2^2, \tag{5}$$

where $x, y$ are two different noisy signals sharing the same $z$. In this way, the denoised output $f_\theta(y)$ can approximate the clean signal $z$.

IV. METHOD

A. Theoretical Background

Considering the N2N background, training N2N does not need to involve the ground truth. N2N tries to train the network with the self-consistency loss,

$$\arg\min_\theta \; \mathbb{E}_{x,y,z} \| f_\theta(y) - x \|_2^2, \tag{6}$$

where $x, y$ are two independent noisy observations that share a clean unobserved signal $z$ and $f$ is the network being trained, with parameters $\theta$. The main drawback of N2N is the need for two paired noisy observations, which are usually unaffordable to obtain in some signal processing scenarios. The other disadvantage is that the N2N self-consistency loss may encounter the noise gap, which is described as follows.

Theorem IV.1: Consider two independent noisy observations $x, y$ with the unobserved clean signal $z$. The noisy observations $x, y$ are independent conditioned on $z$: $\mathbb{E}_{x,y \mid z} = \mathbb{E}_{x \mid z}\,\mathbb{E}_{y \mid z}$, and a gap $\varepsilon := \mathbb{E}_{y \mid z}(y) - \mathbb{E}_{x \mid z}(x) \neq 0$ exists with the condition $\mathbb{E}_{x \mid z}(x) = z$ and $\mathbb{E}_{y \mid z}(y) = z + \varepsilon$. The variance of $y$ is $\sigma^2$, such that

$$\mathbb{E}_{x,z} \| f_\theta(x) - z \|_2^2 = \mathbb{E}_{x,y,z} \| f_\theta(x) - y \|_2^2 - \sigma^2 + 2\varepsilon\, \mathbb{E}_{x,z} \big( f_\theta(x) - z \big). \tag{7}$$

The proof can be found in the Appendix. From Theorem IV.1, we can see that the N2N self-consistency loss does not coincide with the supervised loss if the gap $\varepsilon \neq 0$, which means that training with this loss cannot produce the same results as training in a supervised manner (i.e., with clean signals). But if the gap is sufficiently small ($\varepsilon \to 0$), the network can be regarded as being trained with the supervised loss. Specifically, when $\varepsilon \to 0$, Eq. (7) becomes:

$$\mathbb{E}_{x,z} \| f_\theta(x) - z \|_2^2 = \mathbb{E}_{x,y,z} \| f_\theta(x) - y \|_2^2 - \sigma^2, \tag{8}$$

where the left term represents the supervised loss, and the first term on the right represents the unsupervised loss. The difference between the two losses is a constant $\sigma^2$. The role of the loss function is to find the optimal network parameters $\theta^*$, and the constant $\sigma^2$ does not affect the result of $\theta^*$. In other words, minimizing the left and right sides of Eq. (8) leads to the same optimal solution $\theta^*$. Therefore, training with the unsupervised loss can be considered equivalent to training with the supervised loss in this scenario.

To mitigate the need for noisy-noisy pairs, we can construct several sub-noisy signal pairs $D$ with some similar constructors $D_i$ to satisfy the above condition. Taking one sub-noisy signal pair as an example, $D = \{D_1(x), D_2(x)\}$. Assume the two subsampled noisy signals are quite similar. We can reuse the N2N self-consistency loss,

$$\arg\min_\theta \; \mathbb{E}_{x,z} \| f_\theta(D_1(x)) - D_2(x) \|_2^2. \tag{9}$$

However, the gap between the two subsampled noisy signals is not zero in practice, that is, $\mathbb{E}_{x \mid z} D_2(x) - \mathbb{E}_{x \mid z} D_1(x) \neq 0$. Then, directly reusing the loss may lead to suboptimal results. To solve this problem, we can introduce a regularization term.

Proposition IV.1: Given a trained and optimal denoising network $f_{\theta^*}$, which has the optimal denoising results $f_{\theta^*}(x) = z$ and $f_{\theta^*}(D_i(x)) = D_i(z)$, the following holds,

$$\begin{aligned}
&\mathbb{E}_{x \mid z} \big\{ f_{\theta^*}(D_1(x)) - D_2(x) - \big( D_1(f_{\theta^*}(x)) - D_2(f_{\theta^*}(x)) \big) \big\} \\
&\quad = D_1(z) - \mathbb{E}_{x \mid z} D_2(x) - \big( D_1(z) - D_2(z) \big) \\
&\quad = D_2(z) - \mathbb{E}_{x \mid z} D_2(x) = 0.
\end{aligned} \tag{10}$$

Eq. (10) constrains the output to be optimal. Therefore, a regularized loss is used,

$$\arg\min_\theta \; \mathbb{E}_{x,z} \| f_\theta(D_1(x)) - D_2(x) \|_2^2 + \alpha\, \mathbb{E}_{x,z} \big\| f_\theta(D_1(x)) - D_2(x) - \big( D_1(f_\theta(x)) - D_2(f_\theta(x)) \big) \big\|_2^2. \tag{11}$$

Fig. 1(a) illustrates how to train an unsupervised denoiser utilizing the aforementioned proposition.
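As a rough illustration of Eq. (11), the following Python sketch evaluates the regularized loss for a single signal. It is a minimal sketch under stated assumptions, not the authors' implementation: the denoiser `f`, the fixed even/odd constructors `take_even` and `take_odd`, and the helper `mean_sq` are all hypothetical names introduced here, and the squared norm is normalized by the subsequence length. A real training loop would average this quantity over many signals and minimize it with respect to the network parameters.

```python
from typing import Callable, List

Signal = List[float]
Op = Callable[[Signal], Signal]


def mean_sq(v: Signal) -> float:
    """Mean of squared entries, i.e. ||v||_2^2 divided by the length."""
    return sum(e * e for e in v) / len(v)


def regularized_loss(f: Op, d1: Op, d2: Op, x: Signal, alpha: float) -> float:
    """Single-signal estimate of the objective in Eq. (11)."""
    out_sub = f(d1(x))   # f_theta(D1(x))
    target = d2(x)       # D2(x)
    den = f(x)           # f_theta(x), the denoised full signal
    den1, den2 = d1(den), d2(den)
    # First term: reconstruction residual f(D1(x)) - D2(x).
    resid = [o - t for o, t in zip(out_sub, target)]
    # Second term: the same residual minus the gap D1(f(x)) - D2(f(x)).
    reg = [r - (a - b) for r, a, b in zip(resid, den1, den2)]
    return mean_sq(resid) + alpha * mean_sq(reg)


# Hypothetical fixed constructors, used only to keep the example deterministic.
def take_even(s: Signal) -> Signal:
    return s[0::2]


def take_odd(s: Signal) -> Signal:
    return s[1::2]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 5.0]
    identity = lambda s: list(s)
    # With an identity "denoiser" the regularizer vanishes,
    # so only the first term of Eq. (11) remains.
    print(regularized_loss(identity, take_even, take_odd, x, alpha=2.0))  # prints 2.5
```

With the identity mapping the second term is exactly zero, which shows that the regularizer alone does not penalize the trivial solution; it only corrects the bias introduced by the gap between the two subsequences.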


Fig. 1. Training and inference stages of the proposed framework. (a) Training stage of our framework: first, the noisy signals are randomly split into subsequence pairs. Second, one sub-noisy signal is fed into the denoising network to produce the sub-denoised signal. Finally, an unsupervised loss is used to train the denoising network without using any ground truth (i.e., clean signal). The regularization term in Eq. (11) is omitted for clearer illustration. (b) Blind-spot convolution. The middle element of the convolution kernel is masked to avoid information leakage. (c) Our proposed dual-branch blind-spot CNN. K, S, P, and D denote convolution kernel size, stride, padding, and dilation, respectively. (d) Inference stage of our framework. The trained denoising network is used to process the noisy signals into denoised (or near-clean) signals with large SNR gain, and the denoised signals can be used for downstream applications such as DOA estimation and Estimated Number of Sources.
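The blind-spot convolution of Fig. 1(b) can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not the network used in the paper: the function name `blind_spot_conv1d`, the zero padding, the stride of 1, and the single-channel setting are choices made here for illustration. The only point it demonstrates is that masking the center tap makes output j independent of input j, also when a dilation is applied.

```python
from typing import List


def blind_spot_conv1d(x: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """1-D 'same' cross-correlation whose center tap is forced to zero.

    Zero padding and stride 1 are assumptions of this sketch.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd so that a center tap exists")
    center = k // 2
    # Mask the middle element of the kernel (the blind spot).
    masked = [0.0 if t == center else w for t, w in enumerate(kernel)]
    n = len(x)
    out = []
    for j in range(n):
        acc = 0.0
        for t, w in enumerate(masked):
            idx = j + (t - center) * dilation  # dilated tap position
            if 0 <= idx < n:                   # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(blind_spot_conv1d(x, [1.0, 1.0, 1.0]))              # neighbors at distance 1
    print(blind_spot_conv1d(x, [1.0, 1.0, 1.0], dilation=2))  # neighbors at distance 2
```

Changing `x[j]` leaves `out[j]` untouched, which is why a network built from such layers cannot learn the identity mapping from a noisy input to itself.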

B. Subsequence Constructor
As defined in Eq. (11), we need $I$ subsequence constructors $D$ to construct the noisy subsequences $\{D_1(x), \cdots, D_I(x)\}$. A naive approach is to sample the signal randomly, but this cannot preserve the structure of the original signal; in other words, $D_1(x)$ may differ a lot from $D_2(x)$. To ensure the subsequences are similar, we propose a structure-keeping subsequence construction technique.

To be specific, considering a signal $x \in \mathbb{R}^{M \times L}$, our structure-keeping subsequence construction technique is composed of three operations: 1) Unfold, 2) Shuffle, and 3) Index. We first unfold it on the last dimension into the unfolded signal $\hat{x} \in \mathbb{R}^{M \times I \times (L/I)}$. Then, to preserve the structure of the input signal, we choose to uniformly sample the unfolded signal on the $I$ dimension. Finally, the $i$-th channel is indexed to form the subsequence $D_i(x)$, which can be formulated as follows,

$$D_i = \operatorname{Index}_i \circ \operatorname{Shuffle} \circ \operatorname{Unfold}, \tag{12}$$

$$\operatorname{Shuffle}(\hat{x}) = \{ \hat{x}[:, :, j] \}, \quad j \in \pi(\{1, \cdots, L/I\}), \tag{13}$$

$$\operatorname{Index}_i(\hat{x}) = \hat{x}[:, i, :], \tag{14}$$
where $\pi \in \Pi$ denotes a random permutation. An intuitive illustration of the subsequence constructor is shown in Fig. 2. Now, we begin by defining the structure-keeping property and then introduce a proposition to demonstrate that the proposed subsequence constructor has this property.

Fig. 2. Illustration of subsequence construction. ①, ②, and ③ indicate the Unfolding, Shuffling, and Indexing operations. M = 1 is shown here as a simple and clear example.

Definition IV.1 (Structure-keeping property): Structure-keeping of a signal sequence includes maintaining the basic statistical properties of the signal (e.g., expectation) and the signal structure (e.g., autocorrelation function).

Proposition IV.2 (Structure-keeping property of subsequence constructor): Let $X = [x_1, x_2, \cdots, x_L]$ be a signal sequence divided into $I$ blocks, each of length $L/I$, assuming $L$ is divisible by $I$: $X_i = \big[ x_{(i-1)\frac{L}{I}+1}, \ldots, x_{i\frac{L}{I}} \big]$, $i = 1, 2, \cdots, I$. After shuffling these blocks and indexing certain elements within each block, each of the constructed subsequences $S_i \in \{ (\Pi X)_i,\ i = 1, 2, \cdots, I \}$ maintains the statistical characteristics of $X$, particularly in terms of mean and autocorrelation, where the subscript $i$ denotes the $\operatorname{Index}_i$ operation and $\Pi := \{\pi\}$ is the concatenation of the applied random permutations.


Fig. 3. Illustration of the autocorrelation function of the entire sequence (first figure) and the subsequences (second and third figures) with I = 2.

Proof: The expected value of any $X_i$ is equal to the mean of $X$ since the blocks are uniformly sampled without replacement:

$$\mathbb{E}[X_i] = \mathbb{E}[X] = \mu. \tag{15}$$

Given that the shuffling operation is a permutation and indexing is uniformly distributed across possible positions within each block, for any subsequence $S_i$, the expected value remains unchanged:

$$\mathbb{E}[S_i] = \mu. \tag{16}$$

The autocorrelation function $R_X(\tau)$ for $X$ is defined as:

$$R_X(\tau) = \mathbb{E}[X_t X_{t+\tau}], \tag{17}$$

where $\tau$ is the lag. For subsequences, the indexing does not significantly affect the relative positions across the entire sequence. Combined with experimental verification (as shown in Fig. 3), the autocorrelation function of a subsequence $R_S$ (the subscript $i$ is omitted here) can be approximated as:

$$R_S(\tau) = \mathbb{E}[S_t S_{t+\tau}] \approx R_X(I\tau). \tag{18}$$

Thus, we can conclude that the subsequence constructor has the structure-keeping property.

Note that our structure-keeping subsequence construction technique resembles pixelshuffle [44] combined with an indexing operation, but the two are different because pixelshuffle is fixed and does not have any randomness. Using the pixelshuffle operation to train our network leads to poor performance. Similarly, recent unsupervised image denoising works [45] share a similar spirit with ours. We provide a thorough distinction between our approach and theirs in the Materials Sect. VIII, available in the supplementary material.

Remark 1: The receptive field of the network depends on the size of the convolution kernel and the dilation sizes. Increasing the receptive field can give the network a stronger modeling ability for the signal. There are two different approaches to this: 1) directly enlarging the kernel size to increase the receptive field, as in RepLKNet [46], but increasing the kernel size brings additional FLOPs overhead of $K^2$ complexity, where $K$ is the kernel size; 2) increasing the dilation, so that the overhead is limited to the memory access of the dilation, which is a relatively more efficient choice. In addition, choosing a tower-like network design can take into account different sizes of receptive fields. This hierarchical design idea has proven effective for image and signal modeling, such as image pyramids [47] and the decomposition scales of the wavelet transform [48].

The blind-spot network is composed of several $1 \times 1$ convolutions, a dual branch of blind-spot ResNet blocks with different dilations, and several further $1 \times 1$ convolutions that map back to the signal space. Every convolution kernel except the $1 \times 1$ kernels is blind-spot (see Fig. 1(b)) to avoid information leakage and prevent the trivial solution.

Consider a window $x_{W_j}$ around one element $x_j$ of the noisy signal, where this window denotes the receptive field of the network. Only the other elements in $x_{W_j}$, excluding $x_j$ itself, can affect the $j$-th output of the network $f_\theta(x_{W_j}) = f_\theta(x_j)$. Therefore, the N2N loss can be converted to the following equivalent form,

$$\arg\min_\theta \; \frac{1}{2L} \sum_j \big\| f_\theta(D_1(x_{W_j})) - D_2(x_j) \big\|_2^2. \tag{19}$$

The $1 \times k$, $k \neq 1$, blind kernel has a blind spot in its center. Since the noise is independent of the clean signal, the signal elements in the receptive field do not carry any information about the noise at the center. So, the network can produce the center element by using only the surrounding information, as implied by Eqs. (3) and (4). A more detailed illustration of the designed network architecture is shown in Materials Sect. X, available in the supplementary material.

D. Efficient Training and Inference

The overall training and inference stages are illustrated in Fig. 1. Deep networks easily overfit the training data when the training data is scarce. To avoid overfitting, more than two subsequence constructors can be used (i.e., $I > 2$). The more subsequences are constructed, the shorter each subsequence becomes. In practice, we find that there is a trade-off
between the number of subsequences and their lengths. Based
on this finding, we choose to use more subsequences to boost
C. Blind-Spot Network
denoising performance and fewer subsequences but long signal
In N2N, a simple fully convolutional network (FCN) is em- lengths to maintain good denoising quality. Algo. 1 provides
ployed to extract features and conduct denoising. Look back the overall training pipeline of our method.
to Eq. (11), as the subsequence constructor D1 , D2 is quite As for the inference stage, for the follow-up system of the
similar, the network may learn a trivial solution. That is, iden- radar (e.g., target signal detection), higher requirements are
tity mapping. To prevent the trivial solution and extract multi- placed on the signal glitches. If the signal has more glitches,
scale features, we design a novel blind-spot network with dual the constant false alarm system may fail. To this end, we pro-
branches as shown in Fig. 1(c). Our blind spot network is a pose iterative smoothing inference, specifically, we feed the
tower-like network structure with two towers having different signal multiple times into a trained denoising neural network to
dilation sizes. This design allows the towers to have different smooth the glitches as shown in Fig. 1(d). Algo. 2 is provided to
receptive fields, which benefits signal feature modeling. illustrate the iterative smoothing. It is worth noting that feeding


the network multiple times may result in a slight decrease in output SNR, but the resulting reduction in glitches is worth it.

Algorithm 1: Training Algorithm
input : Noisy signal dataset X, blind-spot network f_θ, number of subsequences I.
output: Trained network f_θ*
// Network initialization.
θ ← Init(θ)
for x ← X do
    // Eqs. (12)–(14)
    D ← SubsequenceConstruct(I)
    L ← 0
    for i ← range(I) do
        // Avoid index out of range.
        j ← mod(i + 1, I)
        x_i, x_j ← D_i(x), D_j(x)
        // Compute loss, see Eq. (11).
        L_i ← ||f_θ(x_i) − x_j||_2^2 + α ||f_θ(x_i) − x_j − (D_i(f_θ(x)) − D_j(f_θ(x)))||_2^2
        L ← L + L_i
    end
    // Gradient descent and update parameters.
    θ ← arg min_θ L
end
θ* ← θ
return θ*

Algorithm 2: Efficient Inference
input : Noisy signal x, trained blind-spot network f_θ*, number of smoothing iterations N.
output: Denoised signal s.
s ← x
// Iterative smoothing.
for n ← range(N) do
    s ← f_θ*(s)
end
return s

V. EXPERIMENTS
A. Datasets
1) α-Stable Distribution: The α-stable distribution, also known as the stable distribution, is frequently employed to characterize random variables or processes that exhibit heavy-tailed behavior. Within a stable distribution, the sum of independent and identically distributed random variables retains the same distribution. The probability density of the stable distribution is

$$f(x; \alpha, \beta, c, \mu) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \varphi(t; \alpha, \beta, c, \mu)\, e^{-itx}\, dt, \tag{20}$$

where

$$\varphi(t; \alpha, \beta, c, \mu) = e^{it\mu - |ct|^{\alpha} (1 - i\beta\, \mathrm{sign}(t)\, \Phi(\alpha, t))} \tag{21}$$

is the characteristic function of the stable distribution, with

$$\Phi = \begin{cases} \tan\frac{\pi\alpha}{2}, & \alpha \neq 1, \\ -\frac{2}{\pi} \log|t|, & \alpha = 1. \end{cases} \tag{22}$$

Here, $\alpha$, $\beta$, $c$, and $\mu$ are the characteristic index, skewness parameter, scale parameter, and location parameter, respectively. When $\beta = 0$, the distribution is symmetric, and the stable distribution degenerates into a symmetric stable distribution. According to [17], impulsive noise in radar can be modeled as a symmetric stable distribution. This paper adopts this configuration and addresses impulsive noise described by a symmetric stable distribution in the subsequent sections.

2) Data Generation: We train and test the proposed model on simulated data generated from five common types of classic signals: sine, chirp, square, blocks, and spike. They are defined as follows:

$$z^{\mathrm{sine}}(t) = \sin(f_0 t + \psi_0), \tag{23}$$

where $f_0$ is the central frequency and $\psi_0$ is the initial phase;

$$z^{\mathrm{chirp}}(t) = \sin(f_0 t + (f_1 - f_0) t^2 + \psi_0), \tag{24}$$

where $f_1 - f_0$ controls the speed of frequency change;

$$z^{\mathrm{square}}(t) = \sum_{n=-\infty}^{+\infty} A_0 \left( u(t - nT_0 + t_0) - u(t - nT_0 - t_0) \right), \tag{25}$$

where $A_0$, $T_0$, and $t_0$ are the amplitude, period, and time slot, respectively; and

$$z^{\mathrm{blocks}}(t) = \sum_{n=-\infty}^{+\infty} \sum_{i=1}^{L} A_i \left( u(t - nT_i + t_i) - u(t - nT_i - t_i) \right), \tag{26}$$

where $A_i$ represents the amplitude of the $i$-th subsignal, and $L$ represents the number of different squares. The spike signal $z^{\mathrm{spike}}(t)$ is generated by the α-stable distribution (see Sect. V-A1).

For the single-sensor scenario, we generate 4000 samples for training and testing for each type of signal. For array signals, we use a Uniform Linear Array (ULA) with $M = 8$ sensors and $K = 3$ sources; the arrival direction of each source is randomly selected from −60 degrees to 60 degrees, the number of snapshots is $N = 1024$, and the inter-sensor spacing is $d = \lambda/2$, where $\lambda$ is the wavelength of the signal. For each type of signal above, we generate 500 samples. In the discussion experiments, we also generate data for the URA, UCA, and CA array forms, as well as array data with different degrees of signal correlation.

B. Benchmarking
For the single-sensor signal denoising task, we implement several methods for comprehensive comparison: soft threshold-based wavelet transform (STWT) [21], compromise threshold-based wavelet transform (CTWT) [22], hard threshold-based wavelet transform (HTWT) [20], wavelet-based empirical Wiener filter (WWEF) [23], principal


component analysis (PCA) [34], median filter (MF) [18], multivariate goodness-of-fit tests (MGWD) [15], and Self-to-Self (S2S) [9]. For the array signal denoising task, we compare the array-extended versions of the above methods except MF, and add TWAD [16] and JBRI [26] for Gaussian white noise and impulsive noise denoising, respectively. In addition, since the original S2S model is designed for image denoising, we extend it to an array denoising form in this paper. The traditional methods are tested with meticulous parameter tuning, while the ML-based and DL-based methods are trained to full convergence.

C. Metrics
We choose two metrics, Gain of SNR (GSNR) and Gain of MSE (GMSE), to describe the performance of a denoising method. For a received array signal containing $K$ sources, we have

$$X = AZ + N, \tag{27}$$

where $A \in \mathbb{C}^{M \times K}$ represents the array manifold matrix, $Z \in \mathbb{C}^{K \times N}$ represents the spatial signal matrix, and $N \in \mathbb{C}^{M \times N}$ represents the spatial noise matrix. The signals are denoised by the network $f_\theta(\cdot)$, and the denoised signals are denoted $f_\theta(X)$.

The GSNR is defined as

$$\mathrm{GSNR} = \mathrm{SNR}_{out} - \mathrm{SNR}_{in}, \tag{28}$$

where

$$\mathrm{SNR}_{in} = 10 \log_{10} \frac{\|AZ\|_F^2}{\|N\|_F^2}, \tag{29}$$

$$\mathrm{SNR}_{out} = 10 \log_{10} \frac{\|AZ\|_F^2}{\|f_\theta(X) - AZ\|_F^2}, \tag{30}$$

and $\|\cdot\|_F$ is the Frobenius norm. The ideal value of GSNR is infinite, and a larger GSNR indicates better denoising performance.

The GMSE is defined as

$$\mathrm{GMSE} = \mathrm{MSE}_{in} - \mathrm{MSE}_{out}, \tag{31}$$

where

$$\mathrm{MSE}_{in} = \|X - AZ\|_F^2, \tag{32}$$

$$\mathrm{MSE}_{out} = \|f_\theta(X) - AZ\|_F^2. \tag{33}$$

The ideal GMSE is finite and equals $\mathrm{MSE}_{in}$. GSNR and GMSE measure the degree of noise removal and of signal distortion, respectively.

D. Implementation Details
We implement our method on a workstation with an Intel i9 CPU and two NVIDIA 4090 GPUs. For single-sensor signals, both the input and output channels are set to 1. For array signals, we synthesize an 8-sensor dataset and concatenate the real and imaginary parts, which requires 16 channels for both input and output. The network's inner channel width is set to 64, and the dual branch is equipped with 12 residual blocks. We use the AdamW [49] optimizer and train the network for 200 epochs. The learning rate is initialized at $10^{-3}$ and adjusted to $10^{-4}$ after the first 50 epochs. Across all experiments, we consistently use 8 subsequences (i.e., $I = 8$); a comprehensive analysis of the impact of $I$ is presented in Sect. XI of the supplementary material.

E. Main Results
1) Single-Sensor Signal Denoising Results: In this section, we explore the denoising performance of our proposed model on single-sensor signals under Gaussian white noise. We select five types of signals (sine, chirp, square, blocks, and spike) as input and set three sub-experiments for each type. The input SNRs are −5 dB, 0 dB, and 5 dB, corresponding to different noise conditions. For each SNR, we also report the corresponding MSE to fully describe the effect of the denoising algorithm.

The experimental results are shown in Table I, which compares a total of nine methods including the proposed one. For sinusoidal signals, our proposed method outperforms the other methods in most cases, while S2S and MGWD remain competitive; at 5 dB, the performance of MGWD is on par with our method. The superior performance of our method comes from the self-reasoning ability of the model, which avoids the information loss caused by artificially designed criteria or features. Our model also achieves a satisfactory runtime.

For the chirp signal, the GSNR of our proposed method is lower than for the sinusoidal signal. This may be because the chirp signal has a wider frequency spectrum than the sinusoidal signal and contains richer frequency-domain information, which increases the difficulty of denoising. Nevertheless, our proposed method still outperforms the other methods in all three cases.

For the square signal, the WWEF method slightly outperforms our proposed method at −5 dB, and the PCA method also shows decent performance. In the other cases, the GSNR of our proposed method far outperforms that of the remaining methods. It is worth noting that the performance of the PCA method largely depends on the selection of the principal component dimension, and the performance of the WWEF method depends on the selection of the two wavelet bases and the threshold functions. In contrast, our proposed method does not require manually set parameters; the model relies on its reasoning ability to learn deep features in noisy signals.

Our method demonstrates clear superiority in handling blocks and spike signals, which are complex forms of communication. S2S and MGWD also perform well. The MF method often proves inadequate for denoising problems involving complex signal forms. Similarly, PCA is prone to information loss during its projection operations. They even have negative gains when processing spike signals, which may be attributed to the fact that each pulse of the spike signal lasts for a short time and is mixed in the noise, making


TABLE I
COMPARISONS WITH PREVIOUS WAVELET-BASED, ML-BASED, AND TIME-DOMAIN FILTER METHODS ON SINGLE-SENSOR SIGNAL OF DIFFERENT
SIGNAL TYPES UNDER GAUSSIAN WHITE NOISE. GAINS OF SNR AND MSE ARE REPORTED. THE BEST RESULTS ARE IN RED AND THE SECOND BEST
RESULTS ARE IN BLUE. * DENOTES THE TESTING RUNTIME ON THE NVIDIA 4090 GPU PLATFORM AND ** MEANS THE TESTING RUNTIME ON THE
INTEL 12TH I9 CPU PLATFORM

Signal type  Input SNR(dB)/MSE  GSNR(dB)/GMSE: STWT [21]  CTWT [22]  HTWT [20]  WWEF [23]  PCA [5]  MF [18]  MGWD [15]  S2S [9]  Proposed

-5.00/1.520 8.97/1.320 8.91/1.323 8.27/1.311 11.84/1.410 9.26/1.342 8.59/1.312 14.82/1.468 15.92/1.476 16.50/1.507
Sine 0.00/0.511 8.92/0.452 8.82/0.441 6.32/0.387 12.17/0.478 9.28/0.447 8.55/0.436 14.86/0.490 14.72/0.479 15.04/0.489
5.00/0.173 8.93/0.151 8.89/0.149 5.36/0.132 12.08/0.158 9.33/0.149 11.40/0.156 14.60/0.161 14.08/0.159 14.60/0.161

-5.00/1.500 6.95/1.211 7.12/1.201 4.94/1.021 7.29/1.221 8.05/1.267 5.22/1.050 7.16/1.214 7.88/1.278 9.88/1.326
Chirp 0.00/0.502 5.40/0.351 5.51/0.350 3.34/0.261 4.86/0.342 9.72/0.447 4.84/0.336 6.67/0.392 8.01/0.419 11.15/0.452
5.00/0.169 4.25/0.112 4.47/0.111 3.47/0.091 2.88/0.079 8.31/0.142 3.97/0.100 5.94/0.124 7.92/0.133 11.04/0.151

-5.00/2.972 8.78/2.571 8.50/2.551 8.00/2.191 19.70/2.970 18.10/2.960 11.33/2.784 11.12/2.805 15.29/2.834 18.88/2.960
Square 0.00/1.001 8.47/0.861 8.39/0.849 9.46/0.890 16.80/0.801 18.08/0.985 9.63/0.891 9.85/0.891 15.20/0.947 19.92/0.996
5.00/0.333 7.61/0.269 7.25/0.272 10.74/0.241 13.91/0.317 17.14/0.326 7.97/0.279 7.05/0.270 14.88/0.302 19.73/0.328

-5.00/16.457 8.73/14.341 8.72/14.241 11.38/12.890 8.17/15.101 8.55/14.318 8.42/14.248 12.31/15.530 12.33/15.572 15.00/15.885
Blocks 0.00/5.103 8.38/4.288 8.29/4.361 9.65/3.543 6.10/4.158 8.38/4.692 9.90/4.933 10.18/4.561 10.83/4.681 13.88/5.091
5.00/1.583 7.43/1.312 7.30/1.289 6.95/1.311 7.54/1.297 4.93/1.141 7.87/1.320 7.02/1.041 7.92/1.196 14.02/1.514

-5.00/0.015 5.29/0.011 5.37/0.011 6.38/0.012 5.72/0.011 4.77/0.010 4.78/0.010 10.32/0.014 10.45/0.014 12.04/0.014
Spike 0.00/0.0050 3.31/0.0027 3.43/0.0029 4.32/0.0031 2.93/0.0024 0.003/0.0000 -0.03/0.0000 9.33/0.0044 9.12/0.0044 11.06/0.0046
5.00/0.0017 1.21/0.0004 1.09/0.0005 2.26/0.0006 0.78/0.0003 -0.31/-0.0001 -0.43/-0.0002 7.88/0.0014 7.96/0.0014 10.46/0.0015

Runtime(s)/sample 0.003 0.003 0.003 0.005 0.0005 0.002 10.480 0.900 0.00005* /0.0074**
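
As a reading aid for the GSNR/GMSE entries of Table I, the following is a minimal Python sketch of the metrics in Eqs. (28)-(33). The helper names (`frob_sq`, `sub`, `gsnr_gmse`) and the toy data are illustrative assumptions rather than the authors' evaluation code, and the sketch uses the unnormalized Frobenius norms exactly as the equations are written.

```python
import math

def frob_sq(m):
    """Squared Frobenius norm of a matrix given as a list of rows (real or complex)."""
    return sum(abs(v) ** 2 for row in m for v in row)

def sub(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def gsnr_gmse(noisy, denoised, clean):
    """Return (GSNR in dB, GMSE) following Eqs. (28)-(33)."""
    signal_power = frob_sq(clean)               # ||AZ||_F^2
    mse_in = frob_sq(sub(noisy, clean))         # ||X - AZ||_F^2 = ||N||_F^2
    mse_out = frob_sq(sub(denoised, clean))     # ||f(X) - AZ||_F^2
    snr_in = 10 * math.log10(signal_power / mse_in)
    snr_out = 10 * math.log10(signal_power / mse_out)
    return snr_out - snr_in, mse_in - mse_out

# Toy example: a single-sensor signal stored as a 1 x N matrix.
clean = [[1.0, -1.0, 1.0, -1.0]]
noisy = [[1.5, -0.5, 0.5, -1.5]]        # noise of magnitude 0.5 on each sample
denoised = [[1.1, -0.9, 0.9, -1.1]]     # residual error of magnitude 0.1
gsnr, gmse = gsnr_gmse(noisy, denoised, clean)
```

In this toy case the residual error power drops by a factor of 25, so the GSNR is 10 log10(25) dB, and the GMSE is the removed error energy. A perfect denoiser would drive the output MSE to zero, which is why the ideal GMSE equals the input MSE while the ideal GSNR is unbounded.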

it difficult for traditional methods or machine learning methods to distinguish. The core of our proposed method is the noise independence assumption and the noise zero-mean assumption. In other words, compared with the signal form, the proposed model pays more attention to the “noise” itself. This again demonstrates the ability of our method to reduce information loss, especially in complex signal scenarios.

To show the denoising results more intuitively, we plot the noisy signal and the denoising results of the blocks signal under the 5 dB condition in Figs. 4 and 5. Visually, HTWT gives sharper denoising results than CTWT and STWT, which is one of the disadvantages of the hard threshold method. The denoising result of MGWD has more fluctuations because it adopts a multi-scale feature extraction method, which retains more high-frequency information than the wavelet threshold method. Our proposed method is visually significantly better than the other methods, thanks to our avoidance of the information loss caused by signal preprocessing, which once again confirms the effectiveness of our model.

Fig. 4. Single-sensor noisy signal with “blocks” signal type and 5 dB input SNR under Gaussian white noise.

2) Gaussian Noise Array Signal Denoising Results: In this section, we explore the denoising performance of our proposed model on array signals under Gaussian white noise. We use the same five types of signals as in the single-sensor experiment; the difference is that each type of signal is received on the array. Three groups of sub-experiments are set for the first four types of signal, corresponding to an average SNR of the received signal on each sensor of −5 dB, 0 dB, and 5 dB. In addition, we set up one experiment at −5 dB for spike signals to further verify the denoising ability of the model. The GSNR and GMSE are also given to describe the denoising performance comprehensively.

The experimental results are given in Table II. We conduct a comprehensive comparison of nine methods, including our proposed approach. The difference from the single-sensor experiment is that we replace the MF method with the TWAD method, and the methods other than the proposed one are extended to the array signal scenario. Overall, the GSNR of array signals at the same SNR is generally lower than that of single-sensor signals, because the received signal on each sensor in the array is the superposition of three signals rather than a single signal. Moreover, the introduction of the array turns the original signal into a complex signal, which inevitably leads to a degree of amplitude and phase distortion that makes denoising more challenging. To show the denoising results more intuitively, we plot the noisy signal and the denoising results of the chirp signal under the 5 dB condition in Figs. 14(a) and 15 of the supplementary material, respectively. Visually, the extension-based methods are better than the sensor-by-sensor methods, which benefit from


Fig. 5. Single-sensor signal denoising task with “blocks” signal type and 5 dB input SNR. Comparisons with other methods are illustrated. Blue lines denote the signals and the dashed lines denote the clean signal (i.e., without any noise).
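
The test signals compared in Fig. 5 and in Tables I and II are drawn from the families defined in Eqs. (23)-(25). A minimal sketch of three of them is given below. It assumes that $u(\cdot)$ is the unit step with $u(0) = 1$ and truncates the infinite sum of Eq. (25) to a finite number of periods; the function names and the parameter values in the usage line are hypothetical, since the paper does not list them here.

```python
import math

def sine(t, f0, psi0=0.0):
    """Eq. (23): sin(f0 * t + psi0)."""
    return math.sin(f0 * t + psi0)

def chirp(t, f0, f1, psi0=0.0):
    """Eq. (24): sin(f0 * t + (f1 - f0) * t^2 + psi0)."""
    return math.sin(f0 * t + (f1 - f0) * t * t + psi0)

def unit_step(t):
    """Unit step u(t); taking u(0) = 1 is an assumption."""
    return 1.0 if t >= 0 else 0.0

def square(t, a0, period, t0, n_terms=50):
    """Eq. (25), with the infinite sum truncated to |n| <= n_terms."""
    return sum(
        a0 * (unit_step(t - n * period + t0) - unit_step(t - n * period - t0))
        for n in range(-n_terms, n_terms + 1)
    )

# Hypothetical sampling grid and parameters: 1024 samples of a square wave.
samples = [square(0.01 * n, a0=2.0, period=1.0, t0=0.25) for n in range(1024)]
```

Each term of the sum contributes a pulse of amplitude `a0` and width `2 * t0` centered at a multiple of the period, so the truncation only matters for times beyond `n_terms` periods from the origin.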

TABLE II
COMPARISONS WITH PREVIOUS WAVELET-BASED AND ML-BASED METHODS ON ARRAY SIGNAL OF DIFFERENT SIGNAL TYPES UNDER GAUSSIAN WHITE
NOISE. GAINS OF SNR AND MSE ARE REPORTED. THE BEST RESULTS ARE IN RED AND THE SECOND BEST RESULTS ARE IN BLUE. THE PINK CELL
COLOR REPRESENTS THE ‘‘SENSOR-BY-SENSOR’’ METHODS, THE ORANGE CELL COLOR REPRESENTS THE EXTENSION OF SINGLE-SENSOR METHODS AND THE
GREEN CELL COLOR REPRESENTS THE DL METHODS. * DENOTES THE TESTING RUNTIME ON THE NVIDIA 4090 GPU PLATFORM AND ** MEANS THE
TESTING RUNTIME ON THE INTEL 12TH I9 CPU PLATFORM

Signal type  Input SNR(dB)/MSE  GSNR(dB)/GMSE: STWT [21]  CTWT [22]  HTWT [20]  WWEF [23]  TWAD [16]  MGWD [15]  PCA [5]  S2S [9]  Proposed

-5.00/2.283 6.91/1.420 7.24/1.443 9.82/1.668 10.32/1.991 11.89/2.121 15.59/2.217 10.26/1.962 15.40/2.208 15.63/2.222
Sine 0.00/0.751 6.12/0.449 7.33/0.465 8.13/0.531 7.29/0.458 10.24/0.664 14.58/0.734 9.88/0.590 14.89/0.745 15.02/0.756
5.00/0.251 4.31/0.166 5.34/0.178 7.24/0.196 7.33/0.199 8.22/0.214 13.40/0.244 7.02/0.189 13.22/0.240 13.26/0.241

-5.00/2.256 4.58/1.459 4.87/1.522 4.68/1.476 4.98/1.529 5.01/1.531 5.99/1.835 5.09/1.562 6.64/1.883 8.74/1.952
Chirp 0.00/0.750 0.92/0.143 1.95/0.268 2.05/0.281 4.77/0.532 3.22/0.341 4.38/0.507 3.20/0.329 5.48/0.549 7.82/0.621
5.00/0.249 0.86/0.040 1.02/0.058 0.79/0.037 1.81/0.072 4.38/0.152 4.47/0.157 3.99/0.110 5.69/0.188 8.45/0.238

-5.00/4.508 6.06/3.299 6.28/3.368 6.17/3.337 7.01/3.498 7.12/3.660 8.27/4.036 7.11/3.592 10.26/3.890 12.43/4.392
Square 0.00/1.502 2.02/0.520 3.07/0.736 3.23/0.765 5.11/1.003 4.85/0.945 6.02/1.219 4.70/0.918 8.29/1.193 10.48/1.360
5.00/0.501 0.53/0.104 0.92/0.075 1.33/0.116 1.71/0.189 1.87/0.197 2.89/0.241 2.71/0.228 6.22/0.329 9.04/0.432

-5.00/22.899 12.45/21.597 12.40/21.582 12.17/21.511 13.12/22.012 12.42/21.592 12.98/21.974 12.20/21.096 13.52/22.072 14.22/22.237
Blocks 0.00/8.007 9.92/7.225 10.07/7.249 9.87/7.214 10.43/7.401 10.53/7.494 10.48/7.446 9.98/7.288 11.89/7.668 13.06/7.981
5.00/2.506 6.54/1.964 7.12/2.031 7.10/2.029 7.51/2.164 7.32/2.064 7.52/2.165 7.33/1.142 9.82/2.098 11.25/2.482
Spike -5.00/0.046 5.03/0.037 5.36/0.038 6.14/0.039 2.39/0.020 4.64/0.036 8.98/0.042 2.31/0.024 9.39/0.042 11.03/0.045

Runtime(s)/sample 0.035 0.031 0.036 0.058 0.048 14.820 0.0006 2.300 0.0002* /0.0138**
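
The array data behind Table II follow the model $X = AZ + N$ of Eq. (27) with the ULA settings of Sect. V-A2 (8 sensors, half-wavelength spacing, 1024 snapshots, 3 sources). A minimal sketch of that model is given below. The steering-vector sign convention, the source waveforms, the noise scaling, and the ordering of the 16 real-valued channels are all assumptions made for illustration; the paper does not specify them in this section.

```python
import cmath
import math
import random

def steering_vector(theta_deg, num_sensors=8, spacing_over_lambda=0.5):
    """ULA steering vector; the exp(-j*2*pi*(d/lambda)*m*sin(theta)) sign is an assumed convention."""
    phase = 2 * math.pi * spacing_over_lambda * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * m) for m in range(num_sensors)]

def received_signal(doas_deg, sources, noise_std, rng):
    """X = A Z + N (Eq. (27)) with circular complex Gaussian noise of std noise_std."""
    steer = [steering_vector(d) for d in doas_deg]      # K vectors of length M
    num_sensors, num_snapshots = len(steer[0]), len(sources[0])
    x = []
    for m in range(num_sensors):
        row = []
        for n in range(num_snapshots):
            clean = sum(steer[k][m] * sources[k][n] for k in range(len(doas_deg)))
            noise = complex(rng.gauss(0, 1), rng.gauss(0, 1)) * noise_std / math.sqrt(2)
            row.append(clean + noise)
        x.append(row)
    return x

rng = random.Random(0)
snapshots = 1024
# K = 3 complex sinusoidal sources with hypothetical frequencies.
sources = [[cmath.exp(1j * 0.1 * (k + 1) * n) for n in range(snapshots)] for k in range(3)]
X = received_signal([10.0, 15.0, 40.0], sources, noise_std=1.0, rng=rng)

# Concatenate real and imaginary parts into 2M = 16 real channels (ordering assumed).
channels = [[v.real for v in row] for row in X] + [[v.imag for v in row] for row in X]
```

The last line mirrors the 16-channel network input described in the implementation details: the complex 8-sensor matrix is split into real and imaginary parts before being passed to a real-valued network.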


TABLE III
COMPARISONS WITH PREVIOUS WAVELET-BASED AND ML-BASED METHODS ON ARRAY SIGNAL OF DIFFERENT SIGNAL TYPES UNDER IMPULSIVE NOISE.
GAINS OF SNR AND MSE ARE REPORTED. THE BEST RESULTS ARE IN RED AND THE SECOND BEST RESULTS ARE IN BLUE. THE PINK CELL COLOR
REPRESENTS THE ‘‘SENSOR-BY-SENSOR’’ METHODS, THE ORANGE CELL COLOR REPRESENTS THE EXTENSION OF SINGLE-SENSOR METHODS AND THE GREEN
CELL COLOR REPRESENTS THE DL METHODS

Signal type  Input SNR(dB)/MSE  GSNR(dB)/GMSE: STWT [21]  CTWT [22]  HTWT [20]  WWEF [23]  JBRI [26]  MGWD [15]  PCA [5]  S2S [9]  Proposed
-5.00/3.892 5.29/1.626 3.54/1.219 2.52/0.895 2.95/1.163 8.32/1.893 9.73/2.093 8.49/1.974 14.36/2.920 19.24/3.664
Sine 0.00/1.204 4.87/0.585 3.42/0.439 2.59/0.324 2.72/0.447 7.64/0.701 8.93/0.89 7.98/0.721 12.29/0.998 17.19/1.182
5.00/0.257 4.53/0.144 2.69/0.090 2.31/0.080 2.88/0.104 6.03/0.166 7.73/0.182 6.19/0.170 10.21/0.203 14.679/0.241
-5.00/3.851 2.61/1.466 2.02/1.101 1.36/0.734 2.91/1.305 5.62/2.231 6.94/2.413 2.90/1.803 9.98/2.873 14.89/3.639
Chirp 0.00/1.203 1.10/0.308 1.38/0.268 1.08/0.180 2.61/0.350 5.68/0.670 5.80/0.683 2.57/0.585 8.63/0.831 12.21/1.182
5.00/0.318 -0.03/0.033 0.68/0.053 0.81/0.043 1.92/0.095 4.22/0.149 4.47/0.157 2.00/0.127 5.23/0.219 8.49/0.308
-5.00/7.830 3.32/3.354 2.46/2.504 1.69/1.724 3.65/3.755 5.13/5.232 5.09/5.011 4.35/4.663 12.82/6.401 17.18/7.655
Square 0.00/2.693 1.56/0.745 1.37/0.595 0.91/0.391 2.67/1.257 4.34/1.623 4.31/1.612 3.81/1.409 10.35/2.003 14.65/2.562
5.00/0.649 -0.57/0.049 0.92/0.075 -0.39/0.010 2.24/0.172 3.35/0.281 3.19/0.269 2.98/0.231 6.97/0.559 10.14/0.634
-5.00/47.293 5.27/17.204 3.53/12.946 2.36/9.485 2.79/10.522 6.89/18.192 12.98/21.974 5.93/20.039 14.59/39.790 18.07/45.694
Blocks 0.00/15.027 4.78/6.635 3.32/4.905 2.34/3.508 2.67/3.962 6.80/7.036 10.48/7.446 5.19/6.993 13.23/12.193 16.24/13.326
5.00/2.593 3.50/0.829 2.53/0.640 1.75/0.213 2.36/0.559 5.98/1.982 7.52/2.165 3.88/1.205 9.93/2.231 11.25/2.482
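
The impulsive noise behind Table III follows the symmetric stable model of Sect. V-A1. As a small illustration, the characteristic function of Eqs. (21) and (22) can be evaluated as sketched below. The function names and default arguments are illustrative assumptions, and $\Phi$ is written in its standard textbook form.

```python
import cmath
import math

def phi_term(alpha, t):
    """Eq. (22): tan(pi*alpha/2) for alpha != 1, and -(2/pi)*log|t| for alpha == 1."""
    if alpha != 1:
        return math.tan(math.pi * alpha / 2)
    return -(2 / math.pi) * math.log(abs(t))

def stable_cf(t, alpha, beta=0.0, c=1.0, mu=0.0):
    """Eq. (21): characteristic function of the alpha-stable distribution."""
    if t == 0:
        return 1 + 0j  # avoids log(0) in the alpha == 1 branch
    sign = 1.0 if t > 0 else -1.0
    exponent = 1j * t * mu - abs(c * t) ** alpha * (1 - 1j * beta * sign * phi_term(alpha, t))
    return cmath.exp(exponent)

# Symmetric case (beta = 0): alpha = 2 gives a Gaussian shape, alpha = 1 a Cauchy shape.
gaussian_like = stable_cf(0.7, alpha=2.0)
cauchy_like = stable_cf(0.7, alpha=1.0)
```

With $\beta = 0$ and $\mu = 0$ the characteristic function is real and reduces to $e^{-|ct|^{\alpha}}$, so smaller values of $\alpha$ decay more slowly in $t$ and correspond to heavier tails, that is, more impulsive noise.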

the preservation of information between channels by these extension methods. Our method still shows the best performance, which is consistent with our analysis above. Since our model can be accelerated by the GPU, it requires minimal runtime when dealing with array signal denoising problems.

3) Impulsive Noise Array Denoising Results: In this section, we explore the denoising performance of our proposed model on array signals under impulsive noise. The experimental signals are the same as in the previous group of experiments. Three groups of sub-experiments are set for each type of signal, corresponding to an average SNR of the received signal on each sensor of −5 dB, 0 dB, and 5 dB, and the MSE is also given to describe the denoising performance comprehensively.

The experimental results are shown in Table III, which compares nine methods including the proposed approach. Unlike array denoising under Gaussian noise, the TWAD method has difficulty handling impulsive noise. We therefore substitute TWAD with the JBRI method and extend it to the array denoising problem. Overall, the performance of non-DL-based methods tends to degrade under impulsive noise at the same SNR. In contrast, the performance of DL-based methods, including our proposed method, improves under impulsive noise. This can be attributed to the fact that deep convolutional neural networks are sensitive to low-frequency information but not to high-frequency information. This property gives them strong robustness in the presence of impulsive noise, a feature that non-deep-learning-based methods lack.

For the wavelet denoising method, on the one hand, impulsive noise leads to drastic changes in some wavelet coefficients, so the wavelet-based denoising method cannot accurately identify and handle the impulsive noise. On the other hand, for Gaussian noise, due to its statistical properties, wavelet denoising can select an appropriate threshold relatively easily. For impulsive noise, however, the irregular characteristics make threshold selection more challenging, and more complex methods may be required to determine an appropriate threshold, which to some extent degrades the performance of the wavelet method. The JBRI method, purpose-built for impulsive noise, shows state-of-the-art performance among sensor-by-sensor methods due to its effective inversion gamma structure modeling. Nonetheless, this effectiveness hinges on empirical factors and is sensitive to the specific form of impulsive noise, rendering it less robust in dynamic and changeable real-world scenarios.

Our method effectively addresses the aforementioned challenges, sidestepping explicit feature engineering and criteria design. Instead, it leverages the inferential capability of the deep model to extract profound signal information, resulting in robust denoising outcomes. The noisy signal and the denoised results can be observed in Figs. 14(b) and 16 of the supplementary material, respectively.

F. Downstream Application
1) DOA: To further verify the practicability of our proposed model, we add the proposed denoising method to specific array signal processing scenarios, including DOA estimation, source number estimation, and spatial spectrum estimation, and compare the performance of each algorithm before and after denoising. We again use the ULA array form when generating the data; the signals are sinusoidal, and the DOAs are fixed to 10 degrees, 15 degrees, and 40 degrees.

We apply the classic MUltiple SIgnal Classification (MUSIC) method to conduct 100 DOA estimation experiments; the results are shown in Fig. 6. Fig. 6(a) shows the estimated DOA spectrum obtained with the noisy signal (−5.00


Fig. 6. (a) MUSIC spectrum. (b) normalized eigenvalues of the DOA estimation task. Comparisons among the noisy signal (−5.00 dB), clean signal, and
denoised signal (10.63 dB) are clearly illustrated.
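
The eigenvalue gap shown in Fig. 6(b) is what eigenvalue-based source number estimators exploit. As an illustration, the sketch below implements the classical AIC and MDL criteria in their commonly used eigenvalue form. The paper does not state which exact variant it uses, and the eigenvalues in the example are hypothetical rather than read from Fig. 6.

```python
import math

def estimate_num_sources(eigenvalues, num_snapshots):
    """Return (k_aic, k_mdl) from covariance eigenvalues via the classical AIC/MDL criteria."""
    lam = sorted(eigenvalues, reverse=True)
    m = len(lam)
    aic, mdl = [], []
    for k in range(m):
        tail = lam[k:]                                  # candidate noise eigenvalues
        arith = sum(tail) / len(tail)                   # arithmetic mean
        geo = math.exp(sum(math.log(v) for v in tail) / len(tail))  # geometric mean
        log_lik = num_snapshots * (m - k) * math.log(arith / geo)
        free_params = k * (2 * m - k)
        aic.append(2 * log_lik + 2 * free_params)
        mdl.append(log_lik + 0.5 * free_params * math.log(num_snapshots))
    return aic.index(min(aic)), mdl.index(min(mdl))

# Hypothetical 8-sensor profile with a clear gap after the third eigenvalue.
k_aic, k_mdl = estimate_num_sources([10.0, 8.0, 6.0, 1.0, 1.0, 1.0, 1.0, 1.0], 1024)
```

Both criteria compare the arithmetic and geometric means of the smallest eigenvalues: when the noise eigenvalues are nearly equal the likelihood term vanishes and the penalty term selects the smallest consistent model order, which is why a larger eigenvalue gap after denoising makes the estimate more reliable.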

dB), clean signal, and denoised signal (10.63 dB), respectively. The denoising algorithm improves DOA estimation performance owing to the improved SNR. DOA estimation using the original noisy signal cannot distinguish the two directions of arrival at 10 degrees and 15 degrees, but these two directions are successfully distinguished after denoising, which verifies the effectiveness of our method in DOA estimation. Fig. 6(b) shows the normalized eigenvalue distribution of the covariance matrix of the noisy signal (−5.00 dB), clean signal, and denoised signal (10.63 dB). The differences among the eigenvalues of the noisy signal are not obvious, which makes the MUSIC algorithm unable to accurately determine the number of sources. After denoising, the differences increase significantly, and the first three eigenvalues are significantly larger than the last five, which shows that our denoising algorithm effectively suppresses noise and improves resolution.

2) Estimated Number of Sources: Estimating the number of sources is a common problem in signal processing. In this section, we explore the effectiveness of the proposed denoising framework for source number estimation. We choose the two most classic source number estimation methods, Akaike Information Criterion (AIC) [50] and Minimum Description Length (MDL) [51], as the comparative methods. The signals used in the experiments are sinusoidal signals under a uniform array. We randomly select 200 samples for source number estimation and calculate the estimation accuracy. The experimental results are shown in Table IV, which indicates that the denoised signals exhibit a significant improvement in SNR, resulting in a clearer distinction between the signal subspace and the noise subspace. Consequently, both AIC and MDL perform better, consistent with the analysis presented in Section V-F. It is noteworthy that after denoising, the MDL method surpasses the AIC method. This is because the MDL method provides consistent estimates, while the AIC method tends to overestimate.

TABLE IV
ACCURACY COMPARISON OF THE NUMBER OF SOURCES ESTIMATION METHODS BEFORE (0.00 DB) AND AFTER DENOISING (15.02 DB)

Methods    Accuracy (%)
           Noisy Signal    Denoised Signal
AIC        64.0            80.0
MDL        59.5            96.0

3) Spatial Spectrum Estimation: This section validates the performance enhancement introduced by the proposed denoising method when applied to the spatial spectrum estimation algorithm. For this purpose, we assume the signal directions to be 10 degrees, 15 degrees, and 40 degrees. As shown in Fig. 7, applying the denoising algorithm notably enhances the resolution, generating narrow lobes in the signal directions and significantly suppressing power levels in other directions. After denoising, the previously ambiguous 10-degree and 15-degree directions are clearly separated, highlighting the efficacy of our proposed denoising method for spatial spectrum estimation.

By applying the denoising method to DOA estimation, source number estimation, and spatial spectrum estimation, we have validated the practicality of the proposed denoising model in certain array processing tasks. In fact, the denoising method, as a versatile technique, can be extended to various downstream tasks (such as Synthetic Aperture Radar, Array Optimization, and Remote Sensing) to enhance the performance of different algorithms.

G. Ablation Study
1) Iterative Denoising: For certain radar systems, sensitivity to signal glitches is a significant concern. As a solution, we introduce iterative smoothing: we repeatedly feed the signal into the trained network for denoising. Through


this process, we observed that glitches in the resulting signal are effectively eliminated, accompanied by either a slight loss or, on the contrary, a gain in GSNR. To demonstrate the visual effect, we applied iterative smoothing to a single-sensor “blocks” signal. We input the noisy signal into the network for denoising 1, 2, and 3 times, and the resulting output signals are illustrated in Fig. 8. It can be observed that after a single pass of denoising through the network (see subfigure a), the signal might still contain glitches. After the second denoising pass (see subfigure b), the spikes are nearly completely suppressed, and the SNR even improves. However, upon the third denoising pass, the SNR starts to decrease, despite the denoised signal appearing significantly smoother (see subfigure c). While performing multiple consecutive denoising iterations can eliminate glitches and potentially enhance performance, it relies on human observation to decide whether to apply these iterations and how many to use. For our default experimental setup, we conduct denoising only once to demonstrate that our network can still outperform other methods in most cases.

Fig. 7. Estimated Spatial Spectrum under noisy signal (0 dB), ideal situation, and denoised signal (15.02 dB).

H. Discussion

Here, we engage in a discussion regarding adaptation to coherent signal scenarios, adaptation to various other array configurations, the computational complexity, and limitations of our proposed method.

1) Adaptation to Coherent Signal Scenario: In this section, we discuss the effect of signal correlation on the performance of our proposed array denoising method. We set up three sets of experiments, corresponding to the signal average correlation coefficient ρ = 0.4, 0.8, 1.0 respectively, where the correlation coefficient is calculated as follows:

$$\rho_{ik} = \frac{E[z_i z_k^*]}{\sqrt{E[|z_i|^2]\, E[|z_k|^2]}}. \tag{34}$$

In each array case, the number of signal sources is still set to K = 3, the signal form is chirp, and the average SNR is 5 dB under Gaussian white noise.

The experimental results are shown in Table V. It can be observed that the denoising performance when the signal is partially correlated is better than that when it is coherent or uncorrelated. From the perspective of the array, this may be because partial correlation between the signals enhances the spatial filtering effect of the array: similar signal components are superimposed in space, while noise components are dispersed, which helps to enhance the directivity of the array. From the perspective of the model, the convolution operation in a CNN can be regarded as a spatial averaging operation, which smooths the signal and noise to a certain extent. When the signals are partially correlated, the convolution operation can better average the signal components, thus enhancing the coherence of the signal in space.

TABLE V
DISCUSSION ON COHERENT SIGNAL SCENARIO UNDER DIFFERENT CORRELATION COEFFICIENTS ρ. THE INPUT SIGNAL IS “CHIRP” UNDER 5.00 DB

ρ      GSNR (dB)   MSE_in   GMSE
0.4    8.64        0.249    0.242
0.8    8.53        0.251    0.240
1.0    8.02        0.243    0.215

2) Adaptation to Other Array Forms: In this section, we discuss the effectiveness of our proposed denoising method for array forms other than ULA. We choose the three most representative 2-D array forms: URA, UCA, and CA. For URA, the numbers of array elements along the x axis and the y axis are MRx = 8 and MRy = 8, respectively. For UCA, MC = 8. For CA, the numbers of elements of the two sub-arrays are set to MC1 = 5 and MC2 = 7, respectively. In each array case, the number of signal sources is still set to K = 3, the signal form is chirp, and the average SNR is 5 dB under Gaussian noise.

The experimental results are shown in Table VI. Compared with the 1-D ULA, the denoising performance of the three 2-D arrays is slightly lower: the GSNR changes from 8.45 dB to 7.13 dB, 7.63 dB, and 7.53 dB. The observed difference may arise from the elevation angle entering the array manifold of a 2-D array, which adds a certain level of complexity to the received signal. It is also noted that the performance of our proposed denoising method on the three two-dimensional arrays is similar. This is because our method does not rely on the specific array form, but directly learns deep features from the original data, demonstrating the broad adaptability of our proposed method to array forms.

3) Computational Complexity: In this section, we discuss the computational complexity of the proposed model. In deep learning, computational complexity is often expressed using FLOPs (floating point operations) and latency. Our implementation can run on both CPUs and GPUs, achieving higher efficiency on GPUs (about 0.00005 s) than traditional methods, with acceptable latency on CPUs as well. It is important to note that while FLOPs represent the computational complexity, they do not directly indicate actual runtime efficiency, which is influenced by other factors such as L2 cache


hits, GPU caching bottlenecks, and memory access efficiency. We provide the FLOPs for a single forward pass of the network as 1.78G. This is a reasonably small number of FLOPs, allowing for relatively fast execution on any type of GPU or CPU. As for multiple forward passes during inference, since inference does not require maintaining a computational graph or backpropagation, performing the forward pass n times multiplies the network’s FLOPs by n. We have also included the FLOPs and latency for 1 to 3 forward passes in Table VII.

Fig. 8. Illustrations of iterative smoothing on a single-sensor signal with “blocks” type (−5 dB). Notice that the spikes within the red boxes have been eliminated.

TABLE VI
DISCUSSION ON OTHER ARRAY FORMS. THE INPUT SIGNAL IS “CHIRP” UNDER 5.00 DB

Array form   GSNR (dB)   MSE_in   GMSE
URA          7.13        0.243    0.200
UCA          7.63        0.264    0.219
CA           7.53        0.252    0.210

TABLE VII
FLOPS AND LATENCY FROM 1 TO 3 FORWARD PASSES. THE LATENCY IS TESTED ON AN NVIDIA RTX 4090 GPU

N forward passes   FLOPs (G)   Latency (s)
1                  1.78        0.00005
2                  3.56        0.00011
3                  5.34        0.00014

4) Limitations: In this section, we discuss the limitations of the proposed model. Since we assume that the noise is independent, the biggest limitation of the proposed model is that it cannot handle structured noise, i.e., non-independent noise. In the blind spot network, since the center point is occluded, the model can only infer the information of the center point through the surrounding points. If the noise points are independent, then the surrounding points carry no information about the noise of the center point, and the model will tend to output the mean of the possible cases (which is 0 under our assumption). The signal values, in contrast, are not independent, so the model can estimate the signal value of the center point from the surrounding points. The blind spot network essentially performs the denoising task by exploiting this simple statistical fact. Therefore, if we deal with structured noise, the blind spot network may infer part of the noise of the center point from the surrounding points, resulting in poor performance. In future work, we will seek effective methods for structured noise removal.

I. Generalization Study

Since situations in the real world may be more varied, this section discusses the generalization ability of the proposed model. We set up two sets of generalization experiments: generalization on signal type and generalization on correlation.

1) Generalization on Signal Type: In this section, we investigate the generalization ability of our model across different signal types. We again use five signal types (single-sensor, 5 dB): Sine, Chirp, Square, Blocks, and Spike. Our model is trained exclusively on the Sine signal and then tested directly on the other signal types. The results are shown in Table VIII. The numbers in brackets represent the difference between the generalized performance and the original performance (see Table I). As can be seen, our method achieves good performance even when trained on only one type of signal and tested on other types. The performance degradation due to generalization is minimal. For example, the generalized GSNR on the Chirp signal is only 0.60 dB lower than the GSNR obtained when the model is trained and tested on the Chirp signal, which demonstrates that our model has a strong generalization capability across different signal types. In addition, to demonstrate the effect of generalization more intuitively, we give the waveforms of each type of signal before and after denoising in the generalization experiment (see Figs. 9 and 10).

TABLE VIII
GENERALIZATION STUDY ON SIGNAL TYPE. OUR MODEL IS TRAINED ON SINE (5.00 DB) SIGNAL AND TESTED ON CHIRP (5.00 DB), SQUARE (5.00 DB), BLOCKS (5.00 DB), AND SPIKE (5.00 DB) RESPECTIVELY. THE NUMBERS IN BRACKETS REPRESENT THE DIFFERENCE BETWEEN THE GENERALIZED PERFORMANCE AND THE ORIGINAL PERFORMANCE (TABLE I)

Training signal: Sine. Columns list the generalized (test) signal.

Metric   Chirp            Square           Blocks           Spike
MSE_in   0.169            0.333            1.583            0.0017
GSNR     10.44 (-0.60)    19.44 (-0.29)    13.02 (-1.00)    9.70 (-0.76)
GMSE     0.159 (-0.002)   0.327 (-0.001)   1.510 (-0.004)   0.0014 (-0.0001)

2) Generalization on Correlation: In this section, we investigate the generalization capability of the proposed model with respect to correlation. Specifically, we trained the model using array chirp signals at 5 dB with ρ = 0.5, and then tested the trained model directly on three sets of chirp signals with ρ = 0.4, 0.8, 1.0. The experimental results are shown in Table IX. Compared to the performance obtained by training and testing on each ρ value separately (see Table V), the generalization performance shows only a slight decrease. This demonstrates the adaptability of our proposed model to varying levels of correlation.
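To make the correlation levels used in these experiments concrete, the following Python sketch draws a pair of sequences with a prescribed correlation and evaluates the sample version of (34). It is only an illustration under stated assumptions: the complex Gaussian sequences stand in for the array chirp sources, and the helper names `correlated_pair` and `correlation_coefficient` are hypothetical and are not taken from our implementation.

```python
import numpy as np


def correlation_coefficient(z_i, z_k):
    """Sample estimate of (34): E[z_i z_k^*] / sqrt(E|z_i|^2 E|z_k|^2)."""
    num = np.mean(z_i * np.conj(z_k))
    den = np.sqrt(np.mean(np.abs(z_i) ** 2) * np.mean(np.abs(z_k) ** 2))
    return num / den


def correlated_pair(rho, n, rng):
    """Hypothetical generator (assumption): two unit-power complex Gaussian
    sequences whose correlation coefficient is the real value rho."""
    z_i = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    # Mix z_i with an independent sequence w so that E[z_i z_k^*] = rho
    # while z_k keeps unit power.
    z_k = rho * z_i + np.sqrt(1.0 - rho ** 2) * w
    return z_i, z_k


rng = np.random.default_rng(0)
estimates = {}
for rho in (0.4, 0.8, 1.0):
    z_i, z_k = correlated_pair(rho, 200_000, rng)
    estimates[rho] = correlation_coefficient(z_i, z_k)
    print(f"target rho = {rho:.1f}, estimated |rho| = {abs(estimates[rho]):.3f}")
```

With ρ = 1.0 the two sequences are identical, which corresponds to the coherent case; smaller values give partially correlated sources.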
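The statistical fact behind the limitation discussed in Section V-H (independent noise cannot be predicted from neighboring samples, whereas structured noise can) may be illustrated with a small numerical sketch. As an assumption made purely for illustration, the blind spot network is replaced here by a least-squares linear predictor that estimates each sample from its neighbors only; the signal, the window radius, and the moving-average noise model are likewise hypothetical choices and not our experimental setup.

```python
import numpy as np


def blind_spot_fit_predict(x, radius):
    """Fit and apply a linear least-squares predictor of each sample from its
    2*radius neighbours, with the centre sample excluded (the 'blind spot').
    Only the noisy sequence x is used, as in self-supervised training."""
    n = len(x)
    offsets = [k for k in range(-radius, radius + 1) if k != 0]
    centers = np.arange(radius, n - radius)
    neighbours = np.stack([x[centers + k] for k in offsets], axis=1)
    weights, *_ = np.linalg.lstsq(neighbours, x[centers], rcond=None)
    return centers, neighbours @ weights


rng = np.random.default_rng(0)
n = 20_000
clean = np.sin(2 * np.pi * np.arange(n) / 100)

# Independent noise: unit-variance white Gaussian samples.
independent = rng.standard_normal(n)
# Structured noise (assumption): length-5 moving average of white noise,
# scaled to unit variance, so adjacent noise samples are correlated.
structured = np.convolve(rng.standard_normal(n), np.ones(5) / np.sqrt(5), mode="same")

results = {}
for name, noise in (("independent", independent), ("structured", structured)):
    centers, pred = blind_spot_fit_predict(clean + noise, radius=8)
    results[name] = np.mean((pred - clean[centers]) ** 2)
    print(f"{name} noise: MSE against the clean signal = {results[name]:.3f}")
```

In this toy setting the predictor trained on independent noise recovers the sinusoid, while the one trained on structured noise also reproduces part of the noise, so its error against the clean signal is larger.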

Fig. 9. Illustrations of noisy signals (5.00 dB) in generalization experiments.

Fig. 10. Illustrations of denoised signals in the generalization study. “Test” represents the results of training and testing on the same type of signal. “Generalized” represents the results of training only on the Sine signal and testing on each type of signal.

TABLE IX
GENERALIZATION STUDY ON CORRELATION. OUR MODEL IS TRAINED ON ARRAY CHIRP (5.00 DB) SIGNAL WITH THE CORRELATION COEFFICIENT ρ = 0.5 AND TESTED ON ARRAY CHIRP (5.00 DB) SIGNAL WITH THE CORRELATION COEFFICIENT ρ = 0.4, 0.8, 1.0 RESPECTIVELY. THE NUMBERS IN BRACKETS REPRESENT THE DIFFERENCE BETWEEN THE GENERALIZED PERFORMANCE AND THE ORIGINAL PERFORMANCE (TABLE V)

Training ρ = 0.5. Columns list the generalized (test) ρ.

Metric   0.4             0.8              1.0
MSE_in   0.249           0.251            0.243
GSNR     8.62 (-0.02)    8.28 (-0.25)     7.85 (-0.17)
GMSE     0.249 (0.000)   0.238 (-0.002)   0.213 (-0.002)

VI. CONCLUSION

In this paper, we propose a robust and applicable unsupervised denoising method that demonstrates versatility across noise types (Gaussian noise and impulse noise), multiple array configurations (ULA, URA, UCA, and CA), and diverse degrees of signal correlation, which proves the robustness of our method. Furthermore, to underscore the practicality of our proposed method, we delve into its application within three distinct array-related contexts: DOA estimation, estimated number of sources, and spatial spectrum estimation. Simulation results show that the proposed method can effectively improve the performance of DOA estimation, estimated number of sources, and spatial spectrum estimation algorithms. In addition, we set up generalization experiments to confirm that the proposed model has good generalization ability.

APPENDIX

Theorem A.1: Consider two independent noisy observations $x$, $y$ of the unobserved clean signal $z$. The noisy observations $x$, $y$ are independent conditioned on $z$, i.e., $E_{x,y|z} = E_{x|z} E_{y|z}$, and a gap $\varepsilon := E_{y|z}(y) - E_{x|z}(x) \neq 0$ exists, with $E_{x|z}(x) = z$ and $E_{y|z}(y) = z + \varepsilon$. Let the variance of $y$ be $\sigma^2$. Then

$$E_{x,z}\|f_\theta(x) - z\|_2^2 = E_{x,y,z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\varepsilon\, E_{x,z}(f_\theta(x) - z). \tag{35}$$

Proof: We expand the left-hand side conditioned on $z$:

$$\begin{aligned}
E_{x|z}\|f_\theta(x) - z\|_2^2 &= E_{x,y|z}\|(f_\theta(x) - y) + (y - z)\|_2^2 \\
&= E_{x,y|z}\|f_\theta(x) - y\|_2^2 + E_{y|z}\|y - z\|_2^2 + 2E_{x,y|z}(f_\theta(x) - y)^\top (y - z) \\
&= E_{x,y|z}\|f_\theta(x) - y\|_2^2 + \sigma^2 + 2E_{x,y|z}(f_\theta(x) - z + z - y)^\top (y - z) \\
&= E_{x,y|z}\|f_\theta(x) - y\|_2^2 + \sigma^2 + 2E_{x,y|z}(f_\theta(x) - z)^\top (y - z) + 2E_{y|z}(z - y)^\top (y - z) \\
&= E_{x,y|z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2E_{x,y|z}(f_\theta(x) - z)^\top (y - z).
\end{aligned} \tag{36}$$

Since $x$ and $y$ are independent conditioned on $z$, the following holds:

$$\begin{aligned}
E_{x|z}\|f_\theta(x) - z\|_2^2 &= E_{x,y|z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2E_{x|z}(f_\theta(x) - z)^\top E_{y|z}(y - z) \\
&= E_{x,y|z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\varepsilon\, E_{x|z}(f_\theta(x) - z).
\end{aligned} \tag{37}$$

Since $E_{x,z} = E_z E_{x|z}$, taking the expectation over $z$ gives

$$E_{x,z}\|f_\theta(x) - z\|_2^2 = E_{x,y,z}\|f_\theta(x) - y\|_2^2 - \sigma^2 + 2\varepsilon\, E_{x,z}(f_\theta(x) - z). \tag{38}$$

ACKNOWLEDGMENT

The authors would like to thank Professor Xuejing Zhang for his paper revision. Thanks to Professor George Atia and two professional reviewers for their valuable contributions to this paper and thanks to Yan Cheng for her encouragement and companionship.

REFERENCES

[1] H. Krim and M. Viberg, “Two decades of array signal processing research: The parametric approach,” IEEE Signal Process. Mag., vol. 13, no. 4, pp. 67–94, Jul. 1996.
[2] S. V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction. Hoboken, NJ, USA: Wiley, 2008.


[3] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Upper Saddle River, NJ, USA: Prentice-Hall, 1995.
[4] B.-Y. Sun, D.-S. Huang, and H.-T. Fang, “Lidar signal denoising using least-squares support vector machine,” IEEE Signal Process. Lett., vol. 12, no. 2, pp. 101–104, Feb. 2005.
[5] K. Naveed, S. Mukhtar, and N. U. Rehman, “Multivariate signal denoising based on generic multivariate detrended fluctuation analysis,” in Proc. IEEE Statist. Signal Process. Workshop (SSP), Piscataway, NJ, USA: IEEE Press, 2021, pp. 441–445.
[6] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[7] J. Chai, H. Zeng, A. Li, and E. W. Ngai, “Deep learning in computer vision: A critical review of emerging techniques and application scenarios,” Mach. Learn. Appl., vol. 6, 2021, Art. no. 100134.
[8] W. Zhu, S. M. Mousavi, and G. C. Beroza, “Seismic signal denoising and decomposition using deep neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 9476–9488, Nov. 2019.
[9] Y. Quan, M. Chen, T. Pang, and H. Ji, “Self2Self with dropout: Learning self-supervised denoising from single image,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 1890–1898.
[10] D. Yin, C. Luo, Z. Xiong, and W. Zeng, “Phasen: A phase-and-harmonics-aware speech enhancement network,” in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 5, 2020, pp. 9458–9465.
[11] A. Li, S. You, G. Yu, C. Zheng, and X. Li, “Taylor, can you hear me now? A Taylor-unfolding framework for monaural speech enhancement,” 2022, arXiv:2205.00206.
[12] S. Cheng, Y. Wang, H. Huang, D. Liu, H. Fan, and S. Liu, “NBNet: Noise basis learning for image denoising with subspace projection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 4896–4906.
[13] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[14] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “SwinIR: Image restoration using swin transformer,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 1833–1844.
[15] K. Naveed and N. ur Rehman, “Wavelet based multivariate signal denoising using Mahalanobis distance and EDF statistics,” IEEE Trans. Signal Process., vol. 68, pp. 5997–6010, 2020.
[16] A. M. Rao and D. L. Jones, “A denoising approach to multisensor signal estimation,” IEEE Trans. Signal Process., vol. 48, no. 5, pp. 1225–1234, May 2000.
[17] R. J. Kozick and B. M. Sadler, “Maximum-likelihood array processing in non-Gaussian noise with Gaussian mixtures,” IEEE Trans. Signal Process., vol. 48, no. 12, pp. 3520–3535, Dec. 2000.
[18] R. A. Roberts and C. T. Mullis, Digital Signal Processing. Reading, MA, USA: Addison-Wesley, 1987.
[19] M. H. Hayes, Statistical Digital Signal Processing and Modeling. Hoboken, NJ, USA: Wiley, 1996.
[20] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard, “Wavelet shrinkage: Asymptopia?” J. Roy. Statist. Soc.: Ser. B (Methodol.), vol. 57, no. 2, pp. 301–337, 1995.
[21] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inf. Theory, vol. 41, no. 3, pp. 613–627, May 1995.
[22] R.-M. Zhao and H.-m. Cui, “Improved threshold denoising method based on wavelet transform,” in Proc. 2015 7th Int. Conf. Modelling, Identification Control (ICMIC), Piscataway, NJ, USA: IEEE Press, 2015, pp. 1–4.
[23] S. Ghael, A. M. Sayeed, and R. G. Baraniuk, “Improved wavelet denoising via empirical Wiener filtering,” in Proc. SPIE Tech. Conf. Wavelet Appl. Signal Process., 1997.
[24] H. Choi and R. Baraniuk, “Analysis of wavelet-domain Wiener filters,” in Proc. IEEE-SP Int. Symp. Time-Frequency Time-Scale Analysis (Cat. no. 98TH8380), Piscataway, NJ, USA: IEEE Press, 1998, pp. 613–616.
[25] R. Sathish and C. Anand, “Spatial wavelet packet denoising for improved DOA estimation,” in Proc. 14th IEEE Signal Process. Soc. Workshop Mach. Learn. Signal Process., Piscataway, NJ, USA: IEEE Press, 2004, pp. 745–754.
[26] J. Murphy and S. Godsill, “Joint Bayesian removal of impulse and background noise,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Piscataway, NJ, USA: IEEE Press, 2011, pp. 261–264.
[27] T.-H. Liu and J. M. Mendel, “A subspace-based direction finding algorithm using fractional lower order statistics,” IEEE Trans. Signal Process., vol. 49, no. 8, pp. 1605–1613, Aug. 2001.
[28] E. J. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[29] A. Beck and M. Teboulle, “A fast iterative Shrinkage-Thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009, doi: 10.1137/080716542.
[30] T. Tasdizen, “Principal neighborhood dictionaries for nonlocal means image denoising,” IEEE Trans. Image Process., vol. 18, no. 12, pp. 2649–2660, Dec. 2009.
[31] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[32] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[33] M. Scetbon, M. Elad, and P. Milanfar, “Deep K-SVD denoising,” IEEE Trans. Image Process., vol. 30, pp. 5944–5955, 2021.
[34] G. H. Baki̇r, J. Weston, and B. Schölkopf, “Learning to find pre-images,” in Proc. Adv. Neural Inf. Process. Syst., vol. 16, 2004, pp. 449–456.
[35] J. L. Rojo-Alvarez, O. Barquero-Perez, I. Mora-Jimenez, E. Everss, A. B. Rodriguez-Gonzalez, and A. Garcia-Alberola, “Heart rate turbulence denoising using support vector machines,” IEEE Trans. Biomed. Eng., vol. 56, no. 2, pp. 310–319, Feb. 2009.
[36] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Med. Image Comput. Comput.-Assisted Intervention–MICCAI 2015: 18th Int. Conf., Munich, Germany. Luxembourg, Germany: Springer, 2015, pp. 234–241.
[37] A. Krull, T.-O. Buchholz, and F. Jug, “Noise2Void-learning denoising from single noisy images,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2129–2137.
[38] S. Laine, T. Karras, J. Lehtinen, and T. Aila, “High-quality self-supervised deep image denoising,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 6970–6980.
[39] Q. Lyu and X. Fu, “Identifiability-guaranteed simplex-structured post-nonlinear mixture learning via autoencoder,” IEEE Trans. Signal Process., vol. 69, pp. 4921–4936, 2021.
[40] B. Yang, X. Fu, N. D. Sidiropoulos, and K. Huang, “Learning nonlinear mixtures: Identifiability and algorithm,” IEEE Trans. Signal Process., vol. 68, pp. 2857–2869, 2020.
[41] Y. Liu, Z. Qin, S. Anwar, P. Ji, D. Kim, S. Caldwell, and T. Gedeon, “Invertible denoising network: A light solution for real noise removal,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 13365–13374.
[42] T. Kwon and J. C. Ye, “Cycle-free CycleGAN using invertible generator for unsupervised low-dose CT denoising,” IEEE Trans. Comput. Imag., vol. 7, pp. 1354–1368, 2021.
[43] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 9446–9454.
[44] W. Shi et al., “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1874–1883.
[45] Y. Mansour and R. Heckel, “Zero-Shot Noise2Noise: Efficient image denoising without any data,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 14018–14027.
[46] X. Ding, X. Zhang, Y. Zhou, J. Han, G. Ding, and J. Sun, “Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs,” 2022, arXiv:2203.06717.
[47] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden, “Pyramid methods in image processing,” RCA Engineer, vol. 29, no. 6, pp. 33–41, 1984.
[48] J. C. Goswami and A. K. Chan, Fundamentals of Wavelets: Theory, Algorithms, and Applications. Hoboken, NJ, USA: Wiley, 2011.
[49] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proc. Int. Conf. Learn. Representations (ICLR), 2017.
[50] Y. Sakamoto, M. Ishiguro, and G. Kitagawa, “Akaike information criterion statistics,” Dordrecht, Netherlands: D. Reidel, vol. 81, no. 10.5555, 1986, Art. no. 26853.
[51] M. Wax and T. Kailath, “Detection of signals by information theoretic criteria,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 387–392, Apr. 1985.
