0% found this document useful (0 votes)
10 views7 pages

Diffusion-Based Point Cloud Super-Resolution for mmWave Radar Data

This paper presents Radar-diffusion, a novel approach for super-resolving sparse 3D mmWave radar point clouds into dense LiDAR-like point clouds using a diffusion model based on mean-reverting stochastic differential equations. The method effectively handles ghost points and enhances radar data, demonstrating superior performance over existing techniques in 3D radar super-resolution tasks. Additionally, the enhanced point clouds are validated for use in downstream radar point-based registration tasks, marking a significant advancement in all-weather perception applications.

Uploaded by

pearsonicin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views7 pages

Diffusion-Based Point Cloud Super-Resolution for mmWave Radar Data

This paper presents Radar-diffusion, a novel approach for super-resolving sparse 3D mmWave radar point clouds into dense LiDAR-like point clouds using a diffusion model based on mean-reverting stochastic differential equations. The method effectively handles ghost points and enhances radar data, demonstrating superior performance over existing techniques in 3D radar super-resolution tasks. Additionally, the enhanced point clouds are validated for use in downstream radar point-based registration tasks, marking a significant advancement in all-weather perception applications.

Uploaded by

pearsonicin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

Diffusion-Based Point Cloud Super-Resolution for mmWave Radar Data

Kai Luan∗ Chenghao Shi∗ Neng Wang Yuwei Cheng Huimin Lu† Xieyuanli Chen†

Abstract— The millimeter-wave radar sensor maintains stable


performance under adverse environmental conditions, making
it a promising solution for all-weather perception tasks, such
as outdoor mobile robotics. However, the radar point clouds
are relatively sparse and contain massive ghost points, which
greatly limits the development of mmWave radar technology.
In this paper, we propose a novel point cloud super-resolution
arXiv:2404.06012v1 [cs.CV] 9 Apr 2024

approach for 3D mmWave radar data, named Radar-diffusion.


Our approach employs the diffusion model defined by mean-
reverting stochastic differential equations (SDE). Using our
proposed new objective function with supervision from corre-
sponding LiDAR point clouds, our approach efficiently handles
radar ghost points and enhances the sparse mmWave radar
point clouds to dense LiDAR-like point clouds. We evaluate our
approach on two different datasets, and the experimental results Fig. 1: Enhancement effect of our method on radar point clouds.
show that our method outperforms the state-of-the-art baseline The image, raw radar points, enhanced point clouds, and LiDAR
methods in 3D radar super-resolution tasks. Furthermore, we point clouds of the corresponding scene are shown in the figure.
demonstrate that our enhanced radar point cloud is capable of
downstream radar point-based registration tasks. Constant false alarm rate (CFAR) [18] is a commonly em-
ployed signal processing method for radar, which adjusts the
I. I NTRODUCTION detection threshold based on the background noise, enabling
stable detection performance. However, it struggles to handle
Camera and LiDAR are two widely used sensors in
a large number of noise points. Cheng et al. [6] propose
robotics and autonomous driving. However, both sensors are
bypassing CFAR to directly learn extracting high-quality
vulnerable to adverse weather conditions, such as rain, fog,
point clouds from raw radar data supervised by LiDAR point
and snow. With the development of robotics and autonomous
cloud. These methods work on raw radar data, which are
driving technologies, there is a great demand for unmanned
exploited to extract high-quality point clouds. However, the
platforms capable of functioning effectively in harsh envi-
extracted point clouds are still sparse. RadarHD [16] builds
ronmental scenarios. Millimeter-wave (mmWave) radar has
dense radar point clouds using an U-Net [19]. But it is
received increased attention as it exhibits robust perfor-
based on 2D radar point cloud lacking height information
mance in such extreme conditions while providing various
and thus cannot handle 3D radar point cloud. To the best
measurements of 3D geometric information and additional
of our knowledge, no prior super-resolution method for 3D
instantaneous velocity. However, radar point clouds suffer
radar point clouds has been proposed.
from a resolution that is two orders of magnitude lower
than LiDAR, presenting significant hurdles for subsequent Recently, diffusion-based approach, denoising diffusion
applications. Additionally, radar point clouds are prone to probabilistic model (DDPM) [10], has demonstrated supe-
artifacts, ghost points, and false targets due to multipath rior performance on image super-resolution [20] and video
effects. Given the extreme sparsity of radar point clouds, restoration [24]. It generates high-quality super-resolved im-
the impact of these noise points is even more pronounced. ages by progressively denoising the degraded input image,
Therefore, obtaining denser point cloud data while effectively making the model particularly suited for high-noise scenar-
handling substantial noise points is the pressing research goal ios. However, applying the diffusion model to sparse radar
for advancing all-weather environmental perception. point clouds is still relatively unexplored and challenging.
In this paper, we employ the idea of DDPM and propose a
K. Luan, C. Shi, N. Wang, H. Lu, X. Chen are with College of Intelligence novel Radar-diffusion for radar point clouds super-resolution,
Science and Technology, National University of Defense Technology, China.
Y. Cheng is with Tsinghua University and ORCA-TECH. as shown in Fig. 1. Our approach begins by transforming
* indicates these authors contribute equally to this work. the radar point clouds into bird’s eye view (BEV) images
† indicates the corresponding authors: H. Lu ([email protected]) X. and then supervised using the corresponding LiDAR BEVs.
Chen ([email protected])
This work was partly supported by the National Science Foundation of During training, we use a diffusion model based on mean-
China under Grant U1913202, U22A2059, and 62203460, Fund for key reverting stochastic differential equations (SDEs) to pro-
Laboratory of Space Flight Dynamics Technology (Num 2022-JYAPAF- cess LiDAR BEV images, simulating the transition from
F1028), Young Elite Scientists Sponsorship Program by CAST (No.
2023QNRC001), and Major Project of Natural Science Foundation of Hunan denser LiDAR data to radar data using our devised objec-
Province under Grant 2021JC0004. tive function. After training, the model reverse denoising
enhances input radar BEV images, producing LiDAR-like, the RadarHD employing an U-Net [19] to generate LiDAR-
denser results for accurate super-resolution. We demonstrate like dense point clouds from low-resolution radar point
that our approach achieves state-of-the-art results in point clouds. Our approach is also a post-processing approach.
cloud super-resolution and exhibits robust generalization Unlike existing methods that focus on single-object super-
capabilities to unseen scenarios. Furthermore, we assess the resolution [9] or 2D radar point cloud super-resolution [16],
performance of the generated high-resolution point cloud our approach addresses 3D radar point cloud super-resolution
in the downstream registration task [21], [23]. The results within the context of autonomous driving scenes. To the best
show that the enhanced point cloud can be well used for of our knowledge, this is the first approach addressing 3D
downstream tasks, revealing its potential for all-weather radar point cloud super-resolution using diffusion model.
perception applications.
In summary, our work makes three main contributions: III. O UR APPROACH
• Proposal of Radar-diffusion as the first approach to We propose Radar-diffusion to enhance sparse mmWave
employ the modified diffusion model for achieving radar point clouds to generate dense LiDAR-like point clouds
dense 3D radar point cloud super-resolution; useful for downstream tasks. The overview of our method
• Demonstration of the superior performance of the pro- is illustrated in Fig. 2. We begin with converting the radar
posed Radar-diffusion incorporating our novel objective and LiDAR point clouds into BEV images. Subsequently,
function in 3D radar point cloud super-resolution; we model the degradation of high-quality LiDAR BEV
• Validation of the usability of the enhanced point clouds images to low-quality radar BEV images using the forward
for downstream registration tasks. diffusion process of mean-reverting SDE. By learning the
corresponding reverse denoising process using our proposed
II. R ELATED W ORK objective function, high-quality LiDAR-like BEV images are
then recovered. Note that no LiDAR data is required during
The sparsity and high noise-to-signal ratio of radar point the generating process.
clouds pose critical challenges hindering the development of
mmWave radar technology. Existing approaches for improv- A. Data Processing
ing the quality of radar point clouds can be categorized into To enable network processing and learning across different
the pre-processing methods and the post-processing methods. sensory modalities, we first convert LiDAR and radar point
Zhang et al. [25] and Cho et al. [7] propose to replace tra- clouds into BEV images and extract their shared field of
ditional methods like fast Fourier transform during the signal view (FOV). The overview of the data processing is illus-
processing with innovative algorithms [25] or learning-based trated in Fig. 3.
algorithms [7]. CFAR [18] is a most commonly employed
Ground point removal. We first remove ground points from
pre-processing approach. While effectively removing clutter
the raw point cloud data, as they lack valuable semantic
points, CFAR also filters out lots of real detection points, re-
information and may hinder the super-resolution learning
sulting in extremely sparse radar point clouds. The learning-
process. Furthermore, radar point clouds usually contain few
based method [2] has been proposed as an alternative to
ground points due to the limited resolution of radar echo
the CFAR process, directly operating on range-Doppler im-
intensity, so no additional steps are required for their re-
ages for subsequent tasks. However, these methods require
moval. For the LiDAR data, we utilize the Patchwork++ [11]
handling a large amount of data, placing high demands on
to detect and remove the ground points from the LiDAR
system bandwidth and computational power. Gall et al. [8]
point cloud. It uses adaptive ground likelihood estimation
employ neural networks to estimate the arrival direction of
to iteratively approximate the ground segmentation region,
mmWave radar acquisition data, improving the accuracy and
ensuring correct separation of ground point clouds even when
enhancing the resolution of acquired point clouds. Cheng et
the ground is elevated by different layers.
al. [6] propose a radar point detector network for high-quality
point cloud extraction incorporating a spatiotemporal filter to Shared field of view extraction. We then align the LiDAR
handle clutter points. However, due to the nature of mmWave point (xl , yl , zl ) to the radar coordinate system by
radar, target points and clutter points are highly correlated.  r r
 ⊤ Rl t l  ⊤
Introducing more radar detection inevitably causes more xc yc zc 1 = xl yl zl 1 , (1)
0 1
clutter points. Furthermore, the generated point clouds using
pre-processing methods are still sparse. where Rrl and trl refer to the rotation and translation matrix
Post-processing methods commonly employ the neural from the radar to the LiDAR coordinate system. As we
network for clutter point handling and super-resolution. intend to use the BEV image to represent the point cloud,
Chamseddine et al. [3] utilize the PointNet [17] to distinguish we simplify the shared FOV extraction by focusing solely
the ghost targets and real targets, resulting in accurate on their shared horizontal FOV. We use a Velodyne HDL-64
radar point clouds. Guan et al. [9] effectively recover high- S3 LiDAR with a horizontal FOV of 360◦ and a FRGen21
frequency object shapes from the original low-resolution radar with a horizontal FOV of 120◦ . By calculating point
radar point clouds in rainy and foggy weather conditions us- yaw angles (θ), we retain LiDAR points and radar points
ing a cGAN [14] architecture. Prabhakara et al. [16] propose whose yaw angles satisfy θ ∈ [30◦ , 150◦ ].
×T ×T

x̃(t − 1)
x

x(0) x(T ) x(T ) x̃(0)

×T ×T

Fig. 2: Training process and generating process of our proposed Radar-diffusion. The training process models the degradation of LiDAR
BEV image to radar BEV image as the forward diffusion process defined by mean-reverting SDE. By learning the reverse denoising
process, the LiDAR-like BEV image is then recovered.

BEV generation: We transform the LiDAR and radar point


clouds into compact BEV images with channel information
representing height. This facilitates fast and efficient feature
learning using mature visual methods. On the other hand,
BEV images can better present overall scenes, enabling
parallel completion of various perception tasks. To create
these BEV images, we retain the points with x coordinates
within the range of [−15, 15], y within the range of [0, 30],
and z within the range of [−0.8, 1.7]. Subsequently, we
compress these points into a 256 × 256 BEV image with
a resolution of 30/256 m. The grayscale value Gi,j for each
pixel {i, j} is determined based on the z value of the highest
point falls within that pixel
 ⊤
Gi,j = [max(Pi,j ∗ 0 0 1 ) − γ]+ /rangez ∗ 255, (2)
where [•]+ = max(•, 0), Pi,j represents the point set falls Fig. 3: The data process of LiDAR and radar point clouds.
within pixel {i, j}, and γ is a predefined threshold. that includes autoregressive Ornstein-Uhlenbeck noise (OU
Multi-frame input: Given the sparse nature of the radar noise) [12] leading to a final state with biased mean and
point cloud, we combine data from multiple consecutive variance. This modification aligns with our objective of
radar frames using their relative poses. In practice, the matching radar data to LiDAR data, teaching the model how
relative poses can be obtained through the point cloud to super-resolve radar data during inference. This forward
registration method [1], [22] or LiDAR odometry method [4]. process can be formulated as
We utilize BEV images generated from the aggregated radar dx = θt (µ − x) dt + σt dw, x(0) ∼ p0 (x), (4)
point cloud from 5 consecutive frames as network input.
where µ is the state mean of the radar BEV image, θt
B. Forward Process based on the Mean-Reverting SDE characterizes the speed of mean reversion, σt is the diffusion
The standard diffusion process defined by SDE follows coefficient. By setting σt2 /θt = 2λ2 , where λ2 is the
stationary variance, we derive the distribution of x(t) as
dx = f (x, t)dt + g(t)dw, x(0) ∼ p0 (x), (3) 
pt (x) = N x(t) | mt (x), vt , (5)
where x refers to the state linked to LiDAR BEV image,
¯
f (x, t) and g(t) are drift and dispersion functions, and w mt (x) : = µ + (x(0) − µ) e−θt , (6)
is a standard Brownian motion. Typically, this leads to a
terminate state x(T ) following a Gaussian distribution with
 
¯
vt : = λ2 1 − e−2θt , (7)
zero mean and fixed variance. Unlike standard SDE widely
applied in vision tasks, adding random Gaussian noise to Z t
x. To model the degradation of the LiDAR BEV image θ¯t : = θz dz, (8)
to the radar BEV image, we employ mean-reverting SDE 0
where the mean state mt and the variance vt converge to matrix M = Jx(0) > 0K and M̄ = Jx(0) == 0K, where
µ and λ2 respectively as t → ∞. This implies that by J•K is an indicator function for which the statement is true.
progressively adding OU noise, the terminate state of LiDAR The modified objective function is written as
BEV image x(T ) converges to the radar BEV image µ with
J = Jtarget + w × Jblank ,
fixed Gaussian noise N (0, λ).
T
X
C. Denoising Process on the Mean-Reverting SDE Jtarget = γi E[∥M ⊙ x̃(i − 1) − M ⊙ x(i − 1)∥],
i=1 (13)
To recover the LiDAR-like BEV image, we reverse the
T
process of mean-reverting SDE by X
Jblank = γi E[∥M̄ ⊙ x̃(i − 1) − M̄ ⊙ x(i − 1)∥],
dx̃ = θt − σt2 ∇x̃ log pt (x̃) dt + σt dw, x̃(T ) = x(T ),
 
i=1
(9) where ⊙ represents the Hadamard product. Using our pro-
where ∇ log pt (x̃) is the score function learned by the time- posed objective function significantly improves the overall
dependent U-Net [10]. performance in our experiments.
Specifically, according to Eq. (5), we can obtain the
ground truth of ∇ log pt (x̃) as IV. E XPERIMENTAL E VALUATION
A. Dataset
x̃(t) − mt
∇x̃ log pt (x̃) = − . (10) We train and evaluate our approach on View-of-Delft
vt
√ (VOD) dataset [15] and RadarHD dataset [16]. The VOD
By rewriting x̃(t) = mt (x̃) + vt ϵt , where ϵt ∼ N (0, I) is dataset contains 8,693 frames of data collected by a Velodyne
the standard Gaussian noise, we further derive Eq. (10) as HDL-64 LiDAR and a FRGen21 radar in complex urban
ϵt traffic environments. The RadarHD dataset is collected in an
∇x̃ log pt (x̃) = − √ . (11)
vt indoor office environment utilizing Ouster OS0-64 LiDAR
The neural network predicts the noise ϵ̃(x̃(t), µ, t) based on and AWR1843 radar without height dimension in the point
current state x̃(t), condition µ, and time t. cloud data. Our evaluating datasets cover both outdoor urban
roads and indoor environments to test the robustness of
D. Objective Function different methods.
Instead of using the standard objective that supervises B. Implementation Details
the network to learn the accurate noise, we follow Luo et
al. [13] and train the U-Net to recover more accurate reversed We train our model using the Lion optimizer [5], with
images by minimizing image residuals at the same stages an initial learning rate of 4 × 10−5 . We choose a more
between forward and denoising processes using the following robust noise level σ = 50 and w = 2 for the forward
objective function diffusion process and objective function, respectively. γi = 1
is equivalent in all timesteps t. We train our model on a single
T
X h i NVIDIA RTX 4090 with a batch size of 8. Total training time
J(ϵ̃) : = γi E ∥ x̃(i) − (dx̃(i))ϵ̃ −x(i − 1)∥ , takes 9 hours, and the .pth format model size is 306.7 MB.
| {z }
i=1
reversed x(i−1) (12) C. Performance on Point Cloud Super-Resolution
T
X
= γi E[∥x̃(i − 1) − x(i − 1)∥], We evaluate the point cloud super-resolution performance
i=1 of our method on VOD and RadarHD datasets. For VOD
dataset, we divide it into 7831 frames for training and 635
where γi represents positive weight, (dx̃(i)ϵ̃ represents the
frames for testing. The test set contains new frames in unseen
mean-reverting SDE defined in Eq. (9) using the score
√ sequences, allowing for a comprehensive evaluation of the
−ϵ̃/ vt learned by the network, and x(i − 1) is the ideal
generalization ability. For RadarHD dataset, we adopt the
state for reversed x̃(i), i.e., the state at time t = i − 1
setup of Prabhakara et al. [16], utilizing 28 trajectories with
in the diffusion process. This objective function exploits
22,784 frames for training and 39 distinct trajectories with
the cumulative error within the denoising process, achieving
36,779 frames for testing. As the RadarHD dataset only
more stable training for image generation tasks.
contains 2D radar data, our approach designed for 3D radar
However, unlike the visual image generation task, LiDAR
point clouds is not directly applicable. Therefore, we employ
and radar BEV images’ data distribution is significantly
the following modifications for training on RadarHD dataset.
imbalanced. We observe that the blank area in LiDAR
We set the grayscale of the input radar BEV image to the
BEV image commonly approximates 20 times larger than
point intensity and the input LiDAR BEV image to {0, 255},
the area with actual sensor detection. Equivalently learning
indicating whether there is a point.
the overall reverse image leads the network to utilize a
conservative strategy that simply sets every confused area to Metrics: We employ the following metrics to evaluate
blank. Therefore, we propose dividing the objective function the quality of the enhanced radar point cloud compar-
into two parts, considering the blank area and the actual ing to the LiDAR point cloud: (i) Fréchet Inception Dis-
detection area separately. To this end, we calculate mask tance (FIDBEV ), the Fréchet distance between the generated
Fig. 4: Qualitative results of our method and the RadarHD on the VOD dataset. Specifically, the point clouds enhanced by RadarHD are
in 2D, while the point clouds enhanced by our method are in 3D.
TABLE I: Super-resolution Performance.
resolution on 3D radar point clouds. The only baseline
VOD dataset we find is RadarHD [16] for 2D radar point cloud super-
resolution. Thus, for a fair comparison, we conduct all
FIDBEV ↓ CD↓ MHD↓ UCD↓ UMHD↓
metrics in 2D, i.e., using only (x, y) coordinates, while
RadarHD 247.2 0.34 0.24 0.45 0.24 providing 3D evaluation in our ablation study. We present
ours 118.4 0.19 0.10 0.15 0.07 experimental results as in Tab. I. As depicted on the VOD
RadarHD dataset dataset, our approach produces superior results across all
FIDBEV ↓ CD↓ MHD↓ UCD↓ UMHD↓
metrics, with an average improvement of 58.4 %. Notably,
our approach exhibits significant advantages over RadarHD
RadarHD 141.1 0.44 0.34 0.38 0.21 in terms of UCD and UMHD metrics, achieving a 64.7 %
ours 139.0 0.59 0.50 0.26 0.13
improvement in UCD and a 70.8 % improvement in UMHD.
The best results are highlighted in bold. All metrics are in 2D. This can be attributed to our approach better faithfully
reproduces the information presented in the LiDAR point
enhanced BEV images and LiDAR BEV images. (ii) Cham- cloud. On the RadarHD dataset, our approach maintains
fer Distance (CD), the average distance from each point to its advantages regarding FID, UCD, and UMHD metrics.
the nearest neighbor point in the other point cloud. (iii) Mod- However, compared to RadarHD, our approach exhibits
ified Hausdorff Distance (MHD), the median distance from certain decreases in the CD and MHD metrics. This is
each point to the nearest neighbor point in the other point because our method can generate denser point clouds and
cloud. Given that radar possesses a stronger penetration even incorporate information that may not be present in the
ability, it can detect objects occluded in the LiDAR point original LiDAR point cloud. However, the enriched points do
clouds, potentially generating more informative point clouds not possess real correspondences in the LiDAR point cloud,
than LiDAR. We thereby present two more metrics for better thus introducing larger errors in CD and MHD metrics.
evaluation: (iv) Unidirectional Chamfer Distance (UCD), The fact that our approach generates denser point clouds is
the CD from LiDAR point cloud to enhanced radar point demonstrated in the subsequent qualitative experiments.
cloud only, and (v) Unidirectional Modified Hausdorff Dis- To provide more insights into the proposed Radar-
tance (UMHD), the MHD from LiDAR point cloud to diffusion, we visualize the enhanced 3D radar point clouds
enhanced radar point cloud only. using Radar-diffusion as in Fig. 4. As RadarHD is only
capable of 2D radar point cloud generation, we visualize
Results: Since radar point cloud super-resolution is a new its generated BEV image for comparison. It can be observed
research direction, no similar work currently achieves super- that the point clouds enhanced by our method effectively
TABLE II: Registration Performance using Enhanced Point Cloud. TABLE III: Ablation Study on the VOD-Dataset.

RR(%)↑ RTE(m)[succ./all]↓ RRE(◦ )[succ./all]↓ case FIDBEV ↓ CD↓ MHD↓ UCD↓ UMHD↓

Raw 88.51 0.11/0.52 0.48/2.39 original 165.4 1.42 1.72 2.39 1.72
Enhanced (ours) 93.10 0.11/0.22 0.61/1.13 w=2 118.4 0.64 0.45 0.52 0.32
w=4 103.9 0.68 0.45 0.73 0.42
The best results are highlighted in bold.
w=5 116.7 0.65 0.45 0.51 0.32

1scans 130.1 0.72 0.52 0.54 0.33


3scans 122.1 0.67 0.47 0.50 0.31
5scans 118.4 0.64 0.45 0.52 0.32
The best results are highlighted in bold.
Besides FIDBEV , other metrics are calculated in 3D.

tration performance. We adjust the correspondence number


according to the point cloud density for a fair comparison. As
depicted, our enhanced point clouds exhibit good consistency
and accuracy for reliable registration. Furthermore, the more
detailed information provided by our enhanced point clouds
allows for more robust registration compared to radar point
clouds. We visualize some registration results using different
point clouds in Fig. 5. As can be seen, due to the sparse
nature of the original radar data, the registration process
Fig. 5: Qualitative results of registration on radar point clouds,
failed to align two overlapping radar scans. In contrast, the
our enhanced radar point clouds, and LiDAR point cloud using enhanced radar data can be aligned as well as the registration
RDMNet [22]. Different colors represent different frames of point results of the corresponding LiDAR point clouds.
clouds.
E. Ablation Studies
capture the overall layout. We further zoom in on represen- We conduct ablation studies to demonstrate the effec-
tative regions, such as vehicles and pedestrians, for closer tiveness of our design in Tab. III. Firstly, we study the
examination. As depicted, our enhanced point clouds possess objective function. As shown, compared to the original
realistic geometric structures for objects while enriching their objective function, using our proposed objective function
details that can be occluded in LiDAR point clouds. significantly improves the performance across all metrics.
Different choices of w in Eq. (13) have different emphasis
D. Performance on Downstream Task: Registration on the final performance of the network. A larger value of w
Our enhanced point clouds present a precise overall layout tends to encourage the network to adopt a more conservative
while possessing enriched details, making them capable approach, making it more inclined to generate unclear or
of downstream tasks. In this experiment, we demonstrate ambiguous regions as blank. We use w = 2 as the default
the capability of our enhanced point cloud for downstream as it achieves the most balance performance. Secondly, we
registration tasks. We evaluate our method on the test sets study the number of input frames. As depicted, our approach
of the VOD dataset. The point cloud pairs with ground truth can work with different numbers of inputs. Merging 5 frames
pose distances more than 1 m are chosen as test samples. of radar point cloud results in the best performance.

Metrics: We employ three metrics to evaluate the registration V. C ONCLUSION


performance: i) Relative Translation Error (RTE), which
In this paper, we present a mean-reverting SDE-based
measures the Euclidean distance between estimated and
diffusion model for 3D mmWave radar point cloud super-
ground truth translation vectors, ii) Relative Rotation Er-
resolution. Our approach models the degradation of high-
ror (RRE), which is the average difference between estimated
quality LiDAR BEV image to low-quality radar BEV image
and ground truth rotation, and iii) Registration Recall (RR),
as the forward diffusion process, and then learns the reverse
representing the fraction of scan pairs with RRE and RTE
process to recover high-quality LiDAR-like BEV images.
below certain thresholds, e.g., 5◦ and 0.5 m.
We propose to improve the object function to make it more
Results: In Tab. II, we present registration results on raw suitable for radar super-resolution tasks. Experiments show
radar point clouds and enhanced radar point clouds generated that our method can gently handle the massive clutter points
by our Radar-diffusion utilizing the state-of-the-art regis- in radar point clouds while enhancing them into high-quality
tration method, RDMNet [22]. It is a deep learning-based LiDAR-like point clouds. Additionally, we demonstrate that
method that finds dense point matches over two point clouds our enhanced point clouds can be effectively utilized for
and subsequently performs accurate registration. We directly downstream registration tasks, laying the foundation for
apply it to different point cloud data to evaluate the regis- future all-weather perception applications.
R EFERENCES driving. IEEE Trans. on Intelligent Transportation Systems (T-ITS),
2023.
[1] P. Besl and N. McKay. A Method for Registration of 3D Shapes. [23] C. Shi, X. Chen, J. Xiao, B. Dai, and H. Lu. Fast and accurate deep
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), loop closing and relocalization for reliable lidar slam. arXiv preprint,
14(2):239–256, 1992. 2023.
[2] D. Brodeski, I. Bilik, and R. Giryes. Deep radar detector. In 2019 [24] V. Voleti, A. Jolicoeur-Martineau, and C. Pal. Mcvd-masked con-
IEEE Radar Conference (RadarConf), pages 1–6. IEEE, 2019. ditional video diffusion for prediction, generation, and interpolation.
[3] M. Chamseddine, J. Rambach, D. Stricker, and O. Wasenmuller. Ghost Proc. of the Advances in Neural Information Processing Systems
target detection in 3d radar data using point cloud based deep neural (NIPS), 35:23371–23385, 2022.
network. In Proc. of the Intl. Conf. on Pattern Recognition (ICPR), [25] F. Zhang, C. Wu, B. Wang, and K.R. Liu. mmeye: Super-resolution
pages 10398–10403. IEEE, 2021. millimeter wave imaging. IEEE Internet of Things Journal, 8(8):6995–
[4] X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley, and C. Stach- 7008, 2020.
niss. SuMa++: Efficient LiDAR-based Semantic SLAM. In Proc. of
the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS),
2019.
[5] X. Chen, C. Liang, D. Huang, E. Real, K. Wang, Y. Liu, H. Pham,
X. Dong, T. Luong, C.J. Hsieh, et al. Symbolic discovery of
optimization algorithms. arXiv preprint, 2023.
[6] Y. Cheng, J. Su, M. Jiang, and Y. Liu. A novel radar point cloud
generation method for robot environment perception. IEEE Trans. on
Robotics (TRO), 38(6):3754–3773, 2022.
[7] H.W. Cho, W. Kim, S. Choi, M. Eo, S. Khang, and J. Kim. Guided
generative adversarial network for super resolution of imaging radar.
In 2020 17th European Radar Conference (EuRAD), pages 144–147.
IEEE, 2021.
[8] M. Gall, M. Gardill, T. Horn, and J. Fuchs. Spectrum-based single-
snapshot super-resolution direction-of-arrival estimation using deep
learning. In 2020 German Microwave Conference (GeMiC), pages
184–187. IEEE, 2020.
[9] J. Guan, S. Madani, S. Jog, S. Gupta, and H. Hassanieh. Through
fog high-resolution imaging using millimeter wave radar. In Proc. of
the IEEE/CVF Conf. on Computer Vision and Pattern Recognition
(CVPR), pages 11464–11473, 2020.
[10] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic
models. Proc. of the Advances in Neural Information Processing
Systems (NIPS), 33:6840–6851, 2020.
[11] S. Lee, H. Lim, and H. Myung. Patchwork++: Fast and robust ground
segmentation solving partial under-segmentation using 3d point cloud.
In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems
(IROS), pages 13276–13283. IEEE, 2022.
[12] T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Sil-
ver, and D. Wierstra. Continuous control with deep reinforcement
learning. arXiv preprint, 2015.
[13] Z. Luo, F.K. Gustafsson, Z. Zhao, J. Sjölund, and T.B. Schön. Image
restoration with mean-reverting stochastic differential equations. arXiv
preprint, 2023.
[14] M. Mirza and S. Osindero. Conditional generative adversarial nets.
arXiv preprint, 2014.
[15] A. Palffy, E. Pool, S. Baratam, J.F. Kooij, and D.M. Gavrila. Multi-
class road user detection with 3+ 1d radar in the view-of-delft dataset.
IEEE Robotics and Automation Letters (RA-L), 7:4961–4968, 2022.
[16] A. Prabhakara, T. Jin, A. Das, G. Bhatt, L. Kumari, E. Soltanaghai,
J. Bilmes, S. Kumar, and A. Rowe. High resolution point clouds
from mmwave radar. In Proc. of the IEEE Intl. Conf. on Robotics &
Automation (ICRA), pages 4135–4142. IEEE, 2023.
[17] C.R. Qi, H. Su, K. Mo, and L.J. Guibas. PointNet: Deep Learning
on Point Sets for 3D Classification and Segmentation. In Proc. of
the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR),
2017.
[18] M.A. Richards. Fundamentals of radar signal processing. McGraw-
Hill Education, 2022.
[19] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional
networks for biomedical image segmentation. In Medical Image
Computing and Computer-Assisted Intervention–MICCAI 2015: 18th
International Conference, Munich, Germany, October 5-9, 2015, Pro-
ceedings, Part III 18, pages 234–241. Springer, 2015.
[20] C. Saharia, W. Chan, H. Chang, C. Lee, J. Ho, T. Salimans, D. Fleet,
and M. Norouzi. Palette: Image-to-image diffusion models. In ACM
SIGGRAPH 2022 Conference Proceedings, pages 1–10, 2022.
[21] C. Shi, X. Chen, K. Huang, J. Xiao, H. Lu, and C. Stachniss. Keypoint
Matching for Point Cloud Registration using Multiplex Dynamic
Graph Attention Networks. IEEE Robotics and Automation Letters
(RA-L), 6:8221–8228, 2021.
[22] C. Shi, X. Chen, H. Lu, W. Deng, J. Xiao, and B. Dai. Rdmnet:
Reliable dense matching based point cloud registration for autonomous

You might also like