BotanicGarden: A High-Quality Dataset for Robot Navigation in Unstructured Natural Environments
Fig. 1. Top: a bird's-eye view of the 3D survey map of BotanicGarden; Middle: the robot walking through a narrow path and the riverside; Bottom: a detailed view of the 3D map in GNSS-denied thick woods.

• We build a novel multi-sensory dataset in an over 48000 m² botanic garden with 33 long and short sequences and 17.1 km of trajectories in total, containing dense and diverse natural elements that are scarce in previous resources.
• We employ comprehensive sensors, including high-resolution and high-rate stereo gray and RGB cameras, spinning and MEMS 3D LiDARs, and both low-cost and industrial-grade IMUs, supporting a wide range of applications. Through elaborate system development, we have achieved highly precise hardware synchronization. Both the sensor availability and the synchronization quality are at the top level in this field.
• We provide both a highly precise 3D map and trajectory ground truth through dedicated surveying work and an advanced map-based localization algorithm. We also provide dense vision semantics labeled by experienced annotators. This is the first field robot navigation dataset that provides such all-sided and high-quality reference data.

II. RELATED WORKS

A. SO/SLAM-Based Navigation

Traditional navigation systems are typically built on GNSS (Global Navigation Satellite System) and filtered with inertial data. GNSS can provide drift-free global positioning at the meter level, while inertial data are responsible for attitude and can boost the output frequency to more than 100 Hz. However, as is well known, GNSS requires an open sky to locate reliably, while is

B. Representative Datasets

Over the past two decades, the field of mobile robotics has witnessed the introduction of numerous publicly available datasets, mainly covering structured environments such as urban, campus, and indoor scenarios. Among the earliest efforts, the most notable datasets include MIT-DARPA [18], Rawseeds [19], and KITTI [4]. These datasets offered a comprehensive range of sensor types and accurate ego-motion ground truth derived from D-GNSS systems. During this early phase, the main objective of such datasets was to fulfill basic testing and validation requirements; as a result, the collection environments were intentionally kept relatively simple. However, exactly because of the idealistic illumination, weather, and static scene layouts, these datasets have drawn concerns for being too ideal for algorithm assessment [4].

To complement previous datasets with a greater emphasis on real-life factors, significant efforts have been made in subsequent years. On the one hand, several long-term datasets have been proposed, including NCLT [7], Oxford RobotCar [8], KAIST Day/Night [20], and 4-Seasons [11], incorporating diverse temporal variations, weather conditions, and seasonal effects. On the other hand, to address the need for more complex and dynamic environments, ComplexUrban [9] and UrbanLoco [21] were developed. ComplexUrban focuses on metropolitan areas in South Korea, while UrbanLoco covers cities in Hong Kong and San Francisco, bringing in challenging features such as urban canyons, dense buildings, and congested traffic. Throughout this stage, datasets have played a crucial role in pushing the boundaries of algorithms, aiming to enhance their robustness for real-world applications.

Many indoor and 6-DoF datasets also exist. Famous repositories include TUM-RGBD [5], EuRoC [6], TUM-VI [22], and more, which have significantly promoted research on visual and visual-inertial navigation systems (VINS). Besides, in recent years, high-quality multi-modal datasets have also been continuously emerging, such as OpenLORIS [23], M2DGR [24], Newer College [10], and Hilti SLAM [25]. These datasets encompass a
TABLE I
COMPARISON OF DIFFERENT NAVIGATION DATASETS
Fig. 2. Left: the robot platform design and its base coordinate frame; Middle: the multi-sensory system and the corresponding coordinate frames (the camera below the VLP16 is for visualization only and is therefore not annotated); Right: the synchronization system of the whole platform.
Fig. 5. Top: sample frames of typical scene features (riversides, thick woods, grasslands, bridges, narrow paths, etc.); Middle: the corresponding 3D map venues; Bottom: dense semantic annotations of the corresponding frames.
TABLE III
SEQUENCE SPECIFICATIONS AND SOTA ALGORITHM ASSESSMENT (VISUAL, LIDAR, AND SENSOR-FUSION METHODS, SINGLE-RUN RESULTS)
Fig. 8. Visualization of the SOTA-estimated trajectories against the ground truth. From left to right: sequence 1005–00, 1005–01, 1005–07, 1006–01, 1008–03,
1018–00, and 1018–13; from top to bottom: top view of the trajectories, and the Z-axis errors.
validated it using a total station, identifying around 1 cm global consistency for the target points.

G. Ground Truth Pose

Serving as the reference for navigation algorithms, ground truth poses are supposed to be complete and globally accurate. This is why GNSS is widely used for ground truth generation, whereas incremental techniques such as SLAM are not authentic due to their cumulative drift. However, in sky-blocked and complex environments, conventional means such as D-GNSS, laser trackers, and MoCap can hardly work consistently, and our garden scenario belongs exactly to this scope. To bridge the gap, we take advantage of the authentic GT-map and develop a map-based localization algorithm to calculate the GT-pose using the on-robot VLP16 LiDAR. As the map is quite unstructured and the VLP16 is sparse, naive registration methods such as ICP cannot converge correctly on their own; an accurate local tracking thread is required to provide a good initial pose for registration. To this end, we build a full-stack GT-pose system by fusing a global initializer, a VIO local tracker, and a fine-registration module. First, the initializer searches a scan-referenced image database for possible candidates of the beginning frame, and subsequently an accumulated LiDAR segment is registered to the GT-map for final initialization. Then, the VIO local tracker keeps estimating the robot motion to pre-register and undistort the LiDAR data. Finally, the fine-registration module employs point-to-plane ICP for final localization, as illustrated in Fig. 7. As the scene is highly complex, we slowed down the data playback rate and human-monitored the visualization panel to make sure the GT-poses had converged correctly. To assess the accuracy of this method, we use a Leica MS60 laser tracker to crosscheck 32 stationary trajectory points in both normally and densely vegetated areas within the garden, obtaining 0.6 cm and 2.3 cm accuracy, respectively. Even considering the LiDAR motion distortion that cannot be fully rectified (an upper limit is set by the 2–5% distance drift of LiDAR-inertial odometry [16], [41], i.e., below 1 cm for a per-frame motion of at most 15 cm), our GT-pose can still be regarded as cm-level accurate.
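The fine-registration step can be pictured with a short sketch. The snippet below is a minimal illustration, not the authors' released pipeline: it assumes the Open3D library, and the function name, map/scan variables, and parameters are placeholders chosen for clarity.

```python
# Minimal sketch of the fine-registration step only: point-to-plane ICP of one
# undistorted VLP16 scan against the prior GT-map, seeded by an initial pose
# from the VIO local tracker. Open3D is assumed; names/values are illustrative.
import open3d as o3d

def refine_gt_pose(scan_xyz, gt_map, T_init, max_corr_dist=0.5):
    """scan_xyz: (N, 3) LiDAR points; gt_map: o3d.geometry.PointCloud of the
    survey map; T_init: 4x4 initial guess. Returns refined pose and fitness."""
    scan = o3d.geometry.PointCloud()
    scan.points = o3d.utility.Vector3dVector(scan_xyz)

    # Point-to-plane ICP needs target normals; estimate them on the map once.
    if not gt_map.has_normals():
        gt_map.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=30))

    result = o3d.pipelines.registration.registration_icp(
        scan, gt_map, max_corr_dist, T_init,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation, result.fitness

# Usage idea: propagate the previously refined pose with the VIO increment to
# form T_init for the next scan, and flag frames with low fitness for manual
# inspection, mirroring the human-monitored playback described above.
```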
H. Semantic Annotation

Semantic segmentation is the highest perception level of a robot. As a comprehensive and high-quality dataset, we emphasize the role that semantic information plays in navigation. Since our LiDARs are relatively sparse, we have arranged the annotation at the 2D image level. Our semantic segmentation database consists of 1181 images in total, covering 27 classes, such as various types of vegetation (bush, grass, tree, tree trunk, water plant), fixed facilities, drivable regions (trails, roads, grassland), rivers, bridges, sky, and more. The segmentation masks are meticulously generated with dense pixel-level human annotations, as shown in Fig. 5. All data are provided in LabelMe [42] format and support future reproduction. We expect these data to facilitate robust motion estimation and semantic 3D mapping research; additional detailed information can be accessed on our website.
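Since the annotations are distributed as LabelMe polygons, a per-pixel class mask can be rasterized with a few lines of code. The sketch below is a minimal illustration assuming the standard LabelMe JSON fields ("shapes", "points", "label", "imageWidth", "imageHeight"); the class mapping and file name are illustrative examples, not the dataset's official 27-class list.

```python
# Minimal sketch: convert one LabelMe annotation file into a class-index mask.
import json
import numpy as np
from PIL import Image, ImageDraw

CLASS_TO_ID = {"grass": 1, "tree": 2, "trail": 3, "river": 4, "sky": 5}  # example subset

def labelme_to_mask(json_path):
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)  # 0 = unlabeled
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:                      # one polygon per labeled region
        class_id = CLASS_TO_ID.get(shape["label"], 0)
        draw.polygon([tuple(p) for p in shape["points"]], fill=class_id)
    return np.asarray(mask, dtype=np.uint8)

# Usage (hypothetical file name):
# mask = labelme_to_mask("1005_00_camera_000123.json")
```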
IV. EXAMPLE DATASET USAGE

A. Vision/LiDAR/Multi-Sensor-Fusion Navigation

To verify the versatility of our dataset in navigation research, we select 7 sample sequences and conduct a thorough assessment of state-of-the-art algorithms (visual, LiDAR, and multi-sensor fusion methods) against the ground truth, using the metrics of relative pose error (RPE) and absolute trajectory error (ATE) [5]; a minimal sketch of the ATE computation is given after the conclusions below. The evaluation results are listed in Table III, and the trajectory comparisons are visualized in Fig. 8.

From the evaluation results we draw three main conclusions:
1) Our dataset can support a wide range of navigation frameworks, including but not limited to stereo-vision, visual-inertial, LiDAR-only, LiDAR-inertial, and visual-LiDAR-inertial methods. This also demonstrates the good spatial calibration and time synchronization quality of our dataset.
2) Our dataset is a challenging benchmark for ground robots. As shown by the results, the RPE values are around 5–10 times larger than on the KITTI leaderboard (ORB-stereo even failed 2/7 of the tests due to the indistinct textures and large view changes at sharp corners), and most algorithms exhibit significant Z-axis error over the traverses, which deserves more attention in future research. Besides, a noteworthy finding is that, although loop closures were designed into all sequences, only 8/21 tests (visual methods) succeeded in detecting them, indicating a high textural monotonicity of our data.
3) Multi-sensor fusion is an inevitable trend of future navigation research. It can be clearly seen that, compared with vision- and LiDAR-centric methods, multi-sensor fusion frameworks achieve an obvious improvement in both accuracy and robustness; we thus expect that our dataset can serve as a research incubator for novel sensor fusion mechanisms.
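For reference, the ATE metric used above can be reproduced with a small amount of code. The sketch below is a self-contained illustration following the TUM-RGBD definition [5]; it is not the exact evaluation script behind Table III, and the TUM-style trajectory format and file names are assumptions.

```python
# Minimal ATE sketch: RMSE of translational error after rigid (Horn/Umeyama,
# no-scale) alignment of the estimated trajectory to the ground truth.
import numpy as np

def load_tum_positions(path):
    data = np.loadtxt(path)                 # rows: timestamp tx ty tz qx qy qz qw
    return data[:, 0], data[:, 1:4]         # timestamps, xyz positions

def associate(t_ref, t_est, max_dt=0.02):
    """Match each reference stamp to the nearest estimated stamp (sorted input)."""
    idx = np.searchsorted(t_est, t_ref)
    idx = np.clip(idx, 1, len(t_est) - 1)
    idx -= (t_ref - t_est[idx - 1]) < (t_est[idx] - t_ref)   # pick nearer neighbor
    keep = np.abs(t_est[idx] - t_ref) < max_dt
    return np.nonzero(keep)[0], idx[keep]

def ate_rmse(p_ref, p_est):
    """Align est to ref with a rigid transform, then return ATE RMSE in meters."""
    mu_r, mu_e = p_ref.mean(0), p_est.mean(0)
    U, _, Vt = np.linalg.svd((p_ref - mu_r).T @ (p_est - mu_e))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])   # avoid reflections
    R = U @ S @ Vt
    err = p_ref - ((R @ p_est.T).T + (mu_r - R @ mu_e))
    return np.sqrt((err ** 2).sum(1).mean())

# Usage (hypothetical file names in TUM format):
t_ref, p_ref = load_tum_positions("gt_1005_00.txt")
t_est, p_est = load_tum_positions("est_1005_00.txt")
i_ref, i_est = associate(t_ref, t_est)
print("ATE RMSE: %.3f m" % ate_rmse(p_ref[i_ref], p_est[i_est]))
```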
B. Other Possible Usage

While our dataset is primarily designed for navigation research, its comprehensive data and ground truths make it useful for various robotic tasks, including 3D mapping, semantic segmentation, image localization, depth estimation, and more. New opportunities and data will be continuously released on our website.