0% found this document useful (0 votes)
14 views

BotanicGarden A High-Quality Dataset For Robot Navigation in Unstructured Natural Environments

Uploaded by

SV
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views

BotanicGarden A High-Quality Dataset For Robot Navigation in Unstructured Natural Environments

Uploaded by

SV
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

2798 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 9, NO.

3, MARCH 2024

BotanicGarden: A High-Quality Dataset for Robot


Navigation in Unstructured Natural Environments
Yuanzhi Liu , Yujia Fu , Minghui Qin , Yufeng Xu , Baoxin Xu , Fengdong Chen , Bart Goossens ,
Poly Z.H. Sun , Hongwei Yu , Chun Liu , Long Chen , Wei Tao , and Hui Zhao

Abstract—The rapid developments of mobile robotics and au- I. INTRODUCTION


tonomous navigation over the years are largely empowered by
OBILE robots play a crucial role in today’s social de-
public datasets for testing and upgrading, such as sensor odometry
and SLAM tasks. Impressive demos and benchmark scores have
arisen, which may suggest the maturity of existing navigation
M velopment and productivity evolution. Over the years,
with the rapid progress of autonomous navigation, various ap-
techniques. However, these results are primarily based on moder- plications have emerged, such as robotaxi, unmanned logistics,
ate structured scenario testing. When transitioning to challenging service robots, and more [1]. Meanwhile, existing algorithms
unstructured environments, especially in GNSS-denied, texture- begin to saturate the benchmarks, which may suggest that current
monotonous, and dense-vegetated natural fields, their performance navigation techniques have achieved a maturity in moderate and
can hardly sustain at a high level and requires further validation structured scenarios. However, robots often need to perform
and improvement. To bridge this gap, we build a novel robot more complex tasks and work in unstructured environments,
navigation dataset in a luxuriant botanic garden of more than 48000
m2 . Comprehensive sensors are used, including Gray and RGB
which consequently imposes higher demands on the capabilities
stereo cameras, spinning and MEMS 3D LiDARs, and low-cost and robustness of navigation systems.
and industrial-grade IMUs, all of which are well calibrated and Modern navigation techniques such as Sensor Odometry (SO)
hardware-synchronized. An all-terrain wheeled robot is employed and Simultaneous Localization and Mapping (SLAM) [2] are
for data collection, traversing through thick woods, riversides, indeed highly dependent on good scene compatibility and posi-
narrow trails, bridges, and grasslands, which are scarce in previous tioning aids to avoid tracking losses and cumulative drift. In well
resources. This yields 33 short and long sequences, forming 17.1 km textured and structured environments, both vision- and LiDAR-
trajectories in total. Excitedly, both highly-accurate ego-motions based navigation methods can operate reliably by integrating
and 3D map ground truth are provided, along with fine-annotated inertial sensors and external positioning signals. However, in
vision semantics. We firmly believe that our dataset can advance
problematic unstructured scenarios involving GNSS denial, tex-
robot navigation and sensor fusion research to a higher level.
tural monotonicity, and especially within dense-vegetated natu-
Index Terms—Data sets for SLAM, field robots, data sets for ral fields, their performances can hardly sustain at a high level
robotic vision, navigation, unstructured environments. and necessitate further validation.
As is well known, due to the costly hardware and complicated
experiments, robot navigation research relies heavily on pub-
licly available datasets for testing and upgrading [3]. The most
famous resources, including KITTI [4], TUM-RGBD [5], and
EuRoC [6], have become the indispensable references in today’s
Manuscript received 15 October 2023; accepted 17 January 2024. Date of
publication 29 January 2024; date of current version 12 February 2024. This algorithm developments. Other newer datasets such as NCLT
letter was recommended for publication by Associate Editor X. Zuo and Editor [7], Oxford RobotCar [8], Complex Urban [9], Newer College
J. Civera upon evaluation of the reviewers’ comments. This work was supported [10], and 4-Seasons [11] also complement a wide scene variety.
by the National Key R&D Program of China under Grant 2018YFB1305005. However, such datasets are mainly with urbanized and indoor
(Yujia Fu, Minghui Qin, and Yufeng Xu contributed equally to this work.)
(Corresponding authors: Bart Goossens; Hui Zhao.)
environments, which cannot serve as qualified benchmarks for
Yuanzhi Liu, Yujia Fu, Minghui Qin, Yufeng Xu, Baoxin Xu, Wei Tao, and the aforementioned problematic scene settings. This motivates
Hui Zhao are with the School of Sensing Science and Engineering, Shanghai us to build a novel dataset in unstructured natural environments
Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; to further promote research in robot navigation.
[email protected]; [email protected]; [email protected]; In this paper, we introduce a high-quality robot navigation
[email protected]; [email protected]; [email protected]).
Fengdong Chen is with the School of Instrumentation, Harbin Institute of dataset which is collected in a luxuriant botanic garden of over
Technology, Harbin 150001, China (e-mail: [email protected]). 48000 m2 . An all-terrain robot, equipped with strictly integrated
Bart Goossens is with the imec-IPI-Ghent University, 9000 Gent, Belgium stereo cameras, LiDARs, IMUs, and wheel odometry, traverses
(e-mail: [email protected]). diverse natural areas including dense woods, riversides, narrow
Poly Z.H. Sun is with the School of Mechanical Engineering, Shanghai Jiao
Tong University, Shanghai 200240, China (e-mail: [email protected]).
trails, bridges, and grasslands, as depicted in Fig. 1. Here GNSS
Hongwei Yu is with the Chinese Aeronautical Radio Electronics Research cannot work reliably due to the block of thick vegetations, and
Institute, Shanghai 200233, China (e-mail: [email protected]). the repetitive green features and unstructured surroundings may
Chun Liu is with the College of Surveying and Geo-Informatics, Tongji also shake the performance of motion and recognition modules.
University, Shanghai 200092, China (e-mail: [email protected]). The work most similar to ours could be Montmorency [12], while
Long Chen is with the Institute of Automation, Chinese Academy of Sciences,
Beijing 100190, China (e-mail: [email protected]). it focuses more on LiDAR mapping, lacking in sensors variety,
Website: https://github.com/robot-pesg/BotanicGarden scene scale and diversity, and authentic ground truth. Our main
Digital Object Identifier 10.1109/LRA.2024.3359548 contributions are as follows:
2377-3766 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:20:44 UTC from IEEE Xplore. Restrictions apply.
LIU et al.: BOTANICGARDEN: A HIGH-QUALITY DATASET FOR ROBOT NAVIGATION IN UNSTRUCTURED NATURAL ENVIRONMENTS 2799

unworkable indoors and is out of precision in denied outdoor


areas such as urban canyon, tunnels, and forests. These failure
cases motivate the developments of modern SO/SLAM-based
navigation which employ vision and LiDAR as centric sensors.
SO is the process of tracking an agent’s location incrementally
over time, with perception and navigation sensors. It has been
widely researched over the years, forming mature implementa-
tions such as Visual and Visual-Inertial Odometry (VO/VIO),
which are compact and computationally lightweight. As an
extension of SO, SLAM is a process of building a map of
the environments while simultaneously keeping the track of the
agent’s locations within it. Compared with SO, SLAM could be
more accurate and robust: by loop closure corrections, SLAM
is able to optimize the map and path to bound cumulative
drifts; and it is also possible to re-localize after tracking losses
by searching the base map. Famous SO/SLAM frameworks
include VINS-Mono [13], ORB-SLAM [14], [15], LOAM and
its extensions [16], [17], etc. According to the benchmark results,
state-of-the-art methods exhibit good performance in structured
environments and can handle occasional challenges. However,
their robustness in complex unstructured scenarios characterized
by dense natural elements and monotonous textures remains
questionable and necessitates further validation.

B. Representative Datasets
Over the past two decades, the field of mobile robotics
has witnessed the introduction of numerous publicly available
datasets, mainly consisting of structured environments such as
Fig. 1. Top: A bird view of the 3D survey map of BotanicGarden; Middle: the urban, campus, and indoor scenarios. Among the earliest efforts,
robot is walking through the narrow path and riverside; Bottom: a detailed view
of the 3D map in GNSS-denied thick woods.
the most notable presented datasets include MIT-DARPA [18],
Rawseeds [19], and KITTI [4]. These datasets offered a compre-
hensive range of sensor types and accurate ego-motion ground
r We build a novel multi-sensory dataset in an over 48000 truth derived from D-GNSS systems. During this early phase, the
m2 botanic garden with 33 long and short sequences main objective of these datasets was to fulfill basic testing and
and 17.1 km trajectories in total, containing dense and validation requirements. As a result, the collection environments
diverse natural elements that are scarce in previous were intentionally designed to be relatively simple. However,
resources. exactly due to the idealistic illuminations, weathers, and static
r We employed comprehensive sensors, including high-res scene layouts, these datasets have received concerns for being
and high-rate stereo gray and RGB cameras, spinning and too ideal for algorithm assessments [4].
MEMS 3D LiDARs, and low-cost and industrial-grade To complement previous datasets with a greater emphasis
IMUs, supporting a wide range of applications. By elabo- on real-life factors, significant efforts have been made in the
rate development of the system, we have achieved highly- subsequent years. On the one hand, several long-term datasets
precise hardware-synchronization. Both the availability of have been proposed, including NCLT [7], Oxford RobotCar
sensors and sync-quality are at the top-level in this field. [8], KAIST Day/Night [20], and 4-Seasons [11], incorporating
r We provide both highly-precise 3D map and trajectories diverse temporal variations, weather conditions, and seasonal
ground truth by dedicated surveying works and advanced effects. On the other hand, to address the need for more complex
map-based localization algorithm. We also provide dense and dynamic environments, ComplexUrban [9] and UrbanLoco
vision semantics labeled by experienced annotators. This [21] were developed. ComplexUrban focused on metropolitan
is the first field robot navigation dataset that provides such areas in South Korea, while UrbanLoco covered cities in Hong
all-sided and high-quality reference data. Kong and San Francisco, bringing in challenging features like
urban canyon, dense buildings, and congested traffics. Through-
out this stage, datasets have played a crucial role in pushing the
II. RELATED WORKS boundaries of algorithms, aiming to enhance their robustness for
real-world applications.
A. SO/SLAM-Based Navigation
Many indoor and 6-DoF datasets also exist. Famous reposi-
Traditional navigation systems are typically achieved with tories include TUM-RGBD [5], EuRoC [6], TUM-VI [22], and
GNSS (Global Navigation Satellite System), and filtered with more, which significantly promote the research of visual and
inertial data. GNSS can provide drift-free global positioning at visual-inertial navigation systems (VINS). Besides, in recent
meters level, while inertial data are in duty of attitude and can years, high-quality multi-modal datasets were also continuously
boost the frequency to more than 100Hz. However, as is well emerging, such as OpenLORIS [23], M2DGR [24], Newer Col-
known, GNSS requires an open-sky to locate reliably, while is lege [10], and Hilti SLAM [25]. These datasets encompass a
Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:20:44 UTC from IEEE Xplore. Restrictions apply.
2800 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 9, NO. 3, MARCH 2024

wide range of real-life challenges, providing valuable opportu- D. Discussions


nities for algorithm validation and improvement. In summary, there is a significant gap between unstruc-
Up to the present, there is a relatively abundant availability
tured and structured environments in terms of scene patterns,
of datasets in structured environments, which have become which poses much severer challenges for navigation algorithms.
increasingly comprehensive and challenging. However, while However, existing datasets still have notable limitations in this
existing algorithms have shown promising performance in such regard, particularly in environments with dense natural elements
scenarios, due to the wide variation in scene patterns, their and degraded GNSS services, where the obtainment of ground
capabilities in unstructured environments remains questionable truth remains a problem. This paper fills the gap by introducing a
and necessitate concrete and targeted validation. novel high-quality dataset in a luxuriant botanic garden. Table I
compares our work with key state-of-the-arts and several highly-
relevant unstructured counterparts, showing that our sensors
availability, time-sync, and ground truth quality are all at the
C. Datasets in Unstructured Environments top-level in this field. We thus believe that our work will be
extremely beneficial for mobile robotics community.
Unstructured environments refer to scenarios that lack clear,
regular, or well-defined features. In such scenarios, there is
typically a lack of apparent motifs or geometric shapes, which III. THE BOTANIC GARDEN DATASET
raise great difficulty for robotics algorithms to recognize and A. Acquisition Platform
track. Datasets in unstructured environments typically involve
those collected in sandy and rocky fields, undergrounds, rural To cope with the complex field environments, we employ
areas, rivers, and scenes rich in natural elements (forests, wilds, an all-terrain wheeled robot Scout V1.0 from AgileX for data
and diverse vegetations). Different from datasets in structured collection. It works in a powerful 4-wheel-drive (4WD) differ-
environments, efforts paid to unstructured scenarios are rela- ential mechanism, which can ensure high driving robustness
tively less. An overview of existing works is given below. and obstacle crossing ability in fields and wilds. Each wheel
For datasets with sandy and rocky scenarios, Furgale et al. contains a 1024-line encoder to provide ego-motions, and we
[26] created a long-range robot navigation dataset with stereo have developed a set of corresponding programs to calculate
cameras on Devon Island; Vayugundla et al. [27] recorded two the robot dead-reckoning odometers. To ensure a low latency
sequences on Mount Etna with stereo vision, IMU, and odometry communication, the robot is configured to link with the host
sensors; Hewitt et al. [28] collected a dataset in Katwijk beach via a highspeed CAN bus at 500kbps, which can lower the
with a wide array of high-quality sensors; and Meyer et al. transmission time to less than 1ms. Besides, the host controller is
[29] recorded diverse visual-inertial sequences in the Moroccan performed by an Intel NUC11 running with a Real-Time Linux
desert. These datasets were challenging for vision methods kernel1 to minimize the clock jitter and data buffer time. We have
mainly due to the monotonous texture of the scenes. customized the NUC to support dual-Ethernet with Precision
For underground environments, Leung et al. [30] collected a Time Protocol (PTP,2 also known as IEEE1588) capability,
2km sequence in a large mine of Chile, and Rogers et al. [31] which is able to be synchronized with other devices at sub-µs
created a grand dataset within a huge tunnel circuit. In such accuracy.
cases, the challenge mainly lies on the absence of GNSS, where On top of the robot chassis, we design a set of aluminum
the robots may rely solely on ego-sensors for navigation. profiles to carry the batteries, computers, controllers, sensors,
For scenes of rural areas and rivers, Chebrolu et al. [32] and and the display, as illustrated in Fig. 2. The computer used
Pire et al. [33] collected various sequences in croplands; and for data collection is an Advantech MIC-7700 Industrial PC
VI-Canoe [34] and USVInland [35] respectively built a dataset assembled with a PCIE expansion module. It houses an Intel
in rural rivers and inland waterways. They introduced challenges Core i7-6700TE 4C8T processor running with Ubuntu 18.04.1
related to the lack of distinct features and interference caused LTS and ROS Melodic systems. A total of 8 USB 3.0, 10 GigE,
by water and surrounding vegetations. and a set of GPIO and serial ports are available. All the GigE
For scenes rich of natural elements, which is also the scope of ports supports PTP, available for precise time synchronization.
our work, Rellis-3D [36] and TartanDrive [37] focused on multi- For high-speed data logging, 2 × 16 GB DDR4 memories
modal datasets in off-road terrains, and FinnForest [38] recorded (dual-channel) and a 2 TB Samsung 980 Pro NVME SSD
diverse visual-inertial sequences along wide roads in a large for- (of 3-bit MLC, over 1.5 GB/s sequential writes throughout
est. These datasets intentionally incorporated challenges related the whole storages) are equipped for real time database. To
to monotonous textures and lack of structural cues, yet they ensure full communication bandwidth, both the GigE cards (for
still managed to secure reliable GNSS signals, which may not sensor streaming) and the SSD are fastened to the PCIE slots
represent the most demanding case in such scope. Towards inner that directly linked to the CPU. Benefiting from our elaborate
and denser natural spaces, where GNSS cannot work reliably, development, this system can record over 500 MB/s data stream
RUGD [39], Montmorency [12], and Wild-Places [40] collected without losing a single piece of image, which is a common issue
diverse data in thick vegetations. However, exactly due to the in many other datasets.
blockage of GNSS signal, they failed to provide authentic ground
truth for ego-motion: Montmorency and Wild-Places employed B. Sensor Setup
SLAM algorithms to estimate the trajectories, while RUGD did
Our dataset focuses on robot navigation research based on
not release any trajectory data in the original paper. As a result,
conventional mainstream sensor modalities and their fusions.
they were better suited for the validation of scene perception
and place recognition tasks, rather than for strict-sensed robot
navigation, which primarily focuses on state estimation. This 1 [Online]. Available: https://wiki.linuxfoundation.org/realtime/start
serves as the motivation of our paper. 2 [Online]. Available: https://standards.ieee.org/ieee/1588/4355/

Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:20:44 UTC from IEEE Xplore. Restrictions apply.
LIU et al.: BOTANICGARDEN: A HIGH-QUALITY DATASET FOR ROBOT NAVIGATION IN UNSTRUCTURED NATURAL ENVIRONMENTS 2801

TABLE I
COMPARISON OF DIFFERENT NAVIGATION DATASETS

Fig. 2. Left: The robot platform design and its base coordinate; Middle: the multi-sensory system and the corresponding coordinate (the camera below the VLP16
is only for visualization usage, thus is not annotated); Right: the synchronization system of the whole platform.

TABLE II cost-effective spinning LiDAR with a 360° × 30°Field-of-View


SPECIFICATIONS OF SENSORS AND DEVICES (FoV), suitable for ground robotic navigation tasks. AVIA is a
MEMS 3D LiDAR with non-repetitive 70° × 77° circular FoV,
thus is more suitable for dense mapping and sensor-fusion with
co-heading cameras. They are both configured to scan at 10 Hz,
and can be synchronized via pulse-per-second (PPS) interface.
For inertial sensors, we provide a low-cost BMI088 IMU
(200 Hz) and an industrial-grade Xsens Mti-680G D-GNSS/INS
system (IMU@400 Hz, GNSS not in use) for comparison usage.
BMI088 is built-in and synchronized with AVIA LiDAR, and
To this end, we have employed comprehensive sensors includ- Xsens supports external trigger via pulse rising edges.
ing stereo Gray and RGB cameras, spinning and MEMS 3D
LiDARs, and low-cost and industrial-grade IMUs. Their speci-
C. Time Synchronization
fications are as listed in Table II. All the sensors are accurately
mounted on a compact self-designed aluminum carrier with In a precise robot system with rich sensors and multi-hosts,
precise 3D printing fittings, as shown in Fig. 2. time synchronization is extremely vital to eliminate perception
The stereo sensors are composed of two grayscale and two delay and ensure navigation accuracy. Towards a high-quality
RGB cameras with a baseline of around 255 mm. To facilitate dataset, we have taken very special cares on this problem. Our
research on robotic vision, we have chosen models from Tele- synchronization is based on a self-designed hardware Trigger
dyne DALSA with both high rate and resolution: M1930 and and Timing board and a PTP-based network, as illustrated in
C1930, working at 1920 × 1200 and 40 fps in our configuration. Fig. 2. The Trigger and Timing board is implemented by a
The CMOS used for the cameras is the PYTHON 2000 from compact STM32 MCU. It is programmed to produce three
ONSemi with 2/3" format and 4.8 µm pixel size, which has channels of pulses 1 Hz-40 Hz-400 Hz in the same phases. The 1
a good performance under subnormal illuminations. However, Hz channel (PPS) is used for the synchronization of VLP16 and
this sensor in its nature has very strong infrared response, thus AVIA accompanied with GPRMC signals: Every time the rising
we have customized IR-cutoff filters of 400–650 nm to exclude edge arrives, LiDAR immediately clears its internal sub-second
the side-effects on white-balance and exposure. The cameras counter, thus all the point clouds in the subsequent second can be
use GPIO as external trigger, and GigE for data streaming, timed cumulatively based on PPS arrival, which will then be ap-
which also supports PTP synchronizations. The attended lens pended with UTC integer time by GPRMC. The 40 Hz signal is
for imaging is Ricoh’s CC0614A (6 mm focus and F1.4 iris), used to trigger the cameras, when a rising edge arrives, the global
which has been adjusted to 5–10 m clear view to fit the scene. shutter will immediately start exposure until reaching a target
To support different testing demands, 2 LiDARs are used gain, and the image timestamp is acquired by adding half the
in collection: Velodyne VLP16 and Livox AVIA. VLP16 is a exposure time to the trigger stamp. The 400 Hz signal is used for

Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:20:44 UTC from IEEE Xplore. Restrictions apply.
2802 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 9, NO. 3, MARCH 2024

Fig. 4. Camera-LiDAR calibration: left: the checkerboards reconstructed by


stereo vision; right: the registration of vision 3D reconstruction model and
Fig. 3. Hardware triggered 1 Hz-40 Hz pulses and rising edge offsets, LiDAR point cloud.
indicating a high sync-precision at sensors side.

4th degree polynomial Radial distortion model (k1 , k2 , p1 , p2 )


triggering the Xsens IMU: Xsens has its own internal clock, and for intrinsics. The calibration is conducted by manually posing
when the rising edge arrives, Xsens will be triggered an external a large checker board (11 × 8, 60 mm/square) at different dis-
interruption thus feedbacks its exact time, then our program can tances and orientations in front of the cameras. To avoid possible
bridge a transform thus stamp the neighboring sample instance. motion blur, the exposure has been controlled to ≤10 ms, and
The UTC time is maintained by MCU based on its onboard we finally achieve less than 0.1 pixels mean reprojection error
oscillator. Note that, to maintain the timing smoothness, we will in all the 4 cameras. Furthermore, based upon these intrinsics,
never interrupt the MCU clock during the collections, instead, the extrinsics are finely calculated via joint optimizations, and
an UTC stamp will be conferred at the begin of each course-day we have checked the epipolar coherence for a verification.
via NTP or GNSS timing. So far, the LiDAR-vision-IMU chain 2) Camera-IMU Calibration: The extrinsics between cam-
has been fully synchronized in hardware. With a sub-µs level eras and IMUs are determined using the famous Kalibr5 toolbox.
triggering consistency between the sensors (see Fig. 3), a high Thanks to our specially-designed detachable sensors suite, we
sync-precision should be obtained. are able to handheld it for 6-DoF movements. Before running the
The PTP-based network is designed for synchronizing multi- joint calibration, we have recorded 20 hours of IMU sequences
hosts and capturing trigger events, thus the wheel odometry to identify their intrinsics (noise densities and random walks of
can be aligned to an identical timeline with other sensors. Our the accelerometers and gyroscopes). During the calibration, we
network frame is built based on LinuxPTP3 library. We assign use a 6 × 6 Aprilgrid as stationary target and properly move the
MIC-7700 as grand master, and DALSA cameras and NUC11 sensor suite to excite all IMU axes. To avoid excessive motion
are configured as slaves. When the synchronization starts, the blur, we have conducted the calibration in good lights and limited
slaves will keep exchanging sync-packets with the master, and the exposure to ≤10 ms. Note that, this joint calibration can also
to ensure the smoothness of local clocks, we have not directly output time offset, whereas, as the sensors have already been
compensated the offsets, while instead employ a PID mechanism hardware-synced, thus to avoid the side effects, this workflow
to adjust the time and frequency. During the data collection, is limited to camera-IMU extrinsics only. The final mean repro-
once the camera is triggered, it will report its timestamp of jection error is less than 0.5 pixels.
PTP clock, and based on the MCU trigger stamp, our soft- 3) Camera-LiDAR Calibration: For the extrinsics of camera
ware will bridge a relation thus transforms the wheel odometry and LiDAR, we have developed a concise calibration toolbox
from PTP to MCU timeline. Here although the PTP network based on 3D checker boards. We define the left RGB camera as
and the real-time kernel are used, there exists a latency from center, then by sub-pixel extractions and extrinsics calculation,
the CAN bus of around 1ms, which has been compensated in we can fully reconstruct the known-sized checkerboards to an
advance. accurate 3D model. At LiDAR side, we choose AVIA as refer-
ence because it works in non-repetitive scan mechanism which
D. Spatial Calibration can integrate a dense point cloud in 1–2 s. Then the two models
are registered by point-to-plane ICP, and the camera-LiDAR
Spatial calibration, both for intrinsic and extrinsic parts, is a extrinsics are thus solved, as illustrated in Fig. 4. The registration
prerequisite for algorithm development. We ensure calibration has achieved a precision of 9.1 mm std.
quality through careful error evaluation and manual verification 4) Other Calibrations: Based on the aforementioned pro-
of the results. Note that, the calibration is performed based on cess, an arterial camera-LiDAR-IMU calibration chain has al-
the mounting positions of the sensors on the robot, as they have ready been established. The other sensors can either be calcu-
already been well assembled according to the CAD designs. lated from the CADs, or be concatenated from the calibration
1) Camera Calibration: For camera intrinsics and extrinsics chain. For example, AVIA manufacturer has provided explicit
calibration, we choose the Matlab camera calibration toolbox4 , coordinates relation between LiDAR and its built-in IMU; Xsens
which uses an interactive engine for inspecting the errors and and VLP16 also have explicit coordinates provided. To refine a
filtering the qualified instances. Considering the standard lens better extrinsic for VLP16, we have performed a scan registra-
FoV, we choose Pinhole imaging model (fx , fy , cx , cy ) and a tion with AVIA, and the related params were updated in the
chain. For the robot base, we have observed enough data from
3 [Online].Available: https://linuxptp.sourceforge.net/
4 [Online]. Available: https://www.mathworks.com/help/vision/camera-
calibration.html 5 [Online]. Available: https://github.com/ethz-asl/kalibr

Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:20:44 UTC from IEEE Xplore. Restrictions apply.
LIU et al.: BOTANICGARDEN: A HIGH-QUALITY DATASET FOR ROBOT NAVIGATION IN UNSTRUCTURED NATURAL ENVIRONMENTS 2803

Fig. 5. Top: sample frames of typical scene features (riversides, thick woods, grasslands, bridges, narrow paths, etc.); middle: the corresponding 3D map venues;
Bottom: dense semantic annotations of the corresponding frames.

TABLE III
SEQUENCES SPECIFICATIONS AND SOTA ALGORITHMS ASSESSMENT (VISUAL, LIDAR, AND SENSOR FUSION METHODS, SINGLE-RUN RESULTS)

days, loop closures, sharp turns, and monotonous textures, ideal


for field robotic navigation research. Table III and Fig. 8 show the
specifications and trajectories of 7 sample sequences which we
have thoroughly tested with SOTA algorithms. The full dataset
specifications can be accessed on our website.

F. Ground Truth Map


Fig. 6. Left: the surveying process. Right: the point cloud registration process. Ground truth could be the most important part of a dataset. As
indicated by Table I, most datasets fail to provide an authentic
GT-map, which is necessary for evaluating the mapping results
and plays a key role in robot navigation. To ensure the global ac-
curacy, we have not used any mobile-mapping based techniques
(e.g., SLAM), instead we employ a survey-grade stationary 3D
laser scanner and conduct a qualified surveying and mapping job
with professional colleagues. The scanner is the RTC360 from
Leica, which can output very dense and colored point cloud with
a 130 m scan radius and mm-level ranging accuracy, as shown
Fig. 7. GT-pose generation based on our map-localization algorithm. the specifications in Table II. For possible future benefits, we
have arranged two independent jobs both in early summer and
both the CADs and external measurements, achieving sub-cm middle autumn, which takes around 20 workdays in total, and
calibration and have integrated it in the main chain, also. respectively with 515 and 400 individual scans (each scan re-
quires at least 3mins overall, Fig. 6 shows a work photo during
the autumn survey). The scans are pre-registered by VI-SLAM
E. Data Collection and post-registered by Leica Cyclone Register360 software
Our datasets are collected at 5th, 6th, 8th, and 18th of October, based on ICP and graph optimization (illustrated in Fig. 6).
2022 in a luxuriant botanic garden of our university. Various The final registered maps are generated in E57 format, and the
unstructured natural features are covered inside, such as thick coverage area is 48000 m2 from our calculation. According to
woods, narrow trails, riverside, bridges, grasslands, as shown in the Leica Cyclone report, we have obtained an overall accuracy
Fig. 5. A total of 33 sequences are traversed, yielding 17.1 km of 11 mm across all possible links and loops within the map.
trajectories, including short and long travels, cloudy and sunny This workflow is mature and trustable, as we have previously

Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:20:44 UTC from IEEE Xplore. Restrictions apply.
2804 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 9, NO. 3, MARCH 2024

Fig. 8. Visualization of the SOTA-estimated trajectories against the ground truth. From left to right: sequence 1005–00, 1005–01, 1005–07, 1006–01, 1008–03,
1018–00, and 1018–13; from top to bottom: top view of the trajectories, and the Z-axis errors.

validated it using a total station, identifying an around 1 cm masks are meticulously generated with dense pixel-level human
global consistency for target points. annotations, as shown in Fig. 5. All data are provided in LabelMe
[42] format and support future reproductions. It is expected
that these data can well facilitate robust motion estimation and
G. Ground Truth Pose semantic 3D mapping research. Additional detailed information
Serving as the reference of navigation algorithms, ground can be accessed on our website.
truth poses are supposed to be complete and globally accurate.
This is why GNSS is widely used for ground truth generation,
while incremental techniques such as SLAM are not authen- IV. EXAMPLE DATASET USAGE
tic due to the cumulative drift. However, in sky-blocked and A. Vision/LiDAR/Multi-Sensor-Fusion Navigation
complex environments, conventional means such as D-GNSS,
Laser Tracker, and MoCap can hardly work consistently: our To verify the versatility of our dataset in navigation research,
garden scenario exactly belongs to this scope. To bridge the we select 7 sample sequences and conduct a thorough assessment
gap, we take advantage of the authentic GT-map, and develop on state-of-the-art algorithms (visual, LiDAR, and multi-sensor
a map-based localization algorithm to calculate GT-pose using fusion methods) against the ground truth, regarding the metrics
the on-robot VLP16 LiDAR. As the map is quite unstructured, of relative pose error (RPE) and absolute trajectory error (ATE)
and VLP16 is sparse, naive registration methods such as ICP [5]. The evaluation results are listed in Table III, and the trajec-
cannot correctly converge on its own. This requires an accurate tories comparisons are visualized in Fig. 8.
local tracking thread to provide a good initial pose for regis- From the evaluation results we get mainly three conclusions:
tration. To this end, we build a full-stack GT-pose system by 1) Our dataset can support a wide range of navigation frame-
fusing global initializer, VIO local tracker, and fine-registration works, including but not limited to stereo vision, visual-
modules. Firstly, the initializer searches the beginning frame in inertial, LiDAR-only, LiDAR-inertial, and visual-LiDAR-
a scan-referenced image database for possible candidates, and inertial based methods. This also demonstrates the good
subsequently an accumulated LiDAR segment can be registered spatial calibration and time synchronization quality of our
to the GT-map for final initialization; Then, the VIO local tracker dataset.
keeps estimating the robot motions to pre-register and undistort 2) Our dataset is a challenging benchmark for ground robots.
the LiDAR data; Finally, the fine-registration module employs As shown by the results, the RPE errors are around 5–10
point-to-plane ICP for final localization, as illustrated in Fig. 7. times larger than KITTI leaderboard (ORB-stereo even
As the scene is really complex, we have slowed down the data failed 2/7 of the tests due to the indistinct textures and
playback rate and human monitored the visualization panel to large view change at sharp corners); and it can be clearly
make sure the GT-poses have converged correctly. To assess the identified that, most algorithms have met significant Z-axis
accuracy of this method, we use a Leica MS60 laser tracker to error in the traverses, which should be paid more attention
crosscheck 32 stationary trajectory points in both normal- and in future research. Besides, a noteworthy finding is that,
dense-vegetated areas within the garden, resulting in 0.6 cm although designed loop closures in all sequences, only
and 2.3 cm accuracy respectively. Even considering the LiDAR 8/21 tests (visual methods) have succeeded in detection,
motion distortion that cannot be fully rectified (the up limit can indicating a high textural monotonicity of our data.
be set by the 2–5% distance drift of LiDAR-inertial odometry 3) Multi-sensor fusion is an inevitable trend of future navi-
[16], [41]), under an up to 15cm per frame motion speed, our gation research. It can be clearly seen that, compared with
GT-pose can still be defined with cm-level accuracy. vision- and LiDAR-centric methods, multi-sensor fusion
frameworks have earned very obvious elevation on both
accuracy and robustness performance: we thus expect that
H. Semantic Annotation our dataset can serve as a research incubator for novel
Semantic segmentation is the highest perception level of a sensor fusion mechanisms.
robot. As a comprehensive and high-quality dataset, we empha-
size the role that semantic information plays in navigation. Since
B. Other Possible Usage
our LiDARs are relatively sparse, we have arranged the anno-
tation at 2D-image level. Our semantic segmentation database While our dataset is primarily designed for navigation re-
consists of 1181 images in total, including 27 classes such search, its comprehensive data and ground truths enable its use-
as various types of vegetations (bush, grass, tree, tree trunks, fulness in various robotic tasks, including 3D mapping, semantic
water plants), fixed facilities, drivable regions (trails, roads, segmentation, image localization, depth estimation, etc. New
grassland), rivers, bridges, sky, and more. The segmentation chances and data will be continuously released on our website.
Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:20:44 UTC from IEEE Xplore. Restrictions apply.
LIU et al.: BOTANICGARDEN: A HIGH-QUALITY DATASET FOR ROBOT NAVIGATION IN UNSTRUCTURED NATURAL ENVIRONMENTS 2805

V. CONCLUSION [19] G. Fontana, M. Matteucci, and D. G. Sorrenti, “Rawseeds: Building a


benchmarking toolkit for autonomous robotics,” Methods Exp. Techn.
This paper proposed BotanicGarden, a novel robot naviga- Comput. Eng., Berlin, Germany: Springer, pp. 55–68, 2014.
tion dataset in problematic and unstructured natural environ- [20] Y. Choi et al., “KAIST multi-spectral day/night data set for autonomous
ment involving GNSS denial, monotonous texture, and dense and assisted driving,” IEEE Trans. Intell. Transport. Syst., vol. 19, no. 3,
pp. 934–948, Mar. 2018.
vegetations. In comparison to existing works, we have paid a
[21] W. Wen et al., “Urbanloco: A full sensor suite dataset for mapping and
lot of attention to dataset quality, incorporating comprehensive localization in urban scenes,” in Proc. IEEE Int. Conf. Robot. Automat.,
sensors, precise time synchronization, rigorous data logging, and 2022, pp. 2310–2316.
high-quality ground truth, all of which are at the top-level of this [22] D. Schubert, T. Goll, N. Demmel, V. Usenko, J. Stückler, and D. Cremers,
field. In the future, we will continue to extend this dataset, such “The TUM VI benchmark for evaluating visual-inertial odometry,” in Proc.
IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2018, pp. 1680–1687.
as appending data of new seasons and weathers, equipping more [23] X. Shi et al., “Are we ready for service robots? The openloris-scene datasets
sensor modalities, and so on. We believe that our dataset can ease for lifelong SLAM,” in Proc. IEEE Int. Conf. Robot. Automat., 2020,
the research in natural environments and expedite advancements pp. 3139–3145.
in robot navigation. [24] J. Yin, A. Li, T. Li, W. Yu, and D. Zou, “M2DGR: A multi-sensor and
multi-scenario SLAM dataset for ground robots,” IEEE Robot. Automat.
Lett., vol. 7, no. 2, pp. 2266–2273, Apr. 2022.
[25] M. Helmberger, K. Morin, B. Berner, N. Kumar, G. Cioffi, and D. Scara-
muzza, “The hilti SLAM challenge dataset,” IEEE Robot. Automat. Lett.,
REFERENCES vol. 7, no. 3, pp. 7518–7525, Jul. 2022.
[26] P. Furgale, P. Carle, J. Enright, and T. D. Barfoot, “The Devon Island rover
[1] L. Chen et al., “Milestones in autonomous driving and intelligent vehicles: navigation dataset,” Int. J. Robot. Res., vol. 31, no. 6, pp. 707–713, 2012.
Survey of surveys,” IEEE Trans. Intell. Veh., vol. 8, no. 2, pp. 1046–1056, [27] M. Vayugundla, F. Steidle, M. Smisek, M. J. Schuster, K. Bussmann, and
Feb. 2023. A. Wedler, “Datasets of long range navigation experiments in a moon
[2] C. Cadena et al., “Past, present, and future of simultaneous localization and analogue environment on mount etna,” in Proc. Int. Symp. Robot., 2018,
mapping: Toward the robust-perception age,” IEEE Trans. Robot., vol. 32, pp. 1–7.
no. 6, pp. 1309–1332, Dec. 2016. [28] R. A. Hewitt et al., “The Katwijk beach planetary rover dataset,” Int. J.
[3] Y. Liu et al., “Standard datasets for autonomous navigation and mapping: Robot. Res., vol. 37, no. 1, pp. 3–12, 2018.
A full-stack construction methodology,” IEEE Trans. Intell. Veh., early [29] L. Meyer et al., “The MADMAX data set for visual-inertial rover naviga-
access, Jan. 30, 2024, doi: 10.1109/TIV.2024.3360273. tion on Mars,” J. Field Robot., vol. 38, no. 6, pp. 833–853, 2021.
[4] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? [30] K. Leung et al., “Chilean underground mine dataset,” Int. J. Robot. Res.,
The kitti vision benchmark suite,” in Proc. IEEE Conf. Comput. Vis. Pattern vol. 36, no. 1, pp. 16–23, 2017.
Recognit., 2012, pp. 3354–3361. [31] J. G. Rogers, J. M. Gregory, J. Fink, and E. Stump, “Test your SLAM! the
[5] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A bench- SubT-Tunnel dataset and metric for mapping,” in Proc. IEEE Int. Conf.
mark for the evaluation of RGB-D SLAM systems,” in Proc. IEEE/RSJ Robot. Automat., 2020, pp. 955–961.
Int. Conf. Intell. Robot. Syst., 2012, pp. 573–580. [32] N. Chebrolu, P. Lottes, A. Schaefer, W. Winterhalter, W. Burgard, and C.
[6] M. Burri et al., “The EuRoC micro aerial vehicle datasets,” Int. J. Robot. Stachniss, “Agricultural robot dataset for plant classification, localization
Res., vol. 35, no. 10, pp. 1157–1163, 2016. and mapping on sugar beet fields,” Int. J. Robot. Res., vol. 36, no. 10,
[7] N. Carlevaris-Bianco, A. K. Ushani, and R. M. Eustice, “University of pp. 1045–1052, 2017.
michigan north campus long-term vision and lidar dataset,” Int. J. Robot. [33] T. Pire, M. Mujica, J. Civera, and E. Kofman, “The Rosario dataset: Mul-
Res., vol. 35, no. 9, pp. 1023–1035, 2016. tisensor data for localization and mapping in agricultural environments,”
[8] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 year, 1000 km: Int. J. Robot. Res., vol. 38, no. 6, pp. 633–641, 2019.
The Oxford RobotCar dataset,” Int. J. Robot. Res., vol. 36, no. 1, pp. 3–15, [34] M. Miller, S. J. Chung, and S. Hutchinson, “The visual–inertial canoe
2017. dataset,” Int. J. Robot. Res., vol. 37, no. 1, pp. 13–20, 2018.
[9] J. Jeong, Y. Cho, Y. S. Shin, H. Roh, and A. Kim, “Complex urban dataset [35] Y. Cheng, M. Jiang, J. Zhu, and Y. Liu, “Are we ready for unmanned
with multi-level sensors from highly diverse urban environments,” Int. J. surface vehicles in inland waterways? The usvinland multisensor dataset
Robot. Res., vol. 38, no. 6, pp. 642–657, 2019. and benchmark,” IEEE Robot. Automat. Lett., vol. 6, no. 2, pp. 3964–3970,
[10] M. Ramezani, Y. Wang, M. Camurri, D. Wisth, M. Mattamala, and M. Apr. 2021.
Fallon, “The newer college dataset: Handheld LiDAR, inertial and vision [36] P. Jiang, P. Osteen, M. Wigness, and S. Saripalli, “RELLIS-3D dataset:
with ground truth,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2020, Data, benchmarks and analysis,” in Proc. IEEE Int. Conf. Robot. Automat.,
pp. 4353–4360. 2021, pp. 1110–1116.
[11] P. Wenzel et al., “4Seasons: A cross-season dataset for multi-weather [37] S. Triest, M. Sivaprakasam, S. J. Wang, W. Wang, A. M. Johnson, and
SLAM in autonomous driving,” in Proc. DAGM German Conf. Pattern S. Scherer, “TartanDrive: A large-scale dataset for learning off-road
Recognit., 2021, pp. 404–417. dynamics models,” in Proc. IEEE Int. Conf. Robot. Automat., 2022,
[12] J. F. Tremblay, M. Béland, R. Gagnon, F. Pomerleau, and P. Giguère, pp. 2546–2552.
“Automatic three-dimensional mapping for tree diameter measurements [38] I. Ali et al., “FinnForest dataset: A forest landscape for visual SLAM,”
in inventory operations,” J. Field Robot., vol. 37, no. 8, pp. 1328–1346, Robot. Auton. Syst., vol. 132, pp. 103610–103622, 2020.
2021. [39] M. Wigness, S. Eum, J. G. Rogers, D. Han, and H. Kwon, “A RUGD
[13] T. Qin, P. Li, and S. Shen, “Vins-mono: A robust and versatile monoc- dataset for autonomous navigation and visual perception in unstructured
ular visual-inertial state estimator,” IEEE Trans. Robot., vol. 34, no. 4, outdoor environments,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.,
pp. 1004–1020, Aug. 2018. 2019, pp. 5000–5007.
[14] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM: A [40] J. Knights et al., “Wild-places: A large-scale dataset for lidar place recogni-
versatile and accurate monocular SLAM system,” IEEE Trans. Robot., tion in unstructured natural environments,” in Proc. IEEE Int. Conf. Robot.
vol. 31, no. 5, pp. 1147–1163, Oct. 2015. Automat., 2023, pp. 11322–11328.
[15] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. Montiel, and J. D. Tardós, [41] W. Xu, Y. Cai, D. He, J. Lin, and F. Zhang, “Fast-lio2: Fast direct lidar-
“Orb-slam3: An accurate open-source library for visual, visual–inertial, inertial odometry,” IEEE Trans. Robot., vol. 38, no. 4, pp. 2053–2073,
and multimap slam,” IEEE Trans. Robot., vol. 37, no. 6, pp. 1874–1890, Aug. 2022.
Dec. 2021. [42] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, “LabelMe:
[16] J. Zhang and S. Singh, “LOAM: Lidar odometry and mapping in real-time,” A database and web-based tool for image annotation,” Int. J. Comput. Vis.,
Proc. Robot. Sci. Syst., vol. 2, no. 9, pp. 1–9, 2014. vol. 77, pp. 157–173, 2008.
[17] T. Shan and B. Englot, “LeGO-LOAM: Lightweight and ground-optimized [43] T. Shan, B. Englot, C. Ratti, and D. Rus, “LVI-SAM: Tightly-coupled
LiDAR odometry and mapping on variable terrain,” in Proc. IEEE/RSJ Int. lidar-visual-inertial odometry via smoothing and mapping,” in Proc. IEEE
Conf. Intell. Robot. Syst., 2018, pp. 4758–4765. Int. Conf. Robot. Automat., 2021, pp. 5692–5698.
[18] A. S. Huang et al., “A high-rate, heterogeneous data set from the darpa [44] J. Lin and F. Zhang, “R3LIVE: A robust, real-time, RGB-colored, LiDAR-
urban challenge,” Int. J. Robot. Res., vol. 29, no. 13, pp. 1595–1601, inertial-visual tightly-coupled state estimation and mapping package,” in
2012. Proc. IEEE Int. Conf. Robot. Automat., 2022, pp. 10672–10678.
Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:20:44 UTC from IEEE Xplore. Restrictions apply.

You might also like