
Towards a benchmark for RGB-D SLAM evaluation

Jürgen Sturm¹, Stéphane Magnenat², Nikolas Engelhard³, François Pomerleau², Francis Colas², Daniel Cremers¹, Roland Siegwart², Wolfram Burgard³

Abstract— We provide a large dataset containing RGB-D image sequences and the ground-truth camera trajectories, with the goal of establishing a benchmark for the evaluation of visual SLAM systems. Our dataset contains the color and depth images of a Microsoft Kinect sensor and the ground-truth trajectory of camera poses. The data was recorded at full frame rate (30 Hz) and sensor resolution (640×480). The ground-truth trajectory was obtained from a high-accuracy motion-capture system with eight high-speed tracking cameras (100 Hz). Further, we provide the accelerometer data from the Kinect. Finally, we propose an evaluation criterion for measuring the quality of the estimated camera trajectory of visual SLAM systems.

[Fig. 1: The office environment and the experimental setup in which the RGB-D dataset with ground-truth camera poses was recorded. (a) Typical office scene. (b) Motion capture system. (c) Microsoft Kinect sensor with reflective markers. (d) Checkerboard with reflective markers used for calibration.]

I. INTRODUCTION

Simultaneous localization and mapping (SLAM) has a long history in robotics and computer-vision research [11], [6], [1], [15], [7], [4]. Different sensor modalities have been explored in the past, including 2D laser scanners [12], [3], 3D scanners [14], [16], monocular cameras [13], [7], [9], [19], [20] and stereo systems [8]. Recently, low-cost RGB-D sensors became available, of which the most prominent one is the Microsoft Kinect. Such sensors provide both color images and dense depth maps at video frame rates. Henry et al. [5] were the first to use the Kinect sensor in a 3D SLAM system. Others have followed [2], and we expect to see more approaches using RGB-D data for visual SLAM in the near future.

Various datasets and benchmarks have been proposed for laser- and camera-based SLAM, such as the Freiburg, Intel and New College datasets [18], [17]. However, until now, no suitable dataset or benchmark existed that could be used to evaluate, measure, and compare the performance of RGB-D SLAM systems. As we consider objective evaluation methods to be highly important for measuring progress in the field (and for demonstrating such progress in a verifiable way), we decided to provide such a dataset. To the best of our knowledge, this is the first RGB-D dataset for visual SLAM benchmarking.
¹ Jürgen Sturm and Daniel Cremers are with the Computer Vision and Pattern Recognition Group, Computer Science Department, Technical University of Munich, Germany. {sturmju,cremers}@in.tum.de
² S. Magnenat, F. Pomerleau, F. Colas and R. Siegwart are with the Autonomous Systems Lab, ETH Zurich, Switzerland. {stephane.magnenat,francis.colas}@mavt.ethz.ch and [email protected]
³ Nikolas Engelhard and Wolfram Burgard are with the Autonomous Intelligent Systems Lab, Computer Science Department, University of Freiburg, Germany. {engelhar,burgard}@informatik.uni-freiburg.de

II. EXPERIMENTAL SETUP AND DATA ACQUISITION

We acquired a large set of data recordings containing both the RGB-D data from the Kinect and the ground-truth estimates from the mocap system. We moved the Kinect along different trajectories in typical office environments (see Fig. 1a). The recordings differ in their translational and angular velocities (fast/slow movements) and in the size of the environment (one desk, several desks, whole room). We also acquired data for three specific trajectories for debugging purposes, i.e., we moved the Kinect (more or less) individually along the x/y/z-axes and rotated it individually around the x/y/z-axes.

We captured both the color and depth images from an off-the-shelf Microsoft Kinect sensor using PrimeSense's OpenNI driver. All data was logged at full resolution (640×480) and full frame rate (30 Hz) of the sensor on a Linux laptop running Ubuntu 10.10 and ROS Diamondback. Further, we recorded IMU data from the accelerometer in the Kinect at 500 Hz and also read out the internal sensor parameters from the Kinect factory calibration.
Further, we obtained the camera trajectory by using an external motion-capture system from MotionAnalysis at 100 Hz (see Fig. 1b). We attached reflective targets to the Kinect (see Fig. 1c) and used a modified checkerboard for calibration (Fig. 1d) to obtain the transformation between the optical frame of the Kinect sensor and the coordinate system of the motion-capture system. Finally, we also videotaped all recordings with an external video camera to capture the camera motion and the environment from a different viewpoint.
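To illustrate how such a calibration is typically applied (a minimal sketch under assumed conventions, not the exact procedure used for this dataset): the mocap system reports the pose of the reflective-marker body, and the fixed marker-to-camera transform obtained from the checkerboard calibration maps it to the pose of the RGB optical frame. All numeric values below are hypothetical.

```python
import numpy as np

def pose_to_matrix(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pose of the reflective-marker body in mocap world coordinates,
# as reported by the motion-capture system (hypothetical values).
T_world_marker = pose_to_matrix(np.eye(3), np.array([1.0, 0.5, 1.2]))

# Fixed transform from the marker body to the RGB optical frame,
# determined once via the checkerboard calibration (assumed known).
T_marker_cam = pose_to_matrix(np.eye(3), np.array([0.02, -0.01, 0.03]))

# Ground-truth pose of the RGB optical frame in mocap world coordinates.
T_world_cam = T_world_marker @ T_marker_cam
```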
The original data has been recorded as ROS bag files. In total, we collected 50 GB of Kinect data, divided into nine separate sequences. The dataset is available online under the Creative Commons Attribution license at

https://cvpr.in.tum.de/research/datasets/rgbd-dataset

In addition to further information about the data formats, the website contains videos for simple visual inspection of the dataset.
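As a sketch of how the recordings can be consumed, the ROS bag Python API allows iterating over the logged messages. The file name and topic names below are assumptions for illustration only; the website documents the actual contents of the bag files.

```python
import rosbag  # ROS Python bag API

# Topic names are assumptions; consult the dataset documentation
# for the topics actually contained in the bag files.
RGB_TOPIC = '/camera/rgb/image_color'
DEPTH_TOPIC = '/camera/depth/image'

bag = rosbag.Bag('rgbd_dataset_sequence.bag')  # hypothetical file name
for topic, msg, t in bag.read_messages(topics=[RGB_TOPIC, DEPTH_TOPIC]):
    # msg is a sensor_msgs/Image; t is the recording timestamp.
    print(topic, t.to_sec(), msg.width, msg.height)
bag.close()
```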
III. EVALUATION

For evaluating visual SLAM algorithms on our dataset, we propose a metric similar to the one introduced by [10]. The general idea is to compute the relative error between the true and estimated motion w.r.t. the optical frame of the RGB camera. As we have ground-truth pose information for all time indices, we propose to compute the error as the sum of distances between the relative poses at time i and time i + Δ, i.e.,

\mathrm{error} = \sum_{i=1}^{n} \left[ (\hat{x}_{i+\Delta} \ominus \hat{x}_i) \ominus (x_{i+\Delta} \ominus x_i) \right]^2 \qquad (1)

where i = 1, ..., n are the time indices at which ground-truth information is available, Δ is a free parameter that corresponds to the time scale, x_i is the ground-truth pose at time index i, x̂_i is the estimated pose at time index i, and ⊖ stands for the inverse motion composition operator. If the estimated trajectory has missing values, i.e., there are timesteps i_{j_1}, ..., i_{j_m} for which no pose x̂_i could be estimated, the ratio of missing poses m/n should be stated as well.
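The measure of Eq. (1) is straightforward to compute. A minimal sketch, assuming the trajectories are given as time-aligned lists of 4×4 homogeneous camera poses, with matrix inversion realizing the inverse motion composition operator ⊖; penalizing the translational magnitude of the residual motion is one of several reasonable distance choices:

```python
import numpy as np

def relative_error(gt, est, delta):
    """Relative error in the spirit of Eq. (1).

    gt, est: lists of 4x4 homogeneous poses at matching time indices.
    delta:   time-index offset (the free time-scale parameter).
    """
    error = 0.0
    for i in range(len(gt) - delta):
        # Relative motions x_{i+delta} (-) x_i via inverse composition.
        rel_est = np.linalg.inv(est[i]) @ est[i + delta]
        rel_gt = np.linalg.inv(gt[i]) @ gt[i + delta]
        # Residual motion between estimated and ground-truth relative motion.
        diff = np.linalg.inv(rel_gt) @ rel_est
        # Squared translational magnitude of the residual (one possible distance).
        error += float(np.linalg.norm(diff[:3, 3]) ** 2)
    return error
```

Indices without an estimated pose would simply be skipped, with the ratio m/n reported alongside the score, as stated above.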
All data necessary to evaluate our measure are present in the dataset. We plan to release a Python script that computes these measures automatically, given the estimated trajectory and the respective dataset. To prevent (future) approaches from being over-fitted to the dataset, we recorded all scenes twice and held back the ground-truth trajectory in these secondary recordings. With this, we plan to provide a comparative offline evaluation benchmark for visual SLAM systems.
IV. CONCLUSIONS

In this paper, we have presented a novel RGB-D dataset for benchmarking visual SLAM algorithms. The dataset contains color images, depth maps, and associated ground-truth camera pose information. Further, we proposed an evaluation metric that can be used to assess the performance of a visual SLAM system. We thus propose a benchmark that allows researchers to objectively evaluate visual SLAM systems. Our next step is to evaluate our own system [2] on this dataset in order to provide a baseline for future implementations and evaluations. In this way, we hope to detect (and resolve) potential problems present in our current dataset, such as calibration and synchronization issues between the Kinect and our mocap system, as well as the effects of motion blur and the rolling shutter of the Kinect. Furthermore, we want to investigate ways to measure the performance of a SLAM system not only in terms of the accuracy of the estimated camera trajectory, but also in terms of the quality of the resulting map of the environment.

REFERENCES

[1] F. Dellaert. Square root SAM. In Proc. of Robotics: Science and Systems (RSS), Cambridge, MA, USA, 2005.
[2] N. Engelhard, F. Endres, J. Hess, J. Sturm, and W. Burgard. Real-time 3D visual SLAM with a hand-held RGB-D camera. In Proc. of the RGB-D Workshop on 3D Perception in Robotics at the European Robotics Forum, Västerås, Sweden, 2011.
[3] G. Grisetti, C. Stachniss, and W. Burgard. Improved techniques for grid mapping with Rao-Blackwellized particle filters. IEEE Transactions on Robotics (T-RO), 23:34–46, 2007.
[4] G. Grisetti, C. Stachniss, and W. Burgard. Non-linear constraint network optimization for efficient map learning. IEEE Transactions on Intelligent Transportation Systems, 10(3):428–439, 2009.
[5] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments. In Proc. of the Intl. Symp. on Experimental Robotics (ISER), Delhi, India, 2010.
[6] H. Jin, P. Favaro, and S. Soatto. Real-time 3-D motion and structure of point features: Front-end system for vision-based control and interaction. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2000.
[7] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In Proc. of the IEEE and ACM Intl. Symposium on Mixed and Augmented Reality (ISMAR), Nara, Japan, 2007.
[8] K. Konolige, M. Agrawal, R. C. Bolles, C. Cowan, M. Fischler, and B. P. Gerkey. Outdoor mapping and navigation using stereo vision. In Proc. of the Intl. Symp. on Experimental Robotics (ISER), 2007.
[9] K. Konolige and J. Bowman. Towards lifelong visual maps. In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), pages 1156–1163, 2009.
[10] R. Kümmerle, B. Steder, C. Dornhege, M. Ruhnke, G. Grisetti, C. Stachniss, and A. Kleiner. On measuring the accuracy of SLAM algorithms. Autonomous Robots, 27:387–407, 2009.
[11] F. Lu and E. Milios. Globally consistent range scan alignment for environment mapping. Autonomous Robots, 4(4):333–349, 1997.
[12] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proc. of the AAAI National Conference on Artificial Intelligence, Edmonton, Canada, 2002.
[13] D. Nistér. Preemptive RANSAC for live structure and motion estimation. Machine Vision and Applications, 16:321–329, 2005.
[14] A. Nüchter, K. Lingemann, J. Hertzberg, and H. Surmann. 6D SLAM – 3D mapping outdoor environments. Journal of Field Robotics, 24:699–722, August 2007.
[15] E. Olson, J. Leonard, and S. Teller. Fast iterative optimization of pose graphs with poor initial estimates. In Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA), 2006.
[16] B. Pitzer, S. Kammel, C. DuHadway, and J. Becker. Automatic reconstruction of textured 3D models. In Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA), 2010.
[17] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman. The New College vision and laser data set. Intl. Journal of Robotics Research (IJRR), 28(5):595–599, 2009.
[18] C. Stachniss, P. Beeson, D. Hähnel, M. Bosse, J. Leonard, B. Steder, R. Kümmerle, C. Dornhege, M. Ruhnke, G. Grisetti, and A. Kleiner. Laser-based SLAM datasets and benchmarks at http://openslam.org.
[19] H. Strasdat, J. M. M. Montiel, and A. Davison. Scale drift-aware large scale monocular SLAM. In Proc. of Robotics: Science and Systems (RSS), Zaragoza, Spain, 2010.
[20] J. Stühmer, S. Gumhold, and D. Cremers. Real-time dense geometry from a handheld camera. In Proc. of the DAGM Symposium on Pattern Recognition (DAGM), Darmstadt, Germany, 2010.
