
2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Madrid, Spain, October 1-5, 2018

Improving Repeatability of Experiments by Automatic Evaluation of SLAM Algorithms

Francesco Amigoni¹, Valerio Castelli¹, and Matteo Luperto²

Abstract— The development of good experimental methodologies for robotics often takes inspiration from general principles of experimental practice. Repeatability prescribes that experiments should involve several trials in order to guarantee that results are not achieved by chance, but are systematic, and statistically significant trends can be identified. In this paper, we propose an approach to improve the repeatability of experiments performed in robotics. In particular, we focus on the domain of SLAM (Simultaneous Localization And Mapping) and we introduce a system that exploits simulations to generate a large number of test data on which SLAM algorithms are automatically evaluated in order to obtain consistent results, according to the principle of repeatability.

I. INTRODUCTION

The development of good experimental methodologies for robotics is a topic that has attracted increasing interest [1]. The discussion has evolved from early methodological proposals [2], [3] to a tangible impact on publications, with special issues [4] and special kinds of articles (reproducible articles or R-articles) [5]. Several practical solutions have been advanced to support good experimental methodologies, ranging from the use of datasets [6], [7], to the development of platforms for benchmarking [8], [9], and to the definition of robotic competitions [10], [11].

Among the several aspects that are involved in good experimental methodologies, the principles of reproducibility and repeatability are central. They refer to two similar but not fully overlapping characteristics of experimental practice [12]. Reproducibility is the possibility to verify, in an independent way, the results of an experiment. This means that experimenters, other than those claiming the validity of the results, should be able to achieve the same results when starting from the same initial conditions, using the same type of instruments and parameters, and adopting the same experimental techniques. Repeatability, instead, refers to the fact that a single result is not sufficient to ensure the success of an experiment. A successful experiment must involve a number of trials, possibly performed at different times and in different places, in order to guarantee that results have not been achieved by chance, but are systematic, and that statistically significant trends can be identified.

In this paper we propose an approach that enhances the repeatability of experiments performed in robotics. In particular, we focus on the domain of SLAM (Simultaneous Localization And Mapping) [13] and we first show that the performance of SLAM algorithms presents some variability when the algorithms are applied to data collected in different runs in the same environment. This aspect is often disregarded, and SLAM algorithms are usually evaluated on data acquired with single runs in different environments. We then introduce a system that exploits simulations to generate a large number of test data on which SLAM algorithms are automatically evaluated in order to obtain consistent results, according to the principle of repeatability. Our system is finally validated by showing that a SLAM algorithm applied to the test data we generate shows a performance very similar to that obtained when the algorithm is applied to data coming from real robots.

This paper is organized as follows. The next section motivates the contribution we provide. Section III illustrates our proposed method and the system we developed, which is experimentally validated in Section IV. Section V concludes the paper.

II. MOTIVATION

In this section, we motivate the approach we present in this paper to improve the repeatability of experiments in robotics. We focus on a specific, yet significant and widely studied, domain, that of SLAM. Broadly speaking, in a SLAM problem, a robot should localize itself within a map of the environment that it is building at the same time, on the basis of data coming from its sensors, typically laser range scanners and encoders.

We consider a well-known SLAM algorithm based on particle filters, called GMapping [14]. In brief, it maintains a predefined number of hypotheses (particles) about the map of the environment and the pose of the robot, which are continuously updated according to the information provided by new observations (laser range scans and odometry readings). The selection of the particles that should be maintained or eliminated at each update step is based on a maximum likelihood probabilistic approach, so that particles that are less likely to represent the current knowledge of the robot (including the observations) tend to be replaced.

We also consider a commonly used metric to evaluate the performance of SLAM algorithms [15], which provides a measure of the translational and rotational components of the localization error, calculated by comparing the trajectory of the robot as reconstructed by a SLAM algorithm and the ground truth trajectory. The details of the metric are explained in Section III-B.

¹F. Amigoni and V. Castelli are with the Artificial Intelligence and Robotics Laboratory, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy [email protected], [email protected]
²M. Luperto is with the Applied Intelligent Systems Laboratory, Università degli Studi di Milano, Via Festa del Perdono 7, 20122 Milano, Italy [email protected]
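The metric just introduced compares the trajectory reconstructed by a SLAM algorithm against the ground truth trajectory through relative transformations between pose pairs. As an illustrative preview of that formulation (detailed in Section III-B), the following is a minimal sketch for SE(2) poses; the (x, y, θ) tuple representation, the helper names, and the sampled index pairs are assumptions of this sketch, not the authors' implementation.

```python
import math

def ominus(a, b):
    """Relative SE(2) transform a (-) b: pose a expressed in the frame of pose b."""
    ax, ay, ath = a
    bx, by, bth = b
    dx, dy = ax - bx, ay - by
    c, s = math.cos(bth), math.sin(bth)
    return (c * dx + s * dy,
            -s * dx + c * dy,
            # normalize the angle difference to (-pi, pi]
            math.atan2(math.sin(ath - bth), math.cos(ath - bth)))

def localization_error(est, gt, pairs):
    """Mean squared translational/rotational discrepancy over sampled (i, j) pairs."""
    et = er = 0.0
    for i, j in pairs:
        d_est = ominus(est[j], est[i])  # delta_{i,j} from the SLAM estimate
        d_gt = ominus(gt[j], gt[i])     # delta*_{i,j} from the ground truth
        e = ominus(d_est, d_gt)         # residual transform between the two
        et += e[0] ** 2 + e[1] ** 2     # trans(.)^2 component
        er += e[2] ** 2                 # rot(.)^2 component
    n = len(pairs)
    return et / n, er / n
```

With identical estimated and ground truth trajectories both components are zero; any systematic drift in the estimate makes them grow, which is the behavior the metric exploits.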

978-1-5386-8094-0/18/$31.00 ©2018 IEEE 7237


TABLE I: Translational localization errors (in m) of GMapping over 12 runs in 5 indoor environments.

          building 1   building 2   building 3   building 4   building 5
average      0.191        0.195        0.225        0.262        0.195
std dev      0.027        0.040        0.032        0.034        0.052
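The rows of Table I are plain per-environment aggregates over repeated runs, and Section III-C later sizes the number of samples with a normal-approximation bound (Equation (1)). A minimal sketch of both computations follows; the error values and parameter choices are hypothetical, not taken from the paper's experiments.

```python
import math
from statistics import mean, stdev

# Hypothetical translational errors (m) from 12 runs in one environment.
errors = [0.23, 0.19, 0.17, 0.21, 0.16, 0.25, 0.18, 0.20, 0.22, 0.15, 0.19, 0.24]

avg, sd = mean(errors), stdev(errors)  # the kind of aggregates reported in Table I

def required_samples(s: float, d: float, z: float = 2.576) -> int:
    """Normal-approximation sample size N = z^2 * s^2 / d^2, rounded up.

    s: estimated standard deviation, d: margin of error,
    z: z-score of the desired confidence level (2.576 for ~99%).
    """
    return math.ceil((z * s / d) ** 2)

n = required_samples(sd, 0.02)  # samples needed for a +/- 0.02 margin at 99%
```

The point of the aggregation is visible even in this toy data: individual runs range from 0.15 m to 0.25 m, so any single run can misrepresent the mean.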

Fig. 1 shows the translational localization errors of GMapping when it is applied to data acquired in 5 indoor environments. Each curve is relative to an environment and represents the translational localization errors obtained in 12 runs (with the same experimental setting: same robot configuration, same initial pose, same parameters for the algorithm, ...), ordered from the largest to the smallest. The mean and standard deviation of the errors are reported in Table I. It is clear that there is great variability (which is common to all the environments). As a consequence, the performance measured according to a single run is hardly informative about the average performance of GMapping in the environment. This is clearly in contrast with the principle of repeatability. The sources of this variability are the noise affecting the perceptions and the movements of the robot and the inherent stochasticity of GMapping. While it is possible that this variability could be less significant for some environments and some SLAM algorithms, the problem seems rather general and little investigated.

Fig. 1: Performance of GMapping in 12 runs in 5 indoor environments.

Indeed, evaluating SLAM algorithms on the basis of the performance measured in a single run (or in a few runs) seems to be common even if, as we have shown, it does not provide a reliable assessment. For instance, in [14], the performance of GMapping is evaluated with a single run in three environments, while in [16] the authors consider a single simulation run per environment and three real-world runs in a test arena. In this work, we propose an original approach that automates the execution of data collection runs in simulated environments in order to inexpensively generate several test data on which SLAM algorithms are automatically evaluated.

III. OUR APPROACH

In this section, we present our proposal for improving the repeatability of experiments in robotics. We develop a system that generates a large number of test data on which statistically significant conclusions can be drawn. We focus on SLAM algorithms, and on GMapping in particular, but many of the following considerations and results could be extended to other SLAM algorithms and to other domains as well. An overview of the workflow of our system is shown in Fig. 2. Details are provided in the remainder of this section.

Fig. 2: The workflow of our system.

In general, running experiments for testing SLAM algorithms involves three main elements: the algorithm, the test data, and the metric. The algorithm we consider in this paper is GMapping [14], since it is very well-known and can be considered a good representative of SLAM algorithms. The test data usually come from datasets (like Radish [6] and Rawseeds [7]) or from direct acquisition using real robots. However, these two collection methods provide test data relative to single runs (or to a small number of runs) and can hardly be applied to provide data corresponding to several runs. For this reason, we employ automated simulations to generate test data (Section III-A). Finally, in order to evaluate the quality of the results returned by GMapping when it is applied to the test data, we need a metric and an automatic way to apply it (Section III-B).

The code of our system is available at https://github.com/AIRLab-POLIMI/predictivebenchmarking. Datasets are currently available upon request, amounting to more than 300 GB.

A. Automated Simulations

We first define a set 𝓔 of environments where we collect the test data on which the performance of GMapping is evaluated. We consider a set of 100 indoor environments representing real-world buildings belonging to different building types (like schools, offices, and university campuses) and having different sizes and shapes (to increase diversity of data). Each environment E ∈ 𝓔 is represented as a png image, with a resolution of 0.05 × 0.05 meters per grid cell (pixel). Environments in 𝓔 are taken from 3 different sources. 𝓔 includes 11 of the 20 floor plans of [17], some of which are from the Radish repository (we select the version of environments without furniture). Their size ranges from 100 m² to 1000 m². 𝓔 also includes 25 floor plans of

buildings of the MIT university campus, from [18]. Their size ranges from 1000 m² to 30 000 m². Finally, we complete 𝓔 with our dataset of 64 floor plans [19], 26 offices and 38 schools, whose size ranges from 100 m² to 10 000 m².

For each environment E ∈ 𝓔 we collect, using simulations, a set of test data D_E to be fed to GMapping. Note that, in addition to reducing the costs of data collection, simulations also easily provide the ground truth of the trajectories followed by the robot that, as discussed in Section III-B, is required by the metric we employ.

Simulations are performed in Stage, using the ROS GMapping (http://wiki.ros.org/gmapping) and Navigation (http://wiki.ros.org/navigation) packages. Mapping is performed using 40 particles and processing a new scan whenever the robot travels 1 m, rotates 0.25 rad, or 5 s have passed since the last update of the map. We employ a virtual robot equipped with a two-dimensional laser range scanner with a field of view of 270°, an angular resolution of 0.5°, and a range of 30 m. In our simulations, we assume that the virtual robot has a translational odometry error of up to 0.01 m/m and a rotational odometry error of up to 2°/rad, which provide a reasonable approximation of the odometry accuracy of real wheeled robots. The actual amount of error is randomly chosen by the simulator (at the start of each run) with uniform probability in the ranges [−0.005, +0.005] m/m and [−1, +1]°/rad, respectively. Although Stage, as any simulator, does not fully capture all the aspects of the real world, its use allows us to generate data easily. Moreover, as shown in Section IV, these data are quite similar (for the purpose of evaluating GMapping performance) to those obtained with real robots.

Given an environment E and a starting pose for the robot (close to the center of the environment, the same for all runs), a run R_E explores E using the frontier-based exploration approach of [20], according to which the robot moves to the closest frontier, where a frontier is a region on the boundary between known and unknown space, collecting laser range scans and odometry readings at each time step (every 100 ms in our case). These (timestamped) data are both fed to the ROS GMapping node and stored in a ROS bag file (http://wiki.ros.org/Bags). A run R_E ends when two consecutive snapshots of the grid map produced by GMapping (taken every 120 s for small environments and every 600 s for large environments) are similar enough, according to the mean square error metric that evaluates the difference between images. Empirically, this automated criterion for termination amounts to fully mapping the environment E in almost all runs. Note that the exploration process finds frontiers on the map of the environment incrementally built by GMapping. As a consequence of this, and of the localization errors simulated in Stage, exploration runs follow different paths in the environment. These paths include several loop closures, although our online exploration approach does not actively seek to find them. (The use of offline approaches that optimize loop closures is an interesting future direction of work.)

At the end of each run R_E we thus have the set D(R_E) of data (laser range scans and odometry readings) collected along the path followed to cover environment E. These data are fed to GMapping that, at the end of the exploration, produces the grid map M(R_E) and the estimated poses of the robot x_{1:T_{R_E}} from time step 1 to T_{R_E} (the time step at which the exploration run R_E ended). The process is automatically iterated until a number of runs |ℛ_E| (where ℛ_E is the set of runs performed in E) are performed for each environment E, as we discuss in Section III-C. Eventually, for each environment E ∈ 𝓔, we have the set D_E = {D(R_E) for all R_E ∈ ℛ_E} of test data and the corresponding results produced by GMapping, namely the set of grid maps {M(R_E) for all R_E ∈ ℛ_E} and the set of estimated poses {x_{1:T_{R_E}} for all R_E ∈ ℛ_E}. (We note that the test data D_E could be used to evaluate other SLAM algorithms without the need to re-run simulations.) We point out that the test data D_E are relative to the particular configuration of the virtual robot (and of its sensors) that we have considered. For example, changing the field of view or the range of the laser range scanner leads to the generation of a set of data that could be different. However, generating a new set of test data D_E for an environment E is relatively cheap (for example, in a large environment, like that of Fig. 6b, an exploration run requires, on average, 43.8 minutes). Similarly, the data {M(R_E)} and {x_{1:T_{R_E}}}, representing the results of GMapping, depend on the configuration of the algorithm (and, of course, on the test data D_E).

B. Automated Metric

The research community has developed several metrics to assess the performance of SLAM algorithms. Some of them involve a comparison with the ground truth [16], [21], [22], while others are based on evaluating the usefulness of the results (e.g., maps or localization) produced by SLAM algorithms [7], [23]. In our work, we employ a variant of the localization error metric proposed by [15], which measures the performance of SLAM algorithms according to their ability to accurately estimate the actual trajectory followed by the robot. The idea is to measure the deformation energy that is required to superimpose the estimated trajectory onto the ground truth trajectory: the smaller the energy, the higher the accuracy of the reconstruction. We choose this metric because it does not rely on any particular map representation format nor on any specific sensor.

In [15], the metric is defined as follows.

Definition III.1. Let x_{1:T} be the poses of the robot estimated by a SLAM algorithm from time step 1 to T during an exploration run of environment E; in our case, x_t ∈ SE(2), with SE(2) being the special Euclidean group of order 2. Let x*_{1:T} be the associated ground truth poses of the robot during mapping.
Let δ_{i,j} = x_j ⊖ x_i be the relative transformation that moves the pose x_i onto x_j and let δ*_{i,j} = x*_j ⊖ x*_i.
Finally, let Δ be a set of N pairs of relative transformations

over all the exploration, Δ = {⟨δ_{i,j}, δ*_{i,j}⟩}.

The localization error performance metric is defined as:

    ε(Δ) = (1/N) Σ_{i,j} (δ_{i,j} ⊖ δ*_{i,j})²
         = (1/N) Σ_{i,j} [trans(δ_{i,j} ⊖ δ*_{i,j})² + rot(δ_{i,j} ⊖ δ*_{i,j})²]
         = ε_t(Δ) + ε_r(Δ),

where the sums are over the elements of Δ, ⊖ is the inverse of the standard motion composition operator, and trans(·) and rot(·) are used to separate the translational and rotational components of the error.

In order to apply this metric, we need to address two issues. First, the metric as defined above is intrinsically devoted to evaluating a single run R_E in the environment E. Second, the metric requires the definition of the set Δ of pairs of relative transformations.

Addressing the first issue amounts to moving from the evaluation of the performance measured on a single run in an environment E (namely, on test data D(R_E)) to the evaluation of the performance measured on all the runs in E (namely, on test data D_E). In principle, one would like to evaluate the expected localization error of a generic exploration run in an environment E.

Definition III.2. Let p_{Δ_E} be the probability of observing the set Δ_E of relative transformations during an exploration run in an environment E.
The mean translational localization error in E, E[ε_t(E)], is the expected value of the translational component of the localization error over all the possible exploration runs on E:

    E[ε_t(E)] = Σ_{Δ_E} ε_t(Δ_E) · p_{Δ_E}.

The standard deviation of the translational localization error in E, σ[ε_t(E)], is:

    σ[ε_t(E)] = √( E[ε_t(E)²] − E[ε_t(E)]² ).

Similarly, the mean rotational localization error in E, E[ε_r(E)], is the expected value of the rotational component of the localization error over all the possible exploration runs on E:

    E[ε_r(E)] = Σ_{Δ_E} ε_r(Δ_E) · p_{Δ_E}.

The standard deviation of the rotational localization error of environment E, σ[ε_r(E)], is:

    σ[ε_r(E)] = √( E[ε_r(E)²] − E[ε_r(E)]² ).

We approximate the above quantities with their sampled versions, since the weak law of large numbers guarantees their convergence to the theoretical definitions as the number of exploration runs |ℛ_E| in an environment E increases [24].

Definition III.3. Let 𝓔 be a set of environments, E ∈ 𝓔 one of such environments, and ℛ_E the set of exploration runs performed on E.
The sample mean and sample standard deviation of the translational localization error in E are:

    ε̄_t(E) = ( Σ_{R_E ∈ ℛ_E} ε_t(Δ_{R_E}) ) / |ℛ_E|

    s(ε_t(E)) = √( Σ_{R_E ∈ ℛ_E} [ε_t(Δ_{R_E}) − ε̄_t(E)]² / |ℛ_E| ).

The sample mean and sample standard deviation of the rotational localization error in E are defined as:

    ε̄_r(E) = ( Σ_{R_E ∈ ℛ_E} ε_r(Δ_{R_E}) ) / |ℛ_E|

    s(ε_r(E)) = √( Σ_{R_E ∈ ℛ_E} [ε_r(Δ_{R_E}) − ε̄_r(E)]² / |ℛ_E| ).

We now turn to the second issue discussed above, namely the determination of the set Δ = {⟨δ_{i,j}, δ*_{i,j}⟩} of pairs of relative transformations. In each pair, the relative transformation δ_{i,j} between two poses as estimated by the SLAM algorithm is associated to the relative transformation δ*_{i,j} between the ground truth poses. In [15], human expertise is exploited to determine the N pairs of relative transformations in Δ. After a run, a human operator analyzes the pairs of laser range scans acquired by the robot to determine which ones refer to the same part of the environment and manually aligns them. (This is done in order to cope with the difficulty of collecting ground truth trajectories in real-world scenarios.) The amount of displacement required for the alignment is stored as the ground truth of the relative transformation δ*_{i,j} between the poses x_i and x_j from which the laser range scans have been acquired. The human operator can match laser range scans at semantically relevant places (e.g., loop closures), providing ground truth for global consistency. Clearly, this method does not efficiently scale as the numbers of laser range scans, runs, and environments increase.

Since we are using simulations, we can assume to have the ground truth trajectories followed by the robot. Hence, we propose a new way to determine Δ_{R_E} of Definition III.3 that is independent of human intervention. Although, in principle, Δ_{R_E} could contain all the possible pairs of relative transformations (i.e., for all the i and j in 1:T_{R_E}), this solution is impractical, because the size of Δ_{R_E} would be quadratic in the number of poses on the robot's trajectory. We propose to build Δ_{R_E} by randomly sampling a set of relative transformations, whose size trades off between sampling quality and computational complexity. The procedure is based on the central limit theorem to approximate the sampling distribution with a normal distribution [25]. The quality of the sampling is relative to the accuracy of the estimation of the localization error. More precisely, we set the confidence level and the margin of error of the estimation and

we determine the number of relative transformations sampled for estimating the localization error as:

    N = z²_{α/2} · s² / d²,    (1)

where s² is the usual unbiased estimator of the population variance, d is the margin of error, α is the complement of the desired confidence level, and z_{α/2} is its associated z-score. To validate our approach, we empirically verify the distribution normality assumption on a representative set of environments. Fig. 3 shows the sample distribution of the translational localization error ε_t() in two of these environments. The distributions are obtained by repeatedly extracting 200 different samples of relative transformations, imposing a 99% confidence level and a margin of error of ±0.02 m. It is evident that the shape of the distributions is approximately normal.

Fig. 3: Distribution of the translational localization error ε_t() in two environments.

The above process is sound if we assume the relative transformations to be independent and identically distributed random variables. In principle, this may not be the case for all pairs of relative transformations; for example, relative transformations that involve pairs of poses that are close to each other are similar and not independent. However, the number of possible relative transformations is so large that, given any two random relative transformations, the likelihood that they are dependent can be assumed negligible for all practical purposes.

To show that sampling relative transformations leads to a metric that actually captures the quality of SLAM results, Fig. 4 shows a good and a bad map of the same environment, with the bad map being visibly broken, with a room that is significantly misaligned. This visual difference is correctly reflected by the metric, as the translational and rotational localization errors of the good map are 0.54 m and 0.02 rad, respectively, while those of the bad map are 2.42 m and 0.29 rad, respectively.

Fig. 4: A good and a bad map of the same environment. (a) good map. (b) bad map.

In summary, given data relative to all runs ℛ_E performed in the environment E, we calculate the mean and standard deviation of ε_t(E) and ε_r(E), namely of the two components of the localization error, according to Definition III.3.

According to what is discussed at the end of Section III-A, the values of the mean and standard deviation of ε_t() and ε_r() depend on the virtual robot configuration. For example, for the environment of Fig. 4, the value of ε_t() is 0.68 m if the range of the laser range scanner is 30 m and 0.91 m if the range is 15 m. (The intuitive explanation is that, with a reduced range, the robot travels a longer distance and the error increases.)

C. Size of the Sample of Runs |ℛ_E|

Given an environment E ∈ 𝓔, the values of ε̄_t(E), s(ε_t(E)), ε̄_r(E), and s(ε_r(E)) can provide good approximations of the true mean and standard deviation of the two components of the localization error, namely can satisfy the repeatability principle, if they are calculated over a large enough number of runs |ℛ_E|.

We cannot in general assume that different runs of GMapping in the same environment are independent from each other. However, Chebyshev's weak law of large numbers guarantees the convergence of the sample mean to the true mean under the assumption that the covariances tend to be zero on average [24]. Then, we assume the distribution of the sample mean to be approximately normal and we exploit the same formulation of Equation (1) to obtain |ℛ_E|.

Given E, the estimation of the sample size |ℛ_E| is performed as follows. The process starts with an initial estimate of the variance of the localization error, obtained from a small sample of 10 runs. We use this value to compute an initial estimate of the number of required runs. We then perform that number of runs and compute a new estimate of the variance and its associated sample size, iteratively repeating the process until the newly estimated sample size is not larger than the number of already performed runs. In our case, we end up with different values of |ℛ_E| for different environments E, with an average of |ℛ_E| = 36 (and a total of about 3,600 simulated exploration runs in Stage).

Note that the sample size N_t required for an accurate estimate of the translational localization error may differ from the sample size N_r required to accurately estimate the rotational localization error; in this case, we consider the maximum of N_t and N_r.

IV. EXPERIMENTAL VALIDATION

In this section, we show the effectiveness of the proposed approach and we validate it.

Fig. 5 shows the translational localization errors of GMapping in all the runs performed in the 100 environments of 𝓔 (the rotational localization errors are similar). The variability of the performance in any given environment and the presence of some outliers are evident, reinforcing the need

to adopt an experimental methodology, like that embedded in our system, that satisfies the repeatability principle.

Fig. 5: The translational localization errors for all the runs in our environments.

Fig. 6 shows an example of the utility of the proposed system. In environment E_a of Fig. 6a, GMapping has a mean translational localization error of ε̄_t(E_a) = 0.31 m, but the translational localization error of one of the runs is 0.43 m. In environment E_b of Fig. 6b, GMapping has a mean translational localization error of ε̄_t(E_b) = 0.52 m, but the translational localization error of one of the runs is 0.39 m. Therefore, looking only at the two single runs, one could conclude that GMapping performs better in E_b than in E_a, but, on average (and with a statistically sound number of runs, see Section III-C), the opposite is true.

Fig. 6: Two environments for which the performance of GMapping measured with a single run is not informative. (a) E_a, 1400 m². (b) E_b, 2400 m².

We now validate our approach based on simulation by comparing results obtained with it against those obtained with real robots. We use some datasets collected with real robots (both publicly available and acquired in our lab) and we implement simulated versions of the same settings. Note that, in order to use the metric of Section III-B, the datasets we use for validation must include the ground truth trajectories of the robot.

We first consider the dataset of [26], which is composed of four runs executed by a real robot in an L-shaped industrial hall of size 10 × 12 m². Ground truth data of the robot trajectories are obtained with a motion capture system and are reported in the dataset. We extract two runs from the dataset. In the first one, the environment is an empty hall without furniture, while in the second one the same environment is furnished. In both cases, the robot is configured in the same way but, unfortunately, the characteristics of the laser range scanner are not explicitly reported. So, we analyzed the raw scans from the dataset, inferring that the field of view should be 80°, the angular resolution 1°, the range 30 m, and the frequency 10 Hz. Since they are not explicitly reported, we assume the translational and rotational odometry errors of the robot to be upper bounded by 0.01 m/m and 2°/rad, respectively. We replicate the two settings in our simulation framework of Section III-A, using the configuration of the robot employed in [26]. Both the data from the dataset (collected by the real robot) and the data collected in our simulations are fed to GMapping and the reconstructed trajectories are compared to the ground truth trajectory using the metric of Section III-B. The components of the localization error are reported in Table II (the simulation data are averaged over 10 runs). Comparing the values in the table, we can claim that, for this setting, the results obtained in our simulations are a rather good approximation of those obtained with a real robot.

TABLE II: Translational ([m]) and rotational ([rad]) components of the localization error for the dataset of [26].

                           empty hall              furniture
                        ε̄_t()     ε̄_r()        ε̄_t()     ε̄_r()
dataset (real robot)    0.189     0.058        0.267     0.070
simulation              0.223     0.030        0.245     0.045

We consider another dataset collected in the Artificial Intelligence and Robotics Laboratory (AIRLab) at the Politecnico di Milano. The environment has a size of 9 × 9 m² (see Fig. 7 right) and is covered by an OptiTrack motion capture system to record the ground truth trajectory of the robot. We use a three-wheeled differential drive robot, called Robocom, equipped with a SICK LMS100 laser range scanner, with a field of view of 270°, an angular resolution of 0.25°, a range of 20 m, and a frequency of 50 Hz (Fig. 7 left). The translational and rotational errors affecting the odometry of Robocom are manually estimated to be not larger than 0.01 m/m and 4°/rad, respectively. In the real world, we perform 10 runs, each involving the autonomous exploration of the area (Fig. 7 center) following the same frontier-based approach of Section III-A. We also perform 10 simulations, as before, reproducing the environment in our simulation framework and using the Robocom configuration (Fig. 7 right). The comparison of the components of the localization error made by GMapping when applied to data coming from Robocom and from the simulations is shown in Table III. The difference between the ε̄_t() of the simulations and that of the real robot is rather small (a difference of 2.5 cm on the mean and less than 1 cm on the standard deviation). The difference relative to ε̄_r() is more significant and can be explained by the poor rotation mechanism of Robocom, which sometimes introduces errors larger than the 4°/rad threshold we estimated. Overall, the results of Tables II and III show that the performance of GMapping obtained
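As an illustration of why per-run figures mislead, the aggregation over runs can be sketched in a few lines. This snippet is not part of the system described in the paper; the individual run values are hypothetical, chosen only so that their means match the reported 0.31 m (Ea) and 0.52 m (Eb):

```python
import statistics

def summarize_runs(errors):
    """Mean and sample standard deviation of per-run localization errors [m]."""
    return statistics.mean(errors), statistics.stdev(errors)

# Hypothetical per-run translational errors; only the means come from the paper.
ea_runs = [0.24, 0.43, 0.31, 0.26]
eb_runs = [0.39, 0.58, 0.55, 0.56]

mean_a, sd_a = summarize_runs(ea_runs)
mean_b, sd_b = summarize_runs(eb_runs)

# A single run (0.43 m in Ea vs 0.39 m in Eb) suggests Eb is easier,
# but the means across runs point the other way.
print(f"Ea: mean={mean_a:.2f} m, sd={sd_a:.2f} m")
print(f"Eb: mean={mean_b:.2f} m, sd={sd_b:.2f} m")
```

Reporting the mean together with a dispersion measure over a statistically sound number of runs (Section III-C) is what makes the comparison between environments meaningful.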
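For concreteness, the following sketch shows one way translational and rotational error components could be computed from time-aligned estimated and ground-truth poses (x, y, θ). It is a simplified illustration with toy trajectories: the actual metric of Section III-B, taken from [15], compares relative pose displacements rather than absolute poses.

```python
import math
import statistics

def pose_errors(estimated, ground_truth):
    """Per-pose translational [m] and rotational [rad] error magnitudes,
    assuming the two trajectories are already time-aligned."""
    et, er = [], []
    for (x, y, th), (gx, gy, gth) in zip(estimated, ground_truth):
        et.append(math.hypot(x - gx, y - gy))
        # Wrap the angular difference to (-pi, pi] before taking the magnitude.
        d = (th - gth + math.pi) % (2 * math.pi) - math.pi
        er.append(abs(d))
    return et, er

# Toy trajectories (hypothetical values): estimate vs ground truth.
gt  = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, math.pi / 2)]
est = [(0.0, 0.1, 0.0), (1.1, 0.0, 0.05), (2.0, -0.1, math.pi / 2 - 0.1)]

et, er = pose_errors(est, gt)
print(f"mean translational error: {statistics.mean(et):.3f} m")
print(f"mean rotational error:    {statistics.mean(er):.3f} rad")
```

The angle wrapping step matters: without it, a heading estimate just below 2π compared against a ground truth just above 0 would be charged a near-2π error.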
Overall, the results of Tables II and III show that the performance of GMapping obtained with data collected with our simulator is comparable to that obtained with data collected with real robots. This outcome suggests the validity of our simulation-based approach to automatically evaluate SLAM algorithms.

TABLE III: Translational and rotational components of the localization error for the AIRLab experiment.

             translational error [m]   rotational error [rad]
             ε̄t(·)     s(εt(·))       ε̄r(·)     s(εr(·))
Robocom      0.086     0.026          0.066     0.010
simulation   0.101     0.019          0.022     0.004

Fig. 7: Robocom (left). The map built by Robocom in the AIRLab (center). The map built in a simulation run (right).

V. CONCLUSIONS

In this paper we have presented an approach to address the limited repeatability of experiments performed to evaluate SLAM algorithms. The proposed system exploits simulations to generate a large amount of test data with relatively small effort and automates the evaluation of SLAM algorithms. The validation has shown that GMapping performs similarly on test data collected by real robots and on test data generated with our approach. Note that the availability of several test data, relative to different runs in the same environment and to different environments, also promotes the reproducibility of experimental results.

While we have considered a specific algorithm (GMapping) and a specific simulator (Stage), most modules of our system could be generalized with small adjustments to other SLAM algorithms and other simulators, and, in principle, to other domains. Preliminary results obtained with Karto SLAM⁵ seem to confirm the findings of this paper. A drawback of the proposed approach, as it is currently structured, is that the test data it generates depend on the configuration of the virtual robot and of its sensors. Making the approach more platform-independent is one of the challenges for future work.

⁵ http://wiki.ros.org/slam_karto

REFERENCES

[1] F. Amigoni and V. Schiaffonati, "Models and experiments in robotics," in Springer Handbook of Model-Based Science, L. Magnani and T. Bertolotti, Eds. Springer, 2017, pp. 799–815.
[2] F. Bonsignorio, J. Hallam, and A. del Pobil, "GEM guidelines," http://www.heronrobots.com/EuronGEMSig/downloads/GemSigGuidelinesBeta.pdf, last visited July 2018.
[3] F. Amigoni, M. Reggiani, and V. Schiaffonati, "An insightful comparison between experiments in mobile robotics and in science," Auton Robot, vol. 27, no. 4, pp. 313–325, 2009.
[4] F. Bonsignorio and A. del Pobil, "Toward replicable and measurable robotics research [from the guest editors]," IEEE RAM, vol. 22, no. 3, pp. 32–35, 2015.
[5] F. Bonsignorio, "A new kind of article for reproducible research in intelligent robotics [from the field]," IEEE RAM, vol. 24, no. 3, pp. 178–182, 2017.
[6] A. Howard and N. Roy, "The robotics data set repository (Radish)," 2003. [Online]. Available: http://radish.sourceforge.net/
[7] G. Fontana, M. Matteucci, and D. Sorrenti, "Rawseeds: Building a benchmarking toolkit for autonomous robotics," in Methods and Experimental Techniques in Computer Engineering, F. Amigoni and V. Schiaffonati, Eds. Springer, 2014, pp. 55–68.
[8] J. Weisz, Y. Huang, F. Lier, S. Sethumadhavan, and P. Allen, "RoboBench: Towards sustainable robotics system benchmarking," in Proc. ICRA, 2016, pp. 3383–3389.
[9] D. Pickem, P. Glotfelter, L. Wang, M. Mote, A. Ames, E. Feron, and M. Egerstedt, "The Robotarium: A remotely accessible swarm robotics research testbed," in Proc. ICRA, 2017, pp. 1699–1706.
[10] F. Amigoni, E. Bastianelli, J. Berghofer, A. Bonarini, G. Fontana, N. Hochgeschwender, L. Iocchi, G. Kraetzschmar, P. Lima, M. Matteucci, P. Miraldo, D. Nardi, and V. Schiaffonati, "Competitions for benchmarking," IEEE RAM, vol. 22, no. 3, pp. 53–61, 2015.
[11] L. Iocchi, D. Holz, J. Ruiz-del-Solar, K. Sugiura, and T. van der Zant, "RoboCup@Home: Analysis and results of evolving competitions for domestic and service robots," Artif Intell, vol. 229, pp. 258–281, 2015.
[12] I. Hacking, Representing and Intervening. Cambridge University Press, 1983.
[13] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. The MIT Press, 2005.
[14] G. Grisetti, C. Stachniss, and W. Burgard, "Improved techniques for grid mapping with Rao-Blackwellized particle filters," IEEE T Robot, vol. 23, no. 1, pp. 34–46, 2007.
[15] R. Kümmerle, B. Steder, C. Dornhege, M. Ruhnke, G. Grisetti, C. Stachniss, and A. Kleiner, "On measuring the accuracy of SLAM algorithms," Auton Robot, vol. 27, no. 4, pp. 387–407, 2009.
[16] J. Santos, D. Portugal, and R. Rocha, "An evaluation of 2D SLAM techniques available in Robot Operating System," in Proc. SSRR, 2013, pp. 1–6.
[17] R. Bormann, F. Jordan, W. Li, J. Hampp, and M. Hägele, "Room segmentation: Survey, implementation, and analysis," in Proc. ICRA, 2016, pp. 1019–1026.
[18] E. Whiting, J. Battat, and S. Teller, "Generating a topological model of multi-building environments from floorplans," in Proc. CAADFutures, 2007, pp. 115–128.
[19] M. Luperto, A. Quattrini Li, and F. Amigoni, "A system for building semantic maps of indoor environments exploiting the concept of building typology," in Proc. RoboCup, 2013, pp. 504–515.
[20] B. Yamauchi, "A frontier-based approach for autonomous exploration," in Proc. CIRA, 1997, pp. 146–151.
[21] B. Balaguer, S. Carpin, and S. Balakirsky, "Towards quantitative comparisons of robot algorithms: Experiences with SLAM in simulation and real world systems," in IROS Workshop on Performance Evaluation and Benchmarking for Intelligent Robots and Systems, 2007.
[22] S. Schwertfeger and A. Birk, "Map evaluation using matched topology graphs," Auton Robot, vol. 40, no. 5, pp. 761–787, 2016.
[23] T. Collins, J. J. Collins, and D. Ryan, "Occupancy grid mapping: An empirical evaluation," in Proc. MED, 2007, pp. 1–6.
[24] S. Karlin and H. Taylor, A First Course in Stochastic Processes. Academic Press, 1975.
[25] P. Billingsley, Probability and Measure. Wiley, 1995.
[26] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A benchmark for the evaluation of RGB-D SLAM systems," in Proc. IROS, 2012, pp. 573–580.