Vincke et al.

EURASIP Journal on Embedded Systems 2012, 2012:5

http://jes.eurasipjournals.com/content/2012/1/5

RESEARCH Open Access

Real time simultaneous localization and mapping: towards low-cost multiprocessor embedded systems

Bastien Vincke1*, Abdelhafid Elouardi1 and Alain Lambert2

Abstract
Simultaneous localization and mapping (SLAM) is widely used by autonomous robots operating in unknown environments. The research community has developed numerous SLAM algorithms over the last 10 years, and several works have presented algorithmic optimizations. However, they have not explored optimization of the whole system, from the hardware architecture to the algorithmic development level. New computing technologies (SIMD coprocessors, DSPs, multi-cores) can greatly accelerate processing but require rethinking the algorithm implementation. This article presents an efficient implementation of the EKF-SLAM algorithm on a multi-processor architecture. The algorithm-architecture adequacy aims to optimize the implementation of the SLAM algorithm on a low-cost, heterogeneous architecture (an ARM processor with an SIMD coprocessor and a DSP core). Experiments were conducted with an instrumented platform. The results aim to demonstrate that an optimized implementation of the algorithm, resulting from an optimization methodology, can help to design embedded systems built on low-cost multiprocessor architectures operating under real-time constraints.

*Correspondence: [email protected]
1 Univ Paris-Sud, CNRS, Institut d'Electronique Fondamentale, F-91405 Orsay, France
Full list of author information is available at the end of the article

© 2012 Vincke et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction
Autonomous robots must be able to localize themselves. Simultaneous localization and mapping (SLAM) algorithms aim to build a map of the environment while estimating the robot pose. Much research has been devoted to SLAM algorithms such as the extended Kalman filter for SLAM (EKF-SLAM) [1,2], FastSLAM [3], GraphSLAM [4] and DP-SLAM [5], which aim to improve consistency, accuracy or robustness. Other algorithms derive from EKF-SLAM, such as those using the unscented Kalman filter (UKF) [6], which improves localization accuracy over the classical EKF based on a linearized model. Only a few works deal with the implementation of low-cost embedded SLAM systems.

Most SLAM implementations rely on accurate and dense measurements provided by expensive sensors such as laser rangefinders [7] or time-of-flight cameras [8]. High-priced smart sensors are not suitable for integration in most embedded systems intended for commercial or industrial applications.

SLAM systems using low-cost sensors have recently been designed. Abrate et al. [9] provide an implementation of the EKF-SLAM algorithm on a Khepera robot. The robot hosts limited-range, sparse and noisy IR sensors. Experimental results have shown the importance of the sensor characteristics, the extraction of primitives (lines) and data association. Yap and Shelton [10] use cheap, noisy and sparse sonar sensors embedded in a P3-DX robot. To cope with these low-cost sensors, the implemented SLAM algorithm uses a multi-scan approach and an orthogonality assumption to map indoor environments.

Classical SLAM algorithms are too computationally intensive to run on an embedded computing unit. They require at least laptop-level performance. Gifford et al. [11] present a low-cost approach to autonomous multi-robot mapping and exploration for unstructured environments. The robot hosts a Gumstix computing unit (600 MHz), 6 IR scanning range arrays, a 3-axis gyroscope and odometers. Running DP-SLAM alone on the Gumstix with 15 particles takes on average 3 s per update. While
using 25 particles, it takes more than 10 s per update. The authors underline the difficulty of finding SLAM parameters that fit within the available computing power while meeting real-time constraints. Magnenat et al. [12] present a system based on the co-design of a low-cost sensor (a slim rotating scanner), a SLAM algorithm, a computing unit, and an optimization methodology. The computing unit is based on an ARM processor (533 MHz) running a FastSLAM 2.0 algorithm [13]. Magnenat et al. [12] use an evolution strategy to find the best configuration of the algorithm and the best setting of its parameters.

As pointed out by [11,12], the first improvement of a SLAM algorithm is an efficient setting of its various parameters. Other modifications were investigated to meet real-time constraints. These modifications are necessary because of the low computing power and limited memory resources available on embedded systems. Feature restriction for the EKF-SLAM algorithm has been implemented to decrease the processing time [14]. Schroter et al. [15] focused on reducing the memory footprint of particle-based gridmap SLAM by sharing the map between several particles.

Robust laser-based SLAM navigation has long existed in robot applications, but such systems use sensors that, in some cases, are more expensive than the final product. Neato Robotics has developed a vacuum cleaner that implements a navigation system using a SLAM algorithm. The approach is based on a low-cost system built around a purpose-designed laser rangefinder [16].

This article presents an efficient implementation of the EKF-SLAM algorithm on a multi-processor architecture. The approach is based on an algorithm implementation adapted to a defined architecture. The aim is to optimize the implementation of the SLAM algorithm on a low-cost, heterogeneous architecture with an SIMD coprocessor (NEON) and a DSP core. The hardware includes several low-cost sensors. Like [17], we chose to use a low-cost camera (exteroceptive sensor) and odometers (proprioceptive sensors). Following [12], we efficiently tune the parameters of the SLAM algorithm. We improve on previous works by proposing an adequate implementation of the EKF-SLAM algorithm on a multiprocessing architecture (ARM processor, SIMD NEON coprocessor, DSP core). Exploiting the specific features of the NEON coprocessor and the DSP core improves the processing time and the system performance. The results aim to demonstrate that an optimized implementation of the algorithm, resulting from an evaluation methodology, can help to design embedded systems built on low-cost multiprocessor architectures operating under real-time constraints.

Section "EKF-SLAM algorithm" introduces the EKF-SLAM algorithm. Section "Multiprocessor architecture and system configuration" presents the embedded multiprocessor architecture and the system configuration. Section "Evaluation methodology and algorithm implementation" details the evaluation methodology, provides a first algorithm implementation and analyzes this implementation in terms of processing time. A hardware–software optimization is proposed and analyzed in Section "Hardware–software optimization and improvements". It presents SIMD optimizations and DSP parallelization. A performance comparison is then made between the optimized and non-optimized instances. Finally, Section "Conclusion" concludes this article.

EKF-SLAM algorithm
Overview
The extended Kalman filter for SLAM estimates a state vector containing both the robot pose and the landmark locations. We consider that the robot is moving on a plane. The algorithm uses 3D points as landmarks. It uses proprioceptive sensors to compute a predicted state vector and then corrects this state using exteroceptive sensors. In this article, we consider a wheeled robot embedding two odometers (one attached to each rear wheel) and a camera.

State vector and covariance matrix
With N landmarks, the state vector is defined as:

\[
\mathbf{x} = (x, z, \theta, x_{a_1}, y_{a_1}, z_{a_1}, \ldots, x_{a_N}, y_{a_N}, z_{a_N})^T \tag{1}
\]

where:
• $x, z$ are the ground coordinates (x-axis, z-axis) of the robot rear axle center. We suppose that the robot is always moving on the ground, so $y = 0$ (no elevation) and $y$ does not appear in Equation (1).
• $\theta$ is the orientation of a local frame attached to the robot with respect to the global frame.
• $x_{a_1}, y_{a_1}, z_{a_1}, \ldots, x_{a_N}, y_{a_N}, z_{a_N}$ are the 3D coordinates of the N landmarks in the global frame.

The state covariance matrix is defined as:

\[
\mathbf{P} =
\begin{bmatrix}
P_{xx} & P_{xz} & P_{x\theta} & P_{x x_{a_1}} & \cdots & P_{x z_{a_N}} \\
P_{zx} & P_{zz} & P_{z\theta} & P_{z x_{a_1}} & \cdots & P_{z z_{a_N}} \\
P_{\theta x} & P_{\theta z} & P_{\theta\theta} & P_{\theta x_{a_1}} & \cdots & P_{\theta z_{a_N}} \\
P_{x_{a_1} x} & P_{x_{a_1} z} & P_{x_{a_1} \theta} & P_{x_{a_1} x_{a_1}} & \cdots & P_{x_{a_1} z_{a_N}} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
P_{z_{a_N} x} & P_{z_{a_N} z} & P_{z_{a_N} \theta} & P_{z_{a_N} x_{a_1}} & \cdots & P_{z_{a_N} z_{a_N}}
\end{bmatrix}
\tag{2}
\]

Prediction
The prediction step relies on the measurements of the proprioceptive sensors (the odometers) embedded on our experimental platform. A nonlinear discrete-time state-space model is considered to describe the evolution of the robot configuration $\mathbf{x}$:

\[
\mathbf{x}_{k|k-1} = f(\mathbf{x}_{k-1|k-1}, \mathbf{u}_k) + \mathbf{v}_k \tag{3}
\]

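where $\mathbf{u}_k$ is a known two-dimensional control vector, assumed constant between the times indexed by $k-1$ and $k$, and $\mathbf{v}_k$ is an unknown state perturbation vector that accounts for the model uncertainties. $\mathbf{x}_{k-1|k-1}$ represents the state vector at time $k-1$, $\mathbf{x}_{k|k-1}$ the state vector after the prediction step, and $\mathbf{x}_{k|k}$ the state vector after the estimation step. The classical evolution model, described in [18], is considered:

\[
f(\mathbf{x}_{k-1|k-1}, \delta s, \delta\theta) =
\begin{pmatrix}
x_{k-1} + \delta s \cos\left(\theta_{k-1} + \frac{\delta\theta}{2}\right) \\
z_{k-1} + \delta s \sin\left(\theta_{k-1} + \frac{\delta\theta}{2}\right) \\
\theta_{k-1} + \delta\theta \\
x_{a_1,k-1} \\
y_{a_1,k-1} \\
z_{a_1,k-1} \\
\vdots \\
x_{a_N,k-1} \\
y_{a_N,k-1} \\
z_{a_N,k-1}
\end{pmatrix}
\tag{4}
\]

where $\mathbf{u}_k = (\delta s, \delta\theta)$; $\delta s$ is the longitudinal motion and $\delta\theta$ is the rotational motion [19]:

\[
\begin{pmatrix} \delta s \\ \delta\theta \end{pmatrix}
= g(\varphi_l, \varphi_r)
= \begin{pmatrix}
\dfrac{w_r\,\delta\varphi_r + w_l\,\delta\varphi_l}{2} \\[2mm]
\dfrac{w_r\,\delta\varphi_r - w_l\,\delta\varphi_l}{e}
\end{pmatrix}
\tag{5}
\]

where:
• $w_r$ and $w_l$ are respectively the radii of the right and left wheels.
• $e$ is the length of the rear axle.
• $\delta\varphi_i = \delta p_i \frac{2\pi}{\rho}$ with $i \in \{r, l\}$ (r = right, l = left), where $\delta p_i$ is the number of encoder steps and $\rho$ is the odometer resolution. $\delta\varphi_i$ is the angular movement of the right/left wheel.

The predicted state covariance matrix is:

\[
\mathbf{P}_{k|k-1} = \frac{\partial f}{\partial \mathbf{x}}\, \mathbf{P}_{k-1|k-1}\, \frac{\partial f}{\partial \mathbf{x}}^T + \mathbf{Q}_k \tag{6}
\]

where
•
\[
\frac{\partial f}{\partial \mathbf{x}} =
\begin{bmatrix}
1 & 0 & -\delta s \sin\left(\theta_{k-1|k-1} + \frac{\delta\theta}{2}\right) & 0 & \cdots & 0 \\
0 & 1 & \delta s \cos\left(\theta_{k-1|k-1} + \frac{\delta\theta}{2}\right) & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\]
• $\mathbf{Q}_k$ is the covariance matrix of the process noise.

Estimation
The estimation of the state is made using the camera, which returns the position in the image $(u_i, v_i)$ of the i-th landmark.

The innovation and its covariance matrix: The pinhole model is used to project a known landmark position into the image:

\[
\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix}
= \mathrm{pinhole}(x^{cam}_{a_i}, y^{cam}_{a_i}, z^{cam}_{a_i})
= \begin{bmatrix} f k_u & s_{uv} & c_u \\ 0 & f k_v & c_v \\ 0 & 0 & 1 \end{bmatrix}
\begin{pmatrix} x^{cam}_{a_i} / z^{cam}_{a_i} \\ y^{cam}_{a_i} / z^{cam}_{a_i} \\ 1 \end{pmatrix}
\tag{7}
\]

where:
• $(u_i, v_i)$ is the position of the i-th landmark in the image.
• $(x^{cam}_{a_i}, y^{cam}_{a_i}, z^{cam}_{a_i})$ is the position of the i-th landmark in the camera frame.
• $f$ is the focal length.
• $(k_u, k_v)$ is the number of pixels per unit length.
• $(c_u, c_v)$ is the principal point, in pixels.
• $s_{uv}$ is a factor accounting for the skew due to non-rectangular pixels. In our case, we take $s_{uv} = 0$.

Equation (7) can be written as the predicted observation equation for a single landmark:

\[
h_i(\mathbf{x}_{k|k-1}) = \begin{pmatrix} u_i \\ v_i \end{pmatrix}
= \begin{pmatrix}
c_u + f k_u \dfrac{x^{cam}_{a_i}}{z^{cam}_{a_i}} \\[2mm]
c_v + f k_v \dfrac{y^{cam}_{a_i}}{z^{cam}_{a_i}}
\end{pmatrix}
\tag{8}
\]

The position of a landmark in the camera frame is defined from its position $(x_{a_i}, y_{a_i}, z_{a_i})$ in the global frame:

\[
\begin{pmatrix} x^{cam}_{a_i} \\ y^{cam}_{a_i} \\ z^{cam}_{a_i} \end{pmatrix}
= \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}
\begin{pmatrix} x_{a_i} - x \\ y_{a_i} \\ z_{a_i} - z \end{pmatrix}
- \begin{pmatrix} 0 \\ 0 \\ D \end{pmatrix}
\tag{9}
\]

where $D$ is the distance between the camera and the robot rear axle center.

During the observation step, the algorithm matches $M$ landmarks ($M \le N$) whose predicted observations are stacked in $h_k = (h_0, \ldots, h_{M-1})^T$. Thus, the innovation is:

\[
\mathbf{Y}_k = \hat{\mathbf{z}}_k - h_k(\mathbf{x}_{k|k-1}) \tag{10}
\]

where $\hat{\mathbf{z}}_k$ is the measurement for all the M predicted observations.

The innovation covariance $\mathbf{S}_k$ is:

\[
\mathbf{S}_k = \mathbf{H}_k \mathbf{P}_{k|k-1} \mathbf{H}_k^T + \mathbf{R}_k \tag{11}
\]

where $\mathbf{H}_k$ is the Jacobian of $h_k$ and $\mathbf{R}_k$ is the observation noise covariance.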
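The predicted observation of Equations (8) and (9) can be illustrated with the following Python sketch, which takes zero skew as stated in the text. The function names, the argument order and the visibility test (positive depth, pixel inside the image) are assumptions made for this example only.

```python
import math


def world_to_camera(landmark, pose, D):
    """Eq. (9): global landmark (xa, ya, za) -> camera frame coordinates."""
    xa, ya, za = landmark
    x, z, theta = pose
    dx, dz = xa - x, za - z
    c, s = math.cos(theta), math.sin(theta)
    x_cam = c * dx + s * dz
    y_cam = ya
    z_cam = -s * dx + c * dz - D
    return x_cam, y_cam, z_cam


def project(landmark, pose, D, f, ku, kv, cu, cv, width, height):
    """Eq. (8) with zero skew; returns (u, v), or None if not visible."""
    x_cam, y_cam, z_cam = world_to_camera(landmark, pose, D)
    if z_cam <= 0.0:  # landmark is behind the camera (assumed test)
        return None
    u = cu + f * ku * x_cam / z_cam
    v = cv + f * kv * y_cam / z_cam
    if 0.0 <= u < width and 0.0 <= v < height:
        return u, v
    return None  # projection falls outside the image
```

For example, with the robot at the origin and a landmark straight ahead on the optical axis, `project` returns the principal point `(cu, cv)`.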

State estimation: The state is updated using the classical EKF equations:

\[
\begin{aligned}
\mathbf{K}_k &= \mathbf{P}_{k|k-1} \mathbf{H}_k^T \mathbf{S}_k^{-1} \\
\mathbf{x}_{k|k} &= \mathbf{x}_{k|k-1} + \mathbf{K}_k \mathbf{Y}_k \\
\mathbf{P}_{k|k} &= (\mathbf{I} - \mathbf{K}_k \mathbf{H}_k)\, \mathbf{P}_{k|k-1}
\end{aligned}
\tag{12}
\]

Visual landmarks
The landmarks used in the observation equation are extracted from images. Landmark initialization defines the initial coordinates and the initial covariance of the landmarks (also called interest points or features). In [20], we evaluated the processing time of corner detectors such as Harris, Shi-Tomasi and FAST. The Harris and Shi-Tomasi detectors were more time consuming than the FAST detector and did not provide significantly better localization results. Consequently, there is no need to implement more sophisticated detectors such as Harris or Shi-Tomasi. The FAST (Features from Accelerated Segment Test) corner detector [21] relies on a simple test performed for a pixel p by examining a circle of 16 pixels (a Bresenham circle of radius 3) centered on p. A feature is detected at p if the intensities of at least 12 contiguous pixels are all above or all below the intensity of p by a threshold t. Even though this detector is not highly robust to noise and depends on a threshold, it produces stable landmarks and is computationally very efficient [21].

The FAST detector [21] is related to the wedge-model style of detector evaluated using a circle surrounding a candidate pixel. To optimize the detector processing time, this model is used to build a decision classifier which is applied to the image (Figure 1).

Matching based on zero-mean sum of squared differences
The EKF-SLAM matches a previously detected feature with a new one using the zero-mean sum of squared differences (ZMSSD).

The covariance of the projected feature location defines a search area $\tau$. This area includes both the robot localization uncertainty and the landmark localization uncertainty. We use the ZMSSD to find the best candidate point inside $\tau$. For each candidate point $p = (p_x, p_y)$, the value $N_p$ of the weighted ZMSSD is:

\[
N_p = w(p_x, p_y) \times \mathrm{ZMSSD} \tag{13}
\]

and

\[
\mathrm{ZMSSD} = \sum_{i,j} \left[ \left( d(i,j) - m_d \right) - \left( im\!\left(p_x + i - \tfrac{des}{2},\; p_y + j - \tfrac{des}{2}\right) - m_i \right) \right]^2 \tag{14}
\]

where:
• $w(p_x, p_y)$ is the Gaussian weight defined by the landmark covariance.
• $i \in [0; des-1]$ and $j \in [0; des-1]$, where $des$ is the descriptor size.
• $d$ is the feature descriptor.
• $m_d$ and $m_i$ are respectively the means of the pixel values in the descriptor and in the image window.
• $im$ is the image.

The observation $p_{obs}$ is the candidate with the smallest weighted score: $p_{obs} = \{\, p \in \tau \mid N_p = \min(N_{p_j}),\ \forall p_j \in \tau \,\}$.

The descriptor, used to identify the landmark during the matching, is classically a small image window of 9×9 to 16×16 pixels around the interest point. Davison [22] claims that this sort of descriptor is able to serve as a long-term landmark feature.

Landmark initialization based on Davison's method
Landmark initialization consists of defining the initial coordinates and the initial covariance of landmarks (interest points). Various methods exist and can be classified as undelayed or delayed. An undelayed method adds landmarks from a single measurement, whereas a delayed method needs two or more frames. We chose to use the widespread delayed method proposed by [2], which is both efficient and straightforward to implement.

Figure 1 Image (320 × 240 Pixels) of the embedded camera and result of the FAST detector.
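The segment test behind the FAST detector, as described in the text (a 16-pixel Bresenham circle of radius 3 and at least 12 contiguous pixels brighter or darker than the center by a threshold t), can be sketched as follows. This naive Python version is for illustration only: it is not the decision-tree classifier of [21], and the function name `is_fast_corner` and the `image[y][x]` indexing are assumptions.

```python
# Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3,
# listed in clockwise order starting from the top.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(image, x, y, t, n=12):
    """Return True if at least n contiguous circle pixels are all brighter
    than image[y][x] + t or all darker than image[y][x] - t.
    The caller must keep (x, y) at least 3 pixels away from the border."""
    p = image[y][x]
    ring = [image[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1: brighter test, -1: darker test
        flags = [sign * (v - p) > t for v in ring]
        run = 0
        for flag in flags + flags:  # doubled list handles wrap-around runs
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False
```

A production detector would first test a few compass pixels to reject most candidates early, which is where the speed of FAST comes from.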

Furthermore, the work of Munguia and Grau [23] shows that the delayed method has the same performance as the undelayed method.

In order to compute the 3D depth of a newly detected landmark, as in [2], we initialize a 3D line in the map along which the landmark must lie. This line starts at the estimated camera position and heads to infinity along the feature viewing direction. The line is composed of 100 particles which represent depth hypotheses. The prior probability used is uniform and the range is 0.5 to 15 m. At each subsequent time step, each particle (a feature depth hypothesis) is projected into the image and matched, and its probability is re-weighted.

When the ratio of the standard deviation of the depth to its expected value is below a threshold, the distribution is approximated as a Gaussian and the landmark is initialized. The landmark position $A_i = (x_{a_i}, y_{a_i}, z_{a_i})$ is added to $\mathbf{x}$ and the covariance of $A_i$ is added to $\mathbf{P}$.

Multiprocessor architecture and system configuration
In order to test and validate the EKF-SLAM algorithm, experiments were conducted with an instrumented mobile robot called Minitruck [24]. The platform was tele-operated during the experiments. For our first evaluation, the robot operated inside a large corridor of our research lab (see Figure 2).

Figure 2 Minitruck in action embedding a multi-sensor system.

We have developed a system architecture on top of a multi-processor board (Gumstix Overo) based on the OMAP3530 chip (see Figure 3). The OMAP chip integrates a RISC processor (ARM Cortex-A8, 500 MHz) with an SIMD NEON coprocessor, a DSP (TMS320C64x+, 430 MHz) and a graphics processing unit (POWERVR SGX). This board communicates with an additional processor for control and data acquisition (Atmega168, 16 MHz). Multiple sensors (odometers and a camera) are interfaced to this architecture (Figure 2). The variety of sensors enables us to evaluate the SLAM algorithms with different types of sensor data and to take advantage of the complementarity of the information they provide. Our objective is to evaluate the implementation of SLAM algorithms using land vehicles and sensors such as steering encoders and a camera.

Figure 3 System architecture.

The use of wheel and steering encoders is well established in robotics and navigation. Simple kinematic motion models can be used to integrate velocity and heading measurements from wheel and steering encoders to provide an estimate of the mobile robot location and orientation. These estimates are regularly subject to considerable errors due to misalignment, offsets and wheel slippage. It is possible to implement basic models to approximate and correct offset and slippage errors on-line, leading to a significant improvement in performance. We chose two HEDS 5540 odometers for our experimental vehicle.

Feature detection in a SLAM application relies on the embedded sensors. We chose to perform this extraction using a vision sensor (a cheap USB webcam, Philips SPC530NC, delivering 30 fps). We chose to use all available images (30 fps) because point matching is much easier when the movement between frames is small. Conventional approaches to vision system design are usually based on general-purpose computers interfaced with cameras. The new computing technologies (SIMD, DSP, multi-cores) can greatly accelerate algorithm processing, but require rethinking these algorithms to exploit parallelism. This trend pushes designers to integrate parallel computing units close to the sensors [25]. We have used a Gumstix processing module based on the OMAP3530 architecture. It is a heterogeneous architecture (ARM Cortex-A8 500 MHz processor with power consumption below 300 mW, integrated SIMD NEON coprocessor, DSP C64x processor and a 3D graphics accelerator) that communicates via a WLAN connection (802.11g).

The WLAN connection is used only to control the speed and direction of the vehicle. In the future, a dedicated algorithm for autonomous navigation will be implemented
and thus the WLAN connection will be used only for system monitoring. A coprocessor (ATMega168) takes care of data acquisition. It controls the robot speed and direction using two pulse-width modulation (PWM) signals. It decodes the signals coming from the odometers embedded in the rear wheels. It communicates with the main board through an I2C interface. This interface allows the main processor to retrieve odometer data and to send speed and direction commands.

To evaluate the designed system, an experiment was carried out in a corridor of our lab. Frames were grabbed at 30 fps with a 320×240 resolution. Odometer data were sampled at 30 Hz. During the experiment, reference marks were periodically drawn on the ground by an embedded marker.

Evaluation methodology and algorithm implementation
Our evaluation methodology is based on identifying the processing tasks that require significant computing time. It proceeds in several steps. We first analyze the execution time of the tasks and their dependencies on the algorithm's parameters. A threshold is fixed for each parameter. The algorithm is then partitioned into functional blocks (FBs) performing defined calculations. Each block is then evaluated to determine its processing time. The functional blocks with the largest execution times are then optimized to reduce the global processing time.

Algorithm 1 summarizes the main tasks of EKF-SLAM. The algorithm is composed of two processes: prediction and correction. The correction process implements three tasks: matching, estimation and initialization.

Algorithm 1 EKF-SLAM
 1: χ ← Ø    ▷ list of landmarks under initialization
 2: Robot pose initialization
 3: while localization is required do
 4:   DATA ← sensor data acquisition
 5:   if DATA = (ϕ_l, ϕ_r) then    ▷ odometer data
 6:     PREDICTION:
 7:     (δs, δθ) ← g(ϕ_l, ϕ_r)    (see Eq. (5))
 8:     x_{k|k−1} ← f(x_{k−1|k−1}, δs, δθ)    (see Eq. (4))
 9:     P_{k|k−1} ← (∂f/∂x) P_{k−1|k−1} (∂f/∂x)^T + Q_k    (see Eq. (6))
10:   else if DATA = Camera then
11:     FAST detector applied on the image
12:     MATCHING:
13:     for each landmark N_i ∈ x_{k|k−1} do
14:       u_i, v_i, τ_i ← pinhole(x_{k|k−1}, N_i)    (see Eq. (8))
15:       if (u_i, v_i) ∈ camera frame then
16:         ẑ_k ← ZMSSD(τ_i, N_i)    (see Eq. (13))
17:         h_k ← (u_i, v_i)
18:         Y_k ← ẑ_k − h_k
19:         H_k ← ∂h_k/∂x evaluated at x_{k|k−1}
20:       end if
21:     end for
22:     ESTIMATION:
23:     S_k ← H_k P_{k|k−1} H_k^T + R_k    (see Eq. (11))
24:     K_k ← P_{k|k−1} H_k^T S_k^{−1}    (see Eq. (12))
25:     x_{k|k} ← x_{k|k−1} + K_k Y_k    (see Eq. (12))
26:     P_{k|k} ← (I − K_k H_k) P_{k|k−1}    (see Eq. (12))
27:     INITIALIZATION:
28:     for each L ∈ χ do    ▷ L: aspiring new landmark
29:       L_obs ← ZMSSD(L)    (see Eq. (13))
30:       Update the particle weights according to L_obs    (see [2])
31:       Compute σ_depth, depth
32:       if σ_depth / depth < threshold then
33:         Compute L, P_L
34:         append(x_{k|k−1}, L); append(P_{k|k−1}, P_L)
35:         remove(χ, L)
36:       end if
37:     end for
38:     if lack of landmarks then    ▷ see [8]
39:       append(χ, new landmarks)
40:     end if
41:   end if
42: end while

Prediction process
This phase updates the mobile robot position ($\mathbf{x}_{k|k-1}$) according to the proprioceptive data acquired from the odometers ($\varphi_l, \varphi_r$). The processing time of the prediction process is constant: it only updates the three-dimensional vector containing the robot pose and its 3×3 covariance matrix. During the prediction step, the landmark locations and uncertainties do not change, since landmarks are defined in the global frame.

Correction process
The processing time of the correction process is not constant. The remainder of this section studies the processing time of each task of the process and their dependencies.

Matching task Each landmark in the state vector must be projected into the camera frame using the pinhole model (Algorithm 1, line 14). The computing time of these projections depends only on the number of landmarks in the state vector. For each landmark projected on the focal plane, ZMSSD matches an observation. Both the size of the descriptor and the size of the search area τ affect the computing time (see Equation (13)).

The processing time of the matching task depends on several parameters:

• The number of landmarks in the state vector.
• The number of visible landmarks on the focal plane.
• The size of the descriptor.
• The localization uncertainty of both the mobile robot and the landmarks.

In practice, all of these parameters should be set in order to bound the computing time. The first three parameters can be set by the user. The uncertainty depends on the path followed and cannot be bounded.

Estimation task The estimation task uses the classical Kalman equations to update both the robot and the landmark uncertainties. The estimation task is time-consuming and its processing time depends on:

• The number of landmarks in the state vector.
• The number of landmarks observed.

The size of the matrices, and thus the computing cost of the matrix multiplications in Equations (11) and (12), depends on the number of landmarks in the state vector. Moreover, Equation (11) depends on the number of landmarks observed. As for the matching process, these parameters (size of the state vector and number of observations) should be bounded in order to perform the estimation task in constant computing time.

Initialization task For each landmark under initialization, each particle (a feature depth hypothesis) is projected into the image and matched, and its probability is re-weighted. If there is a lack of landmarks under initialization, we add aspiring new landmarks. The processing time of the initialization task depends on:

• The number of landmarks being initialized.
• The size of the descriptor.
• The localization uncertainty of both the mobile robot and the landmark.

The number of landmarks being initialized and the size of the descriptors can be bounded. For each landmark being initialized, we have to update the probability of each localization hypothesis using a matching process. As for the matching task, the computing time depends on the localization uncertainty of the mobile robot and the landmarks.

Thresholds definition
The previous section shows that the computation time of each task of the EKF-SLAM algorithm depends on many variables. For a real-time implementation, it is important to obtain a constant, or at least a bounded, computation time. To meet this constraint we have to:

• set the maximum number of landmarks in the state vector. The size of the state vector is then fixed, so no dynamic memory allocation is needed.
• set the maximum number of landmarks observed. This keeps the computation time of the estimation task constant by using fixed-size matrix multiplications.
• set the maximum number of landmarks being initialized in order to bound the computation time of the initialization task. Unfortunately, this is not sufficient to keep the computation time of the initialization task constant, because of its internal matching step.
• bound the computing time induced by the uncertainties. The only way to obtain a bounded global processing time is to set a maximum execution time for the matching task. Since we use all available images (30 fps), one iteration must complete within 33 ms. Because the processing times of the prediction and estimation tasks are constant, the execution time of the matching and initialization tasks together can be bounded by 33 ms − (t_prediction + t_estimation). The algorithm therefore matches as many landmarks as possible within a bounded time. The initialization task has a dynamic execution time that depends on the actual processing time of the matching task and on the number of landmarks being initialized. The lower bound of this dynamic execution time guarantees that at least a minimum number of landmarks can be initialized.

Map management
To keep the size of the state vector constant, we need to delete some landmarks when inserting new ones. The new state vector includes new landmarks (whose initialization has just been completed) and previously used landmarks. Auat Cheein and Carelli [26] propose an efficient method to select landmarks for the estimation task. It is based on evaluating the influence of a given feature on the convergence of the state covariance matrix. The method matches all possible landmarks and computes $(\mathbf{I} - \mathbf{K}_k \mathbf{H}_k)$ from Equation (12). Unfortunately, we cannot implement it exactly as proposed in [26] because of its high computing time. Instead, we select landmarks based on the previous estimation step, keeping those that had the best influence on the convergence of the state covariance matrix. At time k, we select the landmark which had the smallest $(\mathbf{I} - \mathbf{K}_{k-1} \mathbf{H}_{k-1})$.
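The time budget described under "Thresholds definition" can be sketched as follows: the matching loop stops when either the per-frame budget or the maximum number of observed landmarks is reached. This is a hypothetical illustration. The names `matching_budget_ms` and `match_within_budget`, the injectable `clock` parameter and the use of `time.perf_counter` are assumptions; only the 33 ms frame period and the budget formula come from the text.

```python
import time

FRAME_PERIOD_MS = 33.0  # one camera frame at 30 fps, as used in the text


def matching_budget_ms(t_prediction_ms, t_estimation_ms,
                       frame_period_ms=FRAME_PERIOD_MS):
    """Time left for matching and initialization in one frame."""
    return max(0.0, frame_period_ms - (t_prediction_ms + t_estimation_ms))


def match_within_budget(landmarks, match_one, budget_ms, max_observed,
                        clock=time.perf_counter):
    """Match landmarks in order until the time budget or the maximum
    number of observations is reached. `match_one` is a caller-supplied
    function returning an observation or None."""
    start = clock()
    matched = []
    for lm in landmarks:
        if len(matched) >= max_observed:
            break
        if (clock() - start) * 1000.0 >= budget_ms:
            break  # deadline reached: remaining landmarks are skipped
        obs = match_one(lm)
        if obs is not None:
            matched.append((lm, obs))
    return matched
```

The deadline is checked before each match, so the loop can overrun the budget by at most the duration of one matching call; a stricter design would subtract a worst-case matching time from the budget.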

Table 1 Functional block partitioning

FB | Name | Description | Line(s) in Algorithm 1
1 | Prediction | The entire prediction process | 7, 8, 9
2 | FAST | The FAST corner detector application | 11
3 | Landmark projection | The projection of one landmark on the camera plane | 14
4 | ZMSSD-M | The correlation computation between one candidate point of the image and one descriptor during the matching task | 16
5 | Hi | Hi computation for one observation | 19
6 | Estimation | The entire estimation task | 23 to 26
7 | ZMSSD-I | The correlation computation between one candidate point of the image and one descriptor during the initialization task | 29
8 | Weight updating | The update of the particle weight for the initialization step | 30, 31
9 | Addition of a new landmark | The insertion of a new landmark under initialization | 39
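Functional block 8 (weight updating, lines 30 to 32 of Algorithm 1) operates on the depth particles described earlier: 100 hypotheses spread uniformly between 0.5 and 15 m, re-weighted at each frame until the ratio of standard deviation to mean depth falls below a threshold. The Python sketch below illustrates that bookkeeping. The function names are hypothetical, and the per-particle likelihoods are assumed to be provided by the matching step; the way [2] derives them is not reproduced here.

```python
import math


def init_depth_particles(n=100, d_min=0.5, d_max=15.0):
    """Uniform prior over depth: n equally spaced, equally weighted particles."""
    step = (d_max - d_min) / (n - 1)
    depths = [d_min + k * step for k in range(n)]
    weights = [1.0 / n] * n
    return depths, weights


def reweight(weights, likelihoods):
    """Multiply each weight by its match likelihood and renormalize."""
    w = [wi * li for wi, li in zip(weights, likelihoods)]
    total = sum(w)
    if total <= 0.0:  # no usable match this frame: keep the previous weights
        return list(weights)
    return [wi / total for wi in w]


def depth_stats(depths, weights):
    """Weighted mean and standard deviation of the depth distribution."""
    mean = sum(d * w for d, w in zip(depths, weights))
    var = sum(w * (d - mean) ** 2 for d, w in zip(depths, weights))
    return mean, math.sqrt(var)


def is_converged(depths, weights, ratio_threshold):
    """Line 32 of Algorithm 1: sigma_depth / depth below a threshold."""
    mean, sigma = depth_stats(depths, weights)
    return sigma / mean < ratio_threshold
```

The cost of this block is proportional to the number of particles per landmark under initialization, which is why bounding the number of landmarks being initialized bounds this part of the initialization task.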

Functional block partitioning
None of the previously defined tasks has a fixed computing time: their computing time depends on the experiment. To optimize the implementation, we have defined FBs which do have fixed computing times. The computing time of an FB does not depend on the experiment. Experiments only affect the number of iterations of some FBs (3, 4, 5, 6, 7, 8 and 9). From the previous algorithm, we have defined 9 FBs, listed in Table 1.

Each FB has a fixed computing time, and some FBs can occur more than once (landmark projection, ZMSSD, Hi, weight updating, addition of a new landmark).

Processing time evaluation
As an application scenario, the robot moves along a square with 6 m sides. At the end of the trajectory, it returns to its initial starting position. Using only odometers, the final localization has an error of 1.6 m. With the EKF-SLAM algorithm, the localization is significantly improved: the final error is approximately 0.4 m. EKF-SLAM includes all viewed landmarks in the state vector. The localization result depends on the number of landmarks, but the size of the state vector and the number of observations must be bounded to achieve a bounded computing time. The overall accuracy of the EKF-SLAM depends on the number of landmarks in the state vector and on the matched observations. The accuracy of the localization depends monotonically on the number of processed landmarks.

The given EKF-SLAM (Algorithm 1) is processed sequentially on the embedded ARM processor operating at 500 MHz (no coprocessor is used). In the following, all times given were measured on the embedded system using the ARM processor. The data acquisition time is constant:

• The odometer data acquisition takes 0.7 ms (this processing time is due to the I2C communication with the Atmega168 processor).
• Each image acquisition takes 1.8 ms (due to the USB data transfer).

The prediction step does not require significant processing time: it takes only 0.093 ms per iteration. As for the matching task, the estimation task cannot be performed in constant processing time. The estimation task processing time depends on the total number of landmarks and on the number of matched landmarks. Figure 4 shows the processing time of the estimation task as a function of the number of landmarks in the state vector. The estimation task is entirely processed on the ARM processor (no coprocessor is used). Clearly, it is impossible to take into account all the landmarks detected when the algorithm is processed: the computation time will be

Figure 4 Processing time of the estimation task.

Vincke et al. EURASIP Journal on Embedded Systems 2012, 2012:5 Page 9 of 14
http://jes.eurasipjournals.com/content/2012/1/5

higher than the 33 ms allowed. It is necessary to find a compromise between the number of landmarks and the processing time.

Experimental results
An experiment was conducted to evaluate the processing time of the different blocks of the algorithm (including tasks with unboundable processing time). For this experiment, we set the size of the descriptor to 16 × 16 pixels and we set the thresholds as follows:

• Maximum number of landmarks in the state vector: 25.
• Maximum number of observed landmarks: 25.
• Maximum number of landmarks being initialized: 20.

First, we analyze the runtime of the nine previously defined FBs of the algorithm. We used the integrated cycle counter register (CCNT) of the ARM processor to measure the processing time of each FB. The prediction process (FB1) takes 0.093 ms. Table 2 summarizes, for the other FBs, the processing time per iteration, the mean number of iterations and the mean processing time per correction process. The estimation task is not processed in some iterations of the correction process, in particular when there is no matched landmark.
The mean processing time per frame is approximately 80.8 ms, which corresponds to the sum of all processing times: prediction process (FB1) and correction process (FB2 to FB9). The processing time of the estimation task (FB6) is approximately 70.5 ms and represents about 87% of the global processing time. The FAST detector (FB2) takes 3.4 ms. The ZMSSD-M task (FB4) takes 2.63 ms per correction process. Finally, the initialization task (FB7, FB8 and FB9) takes 3.9 ms. These six FBs represent 99.6% of the global processing time. We focused on an efficient implementation of these FBs to reduce the global processing time.

Table 2 Processing time of the correction process FBs on the main processor (ARM)
Functional block (FB) | Processing time per iteration (μs) | Mean number of iterations per correction process | Mean processing time per correction process (μs)
2. FAST | 3400 | 1 | 3400
3. Landmark projection | 9 | 19 | 180
4. ZMSSD-M | 11.29 | 233 | 2630
5. Hi | 14.5 | 4.5 | 66
6. Estimation | 88845 | 0.8 | 70568
7. ZMSSD-I | 11.29 | 123 | 1388
8. Weight updating | 638 | 4.0 | 2586
9. Addition of a new landmark | 103 | 0.18 | 18

Hardware–software optimization and improvements
OMAP3530 architecture description
The OMAP3530 is a heterogeneous architecture designed by Texas Instruments (TI). It integrates an ARM Cortex-A8 processor running at 500 MHz, a NEON coprocessor with SIMD instructions, a C64x DSP processor and a 3D graphics accelerator.
The NEON unit is similar to the MMX and SSE extensions found on x86 processors. It is optimized for Single Instruction Multiple Data (SIMD) operations. The NEON unit has two floating point pipelines, an integer pipeline and a 128-bit load/store/permute pipeline. An efficient implementation on the SIMD NEON architecture improves the processing time. NEON instructions perform “Packed SIMD” processing as follows:

• Registers are considered as vectors of elements of the same data type.
• Data types can be signed/unsigned 8, 16, 32 or 64-bit integers, or single precision floating point.
• Instructions perform the same operation on multiple data simultaneously, as shown in Figure 5. The number of simultaneous operations depends on the data type: NEON supports up to 16 operations at the same time using 8-bit data.

SIMD optimization results
In Algorithm 1, the time-consuming FBs are the estimation block (FB6), the initialization blocks (FB7, FB8 and FB9), the FAST detector block (FB2) and the ZMSSD-M block (FB4). The FAST detector is already an optimized instance obtained using machine learning [21]. Moreover, FAST has already been implemented on an FPGA-based architecture [27]. We chose to optimize the other FBs.
The matching task computes the ZMSSD, which measures the image correlation. It performs the same operations (addition, subtraction, multiplication and comparison) on
8-bit data. The computation of the ZMSSD can therefore be optimized using the SIMD NEON architecture. The estimation task is based on floating point matrix multiplication; it can be efficiently optimized using the SIMD NEON coprocessor (the ARM Cortex A8 does not include any floating point unit (FPU)). The initialization FBs are studied in Section “Parallel implementation on a DSP processor”.

Figure 5 Data processing in a NEON architecture.

ZMSSD (FB4)
The EKF-SLAM matches features using the ZMSSD. The ZMSSD is computed for each landmark using Equation 2. We chose a descriptor of 16×16 8-bit pixels because the SIMD NEON architecture deals efficiently with 128/64-bit vectors.

Basic implementation The basic implementation of the ZMSSD functional block first computes the mean mi of the pixel values in the image window (md can be precalculated when the landmark is detected). Then the ZMSSD is computed using loops (Algorithm 2).

Algorithm 2 Basic ZMSSD
1: mi ← 0
2: ZMSSD ← 0
3: for each i ∈ [0; des − 1], j ∈ [0; des − 1] do
4:   mi ← mi + im(px + i − des/2, py + j − des/2)
5: end for
6: mi ← mi/(des × des)
7: for each i ∈ [0; des − 1], j ∈ [0; des − 1] do
8:   ZMSSD ← ZMSSD + ((d(i, j) − md) − (im(px + i − des/2, py + j − des/2) − mi))²
9: end for

This implementation takes 12.60 μs on the ARM processor.

Efficient scalar implementation The second implementation modifies the calculation of the ZMSSD in order to avoid the use of two loops. Formally, the ZMSSD is written as:

$$\mathrm{ZMSSD} = \sum_{i,j} \big( (d - m_d) - (\mathit{im} - m_i) \big)^2 \tag{15}$$

where d = d(i, j) and im = im(px + i − des/2, py + j − des/2).
By expanding the ZMSSD, we obtain:

$$\mathrm{ZMSSD} = \sum_{i,j} \big( (d - m_d)^2 - 2 (d - m_d)(\mathit{im} - m_i) + (\mathit{im} - m_i)^2 \big) \tag{16}$$

$$= \sum_{i,j} \big( d^2 - 2 d\, m_d + m_d^2 - 2 d\, \mathit{im} + 2 d\, m_i + 2 m_d\, \mathit{im} - 2 m_d\, m_i + \mathit{im}^2 - 2\, \mathit{im}\, m_i + m_i^2 \big) \tag{17}$$

Using $m_d = \frac{\sum_{k,l} d(k,l)}{\mathit{des} \times \mathit{des}}$ and $m_i = \frac{\sum_{k,l} \mathit{im}(p_x + k - \mathit{des}/2,\; p_y + l - \mathit{des}/2)}{\mathit{des} \times \mathit{des}}$, we simplify the sums:

$$\sum_{i,j} m_d = \sum_{i,j} \frac{\sum_{k,l} d(k,l)}{\mathit{des} \times \mathit{des}} \tag{18}$$

$$= \sum_{k,l} d(k,l) \tag{19}$$

$$\sum_{i,j} m_i = \sum_{i,j} \frac{\sum_{k,l} \mathit{im}(p_x + k - \mathit{des}/2,\; p_y + l - \mathit{des}/2)}{\mathit{des} \times \mathit{des}} \tag{20}$$

$$= \sum_{k,l} \mathit{im}(p_x + k - \mathit{des}/2,\; p_y + l - \mathit{des}/2) \tag{21}$$

The equation becomes:

$$\mathrm{ZMSSD} = \left[ 2 \sum_{i,j} d \sum_{i,j} \mathit{im} - \Big( \sum_{i,j} d \Big)^2 - \Big( \sum_{i,j} \mathit{im} \Big)^2 \right] / (\mathit{des} \times \mathit{des}) + \sum_{i,j} d^2 + \sum_{i,j} \mathit{im}^2 - 2 \sum_{i,j} d\, \mathit{im} \tag{22}$$
Using the notation:

• Sd = $\sum_{i,j} d(i,j)$: the sum of the descriptor pixels (this sum can be precalculated).
• Si = $\sum_{i,j} \mathit{im}(p_x + i - \mathit{des}/2,\; p_y + j - \mathit{des}/2)$: the sum of the image pixels.
• SSi = $\sum_{i,j} \mathit{im}(p_x + i - \mathit{des}/2,\; p_y + j - \mathit{des}/2) \times \mathit{im}(p_x + i - \mathit{des}/2,\; p_y + j - \mathit{des}/2)$: the sum of the squared image pixel values.
• SSd = $\sum_{i,j} d(i,j) \times d(i,j)$: the sum of the squared descriptor pixel values (this sum can be precalculated).
• Sdi = $\sum_{i,j} d(i,j)\, \mathit{im}(p_x + i - \mathit{des}/2,\; p_y + j - \mathit{des}/2)$: the sum of the products of the descriptor pixels and the image pixels.

The final equation is:

$$\mathrm{ZMSSD} = \frac{2 S_d S_i - S_d^2 - S_i^2}{\mathit{des} \times \mathit{des}} + SS_i + SS_d - 2 S_{di} \tag{23}$$

The implementation of Algorithm 2 becomes Algorithm 3:

Algorithm 3 Efficient scalar ZMSSD
1: Si ← 0
2: SSi ← 0
3: Sdi ← 0
4: for each i ∈ [0; des − 1], j ∈ [0; des − 1] do
5:   Si ← Si + im(px + i − des/2, py + j − des/2)
6:   SSi ← SSi + im(px + i − des/2, py + j − des/2) × im(px + i − des/2, py + j − des/2)
7:   Sdi ← Sdi + im(px + i − des/2, py + j − des/2) × d(i, j)
8: end for
9: ZMSSD ← (((2Sd×Si) − Sd² − Si²)/(des×des)) + SSi + SSd − 2Sdi

This instance uses only one loop, which reduces memory accesses. With this implementation, the computing time decreases from 12.60 μs to 11.29 μs.

Vector implementation The SIMD NEON architecture allows vector processing and performs the same operation on all the vector processing units. We have implemented a vectorized instance of the ZMSSD functional block as follows (Algorithm 4):

Algorithm 4 SIMD vectorized ZMSSD
1: V8x8 Vimage ← 0   ▷ V8x8: 8×8 bits vector
2: V8x8 Vdescriptor ← 0
3: V16x8 VSi ← 0   ▷ V16x8: 8×16 bits vector
4: V32x4 VSSi ← 0   ▷ V32x4: 4×32 bits vector
5: V32x4 VSdi ← 0
6: for each i ∈ [0; des − 1], j ∈ {0, 8} do
7:   Vimage ← load8(im(px + i − des/2, py + j − des/2))   ▷ load 8 pixels
8:   Vdescriptor ← load8(d(i, j))
9:   VSi ← VSi + Vimage
10:   VSSi ← VSSi + Vimage × Vimage
11:   VSdi ← VSdi + Vimage × Vdescriptor
12: end for
13: Si ← sum(VSi)   ▷ sums the components of a vector
14: SSi ← sum(VSSi)
15: Sdi ← sum(VSdi)
16: ZMSSD ← (((2Sd×Si) − Sd² − Si²)/256) + SSi + SSd − 2Sdi

This instance processes 8 pixels at a time: the SIMD NEON architecture allows computing eight additions or eight multiplications simultaneously. The processing time of the vector implementation decreases to 1.27 μs.

Computation time results Table 3 summarizes the processing time of the three implementations of the ZMSSD functional block. The SIMD implementation is approximately 10 times faster than the basic implementation.

Table 3 ZMSSD processing time
Implementation | Processing time (μs) | Percentage of the basic implementation
Basic implementation | 12.60 | 100
Scalar implementation | 11.29 | 89.6
SIMD implementation | 1.27 | 10.1

Estimation (FB6)
The ARM Cortex A8 does not integrate an FPU, which is why the processing time of the estimation FB is significant (Figure 4). To optimize the matrix multiplication, we used the Eigen 3 library [28], which provides SIMD NEON optimized functions. Figure 6 presents the processing time of the estimation task implemented on the ARM processor (non-optimized task) and using the SIMD NEON coprocessor (optimized task). The optimized task is approximately eight times faster than the non-optimized one. This gain is due to the lack of an FPU in the Cortex A8 and to the ability of the NEON unit to execute a multiply-and-accumulate instruction in only one CPU cycle.

Figure 6 Processing time of the estimation task on the ARM and NEON coprocessor.

Parallel implementation on a DSP processor
Digital signal processors (DSP) are commonly used in vision systems [29]. They integrate a number of resources that serve to enhance image processing versatility. The use of digital signal processing with data sharing ensures that image processing will be achieved in parallel. With a DSP based image processing, it is possible to parallelize the

EKF-SLAM algorithm on the multiprocessor architecture (ARM, NEON and DSP processors). This reduces the global processing time, especially when operating under real-time constraints. The landmark matching (FB3 to FB5) and robot position estimation (FB6) tasks must be processed sequentially. Fortunately, the initialization tasks (FB7, FB8 and FB9) can run simultaneously with the matching and estimation tasks.
Rethinking the implementation to obtain a parallel implementation, the instance of Algorithm 1 with block partitioning leads to Algorithm 5.

Algorithm 5 Multiprocessed EKF-SLAM
1: Robot pose initialization
2: while localization is required do
3:   if DATA = Odometers then
4:     PREDICTION
5:   else if DATA = Camera then
6:     FAST detector
7:     ARM Processor: MATCHING and ESTIMATION (FB 3, 4, 5 and 6)
8:     DSP Processor: INITIALIZATION (FB 7, 8 and 9)
9:   end if
10: end while

The architecture of the OMAP3530 can interface the ARM and DSP processors using a shared memory. Figure 7 shows the data transfer mechanism using a shared DDR memory area. For each acquired image, the ARM processor writes the image (320×240 pixels), the robot position and its uncertainty to the shared memory. The data transfer between the ARM processor and the DSP processor for a 320×240 gray image takes one millisecond. When the initialization of a landmark is completed, the DSP processor returns the position and the uncertainty of possible new landmarks.

Figure 7 ARM-DSP interface with a shared memory.

Global results
We have improved the EKF-SLAM implementation using the SIMD NEON coprocessor and the DSP processor. We have implemented the matching and estimation tasks on the NEON coprocessor and the initialization tasks on the DSP processor. The FAST corner detector is already an optimized algorithm obtained using machine learning [21]. For this last experiment, we set the same thresholds as in Section “Experimental results”.
Table 4 summarizes the processing time per iteration and the mean processing time per frame of each FB.

Table 4 FBs processing times on ARM, NEON and DSP processors
Functional block (FB) | Non-optimized implementation, ARM only (μs): processing time per iteration | Non-optimized implementation, ARM only (μs): mean processing time per frame | Optimized implementation (μs): processing time per iteration | Optimized implementation (μs): mean processing time per frame | Processing unit
1. Prediction | 93 | 93 | 93 | 93 | ARM
2. FAST | 3400 | 3400 | 3400 | 3400 | ARM
3. Landmark projection | 9 | 180 | 9 | 180 | ARM
4. ZMSSD-M | 11.29 | 2630 | 1.27 | 295 | NEON
5. Hi | 14.5 | 66 | 14.5 | 66 | ARM
6. Estimation | 88845 | 70568 | 15690 | 12552 | NEON
Initialization task (FB7, 8 and 9) | 3992 | 3922 | 4025 | 4025 | DSP
Total | – | 80859 | – | 16586 | –

The computing time of the initialization task (blocks 7, 8 and 9) implemented on the DSP processor is approximately 4.0 ms. The DSP processor computes the initialization task while the ARM-NEON processors compute the prediction, FAST detection, matching and estimation tasks. With this implementation, and since the processing time of the initialization task (4.0 ms) is smaller than the sum of the processing times of the matching and estimation tasks (13.0 ms for blocks 3, 4, 5 and 6), the overall computing time is reduced to the sum of the processing times of the prediction process (0.093 ms),

the FAST detector (3.4 ms), and the matching and estimation tasks (13.0 ms). The mean processing time per frame with the optimized implementation is 17.6 ms (we add 1 ms for the ARM/DSP data transfer), whereas the non-optimized implementation has a processing time of 80.85 ms. The optimized processing time represents 22% of the non-optimized one: the processing time has been reduced by 78%.

Conclusion
This article proposed an efficient implementation of the EKF-SLAM algorithm on a multiprocessor architecture. The overall accuracy of the EKF-SLAM depends on the number of landmarks in the state vector and on the matched observations. Both are linked to the time allowed to the embedded architecture to compute the robot pose. Based on the application constraints (real-time localization) and an evaluation methodology, we have implemented the algorithm in consideration of the underlying hardware architecture. A runtime analysis shows that six FBs (FAST detection, ZMSSD-M, estimation and the three initialization blocks) represent 99.6% of the global processing time. We have used an optimized instance of the FAST detector. Two FBs (in the matching and estimation tasks) have been optimized on the SIMD NEON architecture. The initialization task has been parallelized on a DSP processor. This optimization required a modification of the algorithm implementation. Using the optimized implementation, the global processing time was reduced by a factor of 4.7. The results demonstrate that an embedded system (with a low-cost multiprocessor architecture) can operate under real-time constraints if the software implementation is designed carefully. To scale to larger environments, we are going to include a local/global mapping approach as proposed by [30]. Using this approach, we will be able to map larger environments. The map joining system will be implemented on the GPU coprocessor integrated on the OMAP3530.
Other future developments will center on hardware–software co-design to improve system performance by implementing a system-on-chip with a field programmable gate array (FPGA). The use of a configurable architecture greatly accelerates the design and validation of a real-time system-on-chip proof of concept.

Competing interests
The authors declare that they have no competing interests.

Author details
1 Univ Paris-Sud, CNRS, Institut d’Electronique Fondamentale, F-91405 Orsay, France. 2 IFSTTAR, IM, LIVIC, F-78000 Versailles, France.

Received: 24 November 2011 Accepted: 16 June 2012 Published: 18 July 2012

References
1. M Dissanayake, P Newman, S Clark, H Durrant-Whyte, M Csorba, A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans. Robot. Autom. 17, pp. 229–241 (2001)
2. A Davison, I Reid, N Molton, O Stasse, MonoSLAM: real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 29, pp. 1052–1067 (2007)
3. M Montemerlo, S Thrun, D Koller, B Wegbreit, in National Conference on Artificial Intelligence, FastSLAM: a factored solution to the simultaneous localization and mapping problem. Orlando, Florida, USA, 2002, pp. 593–598
4. J Folkesson, HI Christensen, in IEEE International Conference on Robotics and Automation, Graphical SLAM-a self-correcting map. LA, New Orleans, USA, 2004, pp. 383–390
5. A Eliazar, R Parr, in International Joint Conference on Artificial Intelligence. DP-SLAM: fast, robust simultaneous localization and mapping without predetermined landmarks. vol. 18. Acapulco, Mexico, 2003, pp. 1135–1142
6. S Thrun, Probabilistic robotics. Assoc. Comput. Mach. 45(3), pp. 52–57 (2002)
7. C Brenneke, O Wulf, B Wagner, in IEEE/RSJ International Conference on Intelligent Robots and Systems, Using 3d laser range data for slam in outdoor environments. Las Vegas, Nevada, USA, 2003, pp. 188–193
8. A Prusak, O Melnychuk, H Roth, I Schiller, Pose estimation and map building with a time-of-flight-camera for robot navigation. Int. J. Intell. Syst. Technol. Appl. 5(3), pp. 355–364 (2008)
9. F Abrate, B Bona, M Indri, in European Conference on Mobile Robots, Experimental EKF-based SLAM for mini-rovers with IR sensors only. Freiburg, Germany, 2007
10. T Yap, C Shelton, in IEEE International Conference on Robotics and Automation, SLAM in large indoor environments with low-cost, noisy, and sparse sonars. Kobe, Japan, 2009, pp. 1395–1401
11. C Gifford, R Webb, J Bley, D Leung, M Calnon, J Makarewicz, B Banz, A Agah, in IEEE International Conference on Technologies for Practical Robot Applications, Low-cost multi-robot exploration and mapping. Woburn, Massachusetts, USA, 2008, pp. 74–79
12. S Magnenat, V Longchamp, M Bonani, P Rétornaz, P Germano, H Bleuler, F Mondada, in IEEE International Conference on Robotics and Automation, Affordable SLAM through the co-design of hardware and methodology, Anchorage, Alaska, 2010, pp. 5395–5401
13. M Montemerlo, S Thrun, D Koller, B Wegbreit, in International Joint Conference on Artificial Intelligence, FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. Acapulco, Mexico, 2003, pp. 1151–1156
14. S Rezaei, J Guivant, E Nebot, in IEEE/RSJ International Conference on Intelligent Robots and Systems, Car-like robot path following in large unstructured environments. Las Vegas, Nevada, USA, 2003, pp. 2468–2473
15. C Schröter, H Böhme, H Gross, in European Conference on Mobile Robots, Memory-efficient gridmaps in Rao-Blackwellized particle filters for SLAM using sonar range sensors. Freiburg, Germany, 2007, pp. 138–143
16. K Konolige, J Augenbraun, N Donaldson, C Fiebig, P Shah, in IEEE International Conference on Robotics and Automation, A low-cost laser distance sensor. Pasadena, California, USA, 2008, pp. 3002–3008
17. P Pirjanian, N Karlsson, L Goncalves, E Di Bernardo, Low-cost visual localization and mapping for consumer robotics. Indust. Robot. 30(2), pp. 139–144 (2003)
18. E Seignez, M Kieffer, A Lambert, E Walter, T Maurin, Real-time bounded-error state estimation for vehicle tracking. IEEE Int. J. Robot. Res. 28, pp. 34–48 (2009)
19. R Siegwart, I Nourbakhsh, Introduction to Autonomous Mobile Robots (The MIT Press, London, 2004)
20. B Vincke, A Elouardi, A Lambert, in IEEE/SICE International Symposium on System Integration, Design and evaluation of an embedded system based SLAM applications. Sendai, Japan, 2010, pp. 224–229
21. E Rosten, R Porter, T Drummond, Faster and better: a machine learning approach to corner detection. IEEE Trans. Pattern Anal. Mach. Intell. 32, pp. 105–119 (2009)
22. A Davison, in IEEE International Conference on Computer Vision, Real-time simultaneous localisation and mapping with a single camera. Nice, France, 2003, pp. 1403–1410
23. R Munguia, A Grau, in European Conference on Mobile Robots, Freiburg, Germany, 2007, pp. 1–6
24. E Seignez, A Lambert, T Maurin, in IEEE International Conference On Information And Communication Technologies: From Theory To Application, An experimental platform for testing localization algorithms. Damascus, Syria, 2006, pp. 748–753
25. A Elouardi, S Bouaziz, A Dupret, L Lacassagne, JO Klein, R Reynaud, in International Journal on Computer Science and Applications, A smart architecture for low-level image computing. 2008, pp. 1–19
26. F Auat Cheein, R Carelli, Analysis of different feature selection criteria based on a covariance convergence perspective for a SLAM algorithm. Sensors. 11, pp. 62–89 (2010)
27. M Kraft, A Schmidt, A Kasinski, in International Conference on Computer Vision Theory and Applications, High-speed image feature detection using FPGA implementation of fast algorithm. Funchal, Madeira, Portugal, 2008, pp. 174–179
28. Eigen, (2012). http://eigen.tuxfamily.org/
29. K Gunnam, D Hughes, J Junkins, N Kehtarnavaz, A vision-based DSP embedded navigation sensor. IEEE Sens. J. 2(5), pp. 428–442 (2002)
30. P Piniés, J Tardós, Large-scale slam building conditionally independent local maps: application to monocular vision. IEEE Trans. Robot. 24(5), pp. 1094–1106 (2008)

doi:10.1186/1687-3963-2012-5
Cite this article as: Vincke et al.: Real time simultaneous localization and mapping: towards low-cost multiprocessor embedded systems. EURASIP Journal on Embedded Systems 2012 2012:5.

No ratings yet
Transaksi Interperusahaan - Obligasi Upstream
11 pages
Martin Njuguna Auditing 2 Review Quiz
No ratings yet
Martin Njuguna Auditing 2 Review Quiz
8 pages
Levels of Programming Languages
No ratings yet
Levels of Programming Languages
4 pages
FAQ's For Online Application
No ratings yet
FAQ's For Online Application
6 pages
Graphing Grouped Data: Single-Valued Classes
No ratings yet
Graphing Grouped Data: Single-Valued Classes
14 pages
Sagot Ko Sa Ethics
No ratings yet
Sagot Ko Sa Ethics
1 page
36 Dlplus
No ratings yet
36 Dlplus
2 pages
Full Assignment
No ratings yet
Full Assignment
20 pages
Technology & Market Assessment in Iraq PDF
No ratings yet
Technology & Market Assessment in Iraq PDF
14 pages
Mary Gage's Resume
No ratings yet
Mary Gage's Resume
2 pages
Graph Layout Support for Model-Driven Engineering
From Everand
Graph Layout Support for Model-Driven Engineering
Miro Spönemann
No ratings yet
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet
Scanline Rendering: Exploring Visual Realism Through Scanline Rendering Techniques
From Everand
Scanline Rendering: Exploring Visual Realism Through Scanline Rendering Techniques
Fouad Sabry
No ratings yet
