
TYPE Review
PUBLISHED 10 April 2024
DOI 10.3389/frobt.2024.1347985

A review of visual SLAM for robotics: evolution, properties, and future applications

Basheer Al-Tawil*, Thorsten Hempel, Ahmed Abdelrahman and Ayoub Al-Hamadi
Institute for Information Technology and Communications, Otto-von-Guericke-University, Magdeburg, Germany

OPEN ACCESS

EDITED BY
Patrick Sebastian, University of Technology Petronas, Malaysia

REVIEWED BY
Chinmay Chakraborty, Birla Institute of Technology, Mesra, India
Kishore Bingi, University of Technology Petronas, Malaysia
Irraivan Elamvazuthi, University of Technology Petronas, Malaysia
Edmanuel Cruz, Technological University of Panama, Panama

*CORRESPONDENCE
Basheer Al-Tawil, [email protected]

RECEIVED 01 December 2023
ACCEPTED 20 February 2024
PUBLISHED 10 April 2024

CITATION
Al-Tawil B, Hempel T, Abdelrahman A and Al-Hamadi A (2024), A review of visual SLAM for robotics: evolution, properties, and future applications. Front. Robot. AI 11:1347985. doi: 10.3389/frobt.2024.1347985

COPYRIGHT
© 2024 Al-Tawil, Hempel, Abdelrahman and Al-Hamadi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Visual simultaneous localization and mapping (V-SLAM) plays a crucial role in the field of robotic systems, especially for interactive and collaborative mobile robots. The growing reliance on robotics has increased complexity in task execution in real-world applications. Consequently, several types of V-SLAM methods have been revealed to facilitate and streamline the functions of robots. This work aims to showcase the latest V-SLAM methodologies, offering clear selection criteria for researchers and developers to choose the right approach for their robotic applications. It chronologically presents the evolution of SLAM methods, highlighting key principles and providing comparative analyses between them. The paper focuses on the integration of the robotic ecosystem with a robot operating system (ROS) as Middleware, explores essential V-SLAM benchmark datasets, and presents demonstrative figures for each method's workflow.

KEYWORDS
V-SLAM, interactive mobile robots, ROS, benchmark, Middleware, workflow, robotic applications, robotic ecosystem

1 Introduction

Robotics is an interdisciplinary field that involves the creation, design, and operation of tasks using algorithms and programming (Bongard, 2008; Joo et al., 2020; Awais and Henrich 2010; Fong et al., 2003). Its impact extends to manufacturing, automation, optimization, transportation, medical applications, and even NASA's interplanetary exploration (Li et al., 2023b; Heyer, 2010; Sheridan, 2016; Mazumdar et al., 2023). Service robots, which interact with people, are becoming more common and useful in everyday life (Hempel et al., 2023; Lynch et al., 2023). The imperative of integrating automation with human cognitive abilities becomes evident in facilitating a successful collaboration between humans and robots. This helps service robots be more effective in different situations where they interact with people (Prati et al., 2021; Strazdas et al., 2020; Zheng et al., 2023). Furthermore, using multiple robots together can help them handle complex tasks better (Zheng et al., 2022; Li et al., 2023b; Fiedler et al., 2021). To manage and coordinate various processes, a robot operating system (ROS) plays a significant role (Buyval et al., 2017). It is an open-source framework that aids roboticists in implementing their research and projects with minimal complexity. ROS offers a multitude of features, including hardware integration, control mechanisms, and seamless device implementation into the system, thus facilitating the development and operation of robotic systems (Altawil and Can 2023).
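As a hedged illustration of how ROS acts as middleware between a camera driver and a V-SLAM module, the sketch below shows a minimal ROS 1 (rospy) node that subscribes to a camera topic and republishes the frames on a topic a SLAM front end could consume. The node and topic names and the queue size are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of ROS as middleware for a V-SLAM pipeline (ROS 1 / rospy).
# Topic and node names below are illustrative assumptions, not from the paper.
import rospy
from sensor_msgs.msg import Image


def main():
    rospy.init_node("vslam_input_relay")
    # Publisher feeding the (hypothetical) V-SLAM front end.
    slam_pub = rospy.Publisher("/vslam/image_raw", Image, queue_size=10)

    def on_image(msg):
        # Forward each camera frame unchanged; a real system might rectify
        # or throttle frames here before handing them to the SLAM module.
        slam_pub.publish(msg)

    rospy.Subscriber("/camera/image_raw", Image, on_image, queue_size=10)
    rospy.spin()  # hand control to ROS until the node is shut down


if __name__ == "__main__":
    main()
```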




FIGURE 1
Article organizational chart.

As shown in Figure 1, the paper is divided into six sections. Section 1 gives a brief introduction about robotics and SLAM. Section 2 presents an overview of the V-SLAM paradigm that delves into its fundamental concepts. Section 3 presents the state-of-the-art V-SLAM methods, offering insights into their latest advancements. Moving forward, Section 4 explores the evolution of V-SLAM and discusses the most commonly used datasets. Section 5 focuses on techniques for evaluating SLAM methods, aiding in the selection of appropriate methods. Finally, Section 6 provides the conclusion of the article, summarizing the key points we discovered while working on our review paper.

Recently, we require robots that can move around and work well in places they have never been before. In this regard, simultaneous localization and mapping (SLAM) emerges as a fundamental approach for these robots. The primary goal of SLAM is to autonomously explore and navigate unknown environments by simultaneously creating a map and determining their own position (Durrant-Whyte, 2012; Mohamed et al., 2008). Furthermore, it provides real-time capabilities, allowing robots to make decisions on-the-fly without relying on pre-existing maps. Its utility extends to the extraction, organization, and comprehension of information, thereby enhancing the robot's capacity to interpret and interact effectively with its environment (Pal et al., 2022; Lee et al., 2020; Aslan et al., 2021). It is crucial to enable these robots to autonomously navigate and interact in human environments, thus reducing human effort and enhancing overall productivity (Ara, 2022). The construction of maps is based on the utilization of sensor data, such as visual data, laser scanning data, and data from the inertial measurement unit (IMU), followed by rapid processing (Macario Barros et al., 2022).

Historically, prior to the advent of SLAM technology, localization and mapping were treated as distinct entities. However, it was seen that there is a strong internal dependency between mapping and localization. Although accurate localization depends on the map, mapping depends on localization. Thus, the question is known as the "chicken and egg" question (Taheri and Xia, 2021). In robotics, there are different tools to help robots obtain information from surroundings and build their map. One way is to use sensors such as LiDAR, which uses light detection and ranging to make a 3D map (Huang, 2021; Van Nam and Gon-Woo, 2021). Another way is to use cameras, such as monocular and stereo cameras, which are applied in visual SLAM (V-SLAM). In this method, the robot uses pictures to figure out where it is and creates the required map (Davison et al., 2007). Regarding the paper's intensive details, we provide Table 1, which summarizes and describes the abbreviations used in the article based on SLAM principles and fundamentals.

Due to the significance of visual techniques in interactive robotic applications, our research focuses on V-SLAM methodologies and their evaluation. V-SLAM can be applied to mobile robots that utilize cameras to create a map of their surroundings and easily locate themselves within their work space (Li et al., 2020). It uses techniques such as computer vision to extract and match visual data for localization and mapping (Zhang et al., 2020; Chung et al., 2023). It allows robots to map complex environments while performing tasks such as navigation in dynamic fields (Placed et al., 2023; Khoyani and Amini 2023). It places a strong emphasis on accurate tracking of camera poses and estimating past trajectories of the robot during its work (Nguyen et al., 2022; Awais and Henrich 2010).

Figure 2 provides a basic understanding of V-SLAM. It takes an image from the environment as an input, processes it, and produces a map as an output. In V-SLAM, various types of cameras are used to capture images or videos. A commonly used camera is the monocular camera, which has a single lens, providing 2D visual information (Civera et al., 2011). However, due to its limitation of lacking depth information, researchers often turn to stereo cameras, which are equipped with two lenses set at a specific distance to capture images from different perspectives, enabling depth details (Gao et al., 2020; Meng et al., 2018). Another valuable option in V-SLAM is the use of RGB-D cameras, which are capable of capturing both color information (RGB) and depth information (D) (Meng et al., 2018). Although monocular cameras are inexpensive and lightweight, they may require additional sensors in order to provide accurate data. In contrast, RGB-D and stereo cameras provide depth information. This makes RGB-D cameras, such as Microsoft's Kinect, and stereo cameras suitable for robust and accurate SLAM systems (Luo et al., 2021).

TABLE 1 List of abbreviations used in this article.

V-SLAM — Visual simultaneous localization and mapping
ROS — Robot Operating System
LiDAR — Light detection and ranging
BA — Bundle adjustment
BoW — Bag of words
PTAM — Parallel tracking and mapping
FAST — Features from accelerated segment test
ROVIO — Robust visual–inertial odometry
HRI — Human–robot interaction
DTAM — Dense tracking and mapping
LCP — Loop closure process
SS — Semantic segmentation
DSt — Dense stereo
DSe — Dense semantics
SCE — Spatial coordinate errors
LSD — Large-scale direct
OKVIS — Open keyframe-based visual–inertial
DVO — Dense visual odometry
RPGO — Robust pose-graph optimization
IMU — Inertial measurement unit
GPS — Global positioning system
MAV — Micro air vehicle
AGV — Automated-guided vehicle
UAV — Unmanned aerial vehicle
AR — Augmented reality
VR — Virtual reality
RoLI — Range of light intensity
ILR — Illumination and light robustness
BRIEF — Binary Robust Independent Elementary Features

Previous research demonstrated the effectiveness of V-SLAM methods, but they are often explained with very few details and separate figures (Khoyani and Amini, 2023; Fan et al., 2020), making it challenging to understand, compare, and make selections among them. As a result, our study focuses on simplifying the explanation of V-SLAM methodologies to enable readers to comprehend them easily. The main contributions of the study can be described as follows:

• Investigation into V-SLAM techniques to determine the most appropriate tools for use in robotics.
• Creation of a graphical and illustrative structural workflow for each method to enhance the comprehension of the operational processes involved in V-SLAM.
• Presentation of significant factors for the evaluation and selection criteria among the V-SLAM methods.
• Compilation of a comparative table that lists essential parameters and features for each V-SLAM method.
• Presentation and discussion of relevant datasets employed within the domain of robotics applications.

2 Visual SLAM paradigm

As discussed in the Introduction, V-SLAM uses sensor data to provide valuable information to the system (Khoyani and Amini, 2023). Mobile robots and autonomous vehicles require the ability to understand their environment to complete their tasks and achieve their goals (Ai et al., 2021). This understanding is essential for them to be successful in their operations (Bongard, 2008).

The V-SLAM framework is composed of sequential steps that are organized to create the system and process its data; see Figure 3, which explains the processes performed within V-SLAM in parallel with the demonstrated pictures. This includes the creation of a detailed map, a trajectory estimator, and the precise positioning and orientation of the cameras attached to that system (Beghdadi and Mallem, 2022; Kazerouni et al., 2022). Within this framework, various scenarios can be effectively implemented and operated, such as pixel-wise motion segmentation (Hempel and Al-Hamadi, 2020), semantic segmentation (Liu and Miura, 2021), and filtering techniques (Wang et al., 2023; Grisetti et al., 2007). These approaches aim to provide a clear visual representation of the processes involved in V-SLAM. The operational framework has been systematically divided into four sections, which can be listed and explained herein.

2.1 Data acquisition and system initialization

In this stage of V-SLAM, we systematically prepare input data using system hardware, which includes capturing and preparing images. It involves installing cameras such as RGB-D cameras, depth cameras, or infrared sensors for collecting data and initializing the system (Beghdadi and Mallem, 2022).




FIGURE 2
Schematic representation of a robotic system's architecture, highlighting the incorporation of SLAM and its location within the system.

The system gathers data, with a particular emphasis on crucial filtering details aimed at effectively eliminating any noise present in the input data (Mane et al., 2016; Grisetti et al., 2007). The refined data are then sent to the next stage for further processing to extract features from the input information (Ai et al., 2021). As a result, progress in SLAM methods has resulted in the creation of numerous datasets accessible to researchers to evaluate V-SLAM algorithms (El Bouazzaoui et al., 2021).

2.2 System localization

In the second stage of V-SLAM, the system focuses on finding its location, which is an important part of the entire process (Scaradozzi et al., 2018). It involves the execution of various processes that are crucial for successfully determining where the robot is. Feature tracking plays a central role during this phase, with a primary focus on tasks such as feature extraction, matching, re-localization, and pose estimation (Picard et al., 2023). It aims to align and identify the frames that guide the estimation and creation of the initial keyframe for the input data (Ai et al., 2021). A keyframe is a set of video frames that includes a group of observed feature points and the camera's poses. It plays an important role for the tracking and localization process, helping in eliminating drift errors for camera poses attached to the robot (Sheng et al., 2019; Hsiao et al., 2017). Subsequently, this keyframe is sent for further processing in the next stage, where it will be shaped into a preliminary map, a crucial part of the third stage of the workflow (Aloui et al., 2022; Zhang et al., 2020).

2.3 System map formation

The third stage of the V-SLAM workflow focuses on the crucial task of building the map, an essential element in V-SLAM processes. Various types of maps can be generated using SLAM, including topological maps, volumetric (3D) maps, such as point cloud and occupancy grid maps, and feature-based or landmark maps. The choice of the map type is based on factors such as the sensors employed, application requirements, environmental assumptions, and the type of dataset used in robotic applications (Taheri and Xia, 2021; Fernández-Moral et al., 2013). In robotics, a grid map is a representation of a physical environment, with each cell representing a particular location and storing data comprising obstacles, topography, and occupancy. It functions as a fundamental data structure for several robotics navigation and localization techniques (Grisetti et al., 2007). A feature-based map is a representation which captures the features of the environment, such as landmarks or objects, to facilitate localization and navigation tasks (Li et al., 2022a). A point cloud map is a representation of a physical space or object made from lots of 3D dots, showing how things are arranged in a place. It is created using special cameras or sensors and helps robots and computers understand what is around them (Chu et al., 2018).

After setting up keyframes during the localization stage, the workflow progresses to field modeling. Then, key points and feature lines are identified and detected, which is crucial for generating a map (Schneider et al., 2018). It is a process that builds and updates the map of an unknown environment and is used to continuously track the robot's location (Chen et al., 2020). It is a two-way process that works together with the localization process, where they depend on each other to achieve SLAM processes. It gathers real-time data about the surroundings, creating both a geometric and a visual model (r13, accessed on 14 November 2023). In addition, the process includes the implementation of bundle adjustments (BAs) to improve the precision of the generated map before it is moved to the final stage (Acosta-Amaya et al., 2023). BA is a tool that simultaneously refines the parameters essential for estimating and reconstructing the location of observed points in available images. It plays a crucial role in feature-based SLAM (Bustos et al., 2019; Eudes et al., 2010).

FIGURE 3
Visual SLAM architecture: an overview of the four core components necessary for visual SLAM: data acquisition, system localization, system mapping, and system loop closure and process tuning, enabling mobile robots to perceive, navigate, and interact with their environment.

2.4 System loop closure and process tuning

The final stage in the V-SLAM workflow involves fine-tuning the process and closing loops, resulting in the optimization of the final map. In V-SLAM, the loop closure procedure examines and maintains previously visited places, fixing any errors that might have occurred during the robot's exploration within an unknown environment. These errors typically result from the estimation processes performed in earlier stages of the SLAM workflow (Tsintotas et al., 2022; Hess et al., 2016). Loop closure and process tuning can be done using different techniques, such as the extended Kalman filter SLAM (EKF-SLAM). EKF-SLAM combines loop closure and landmark observation data to adjust the map in the Kalman filter's state estimate. This tool helps address uncertainties in the surrounding world (map) and localize the robot within it (Song et al., 2021; Ullah et al., 2020).

The bag-of-words (BoW) approach is another technique used to enable robots to recognize and recall previously visited locations. This is similar to how humans remember places they have been to in the past, even after a long time, due to the activities that took place there. BoW works by taking the visual features of each image and converting them into a histogram of visual words. This histogram is then used to create a fixed-size vector representation of the BoW, which is stored for use in matching and loop-closing processes (Cui et al., 2022; Tsintotas et al., 2022).

Finally, graph optimization is used as a correction tool for loop closure processes. It refines the final map and the robot's trajectory by optimizing the graph based on landmarks. This technique involves a graph-based representation of the SLAM problem, where vertices represent robot poses and map characteristics and edges represent constraints or measurements between the poses. It is commonly used as a correction tool in graph-based SLAM types (Zhang et al., 2017; Chou et al., 2019; Meng et al., 2022).

In conclusion, these comprehensive workflow processes outlined in Sections 2.1, 2.2, 2.3, and 2.4, respectively, play an important role in V-SLAM for robotics as they facilitate the simultaneous creation of maps and real-time location tracking within the operational environment.

3 State-of-the-art of visual SLAM methods

V-SLAM plays a significant role as a transformative topic within the robotics industry and research (Khoyani and Amini, 2023; Acosta-Amaya et al., 2023). The progress in this field can be attributed to tools such as machine learning, computer vision, deep learning, and state-of-the-art sensor technologies, which have collectively simplified and enhanced its strategy in real-life applications (Beghdadi and Mallem, 2022; Duan et al., 2019).

The landscape of V-SLAM is composed of a variety of methodologies, which can be divided into three categories, namely, only-visual SLAM, visual-inertial SLAM, and RGB-D SLAM (Macario Barros et al., 2022; Theodorou et al., 2022), as shown in Figure 4. In this section, we provide a brief overview of the current state-of-the-art V-SLAM algorithms and techniques, including their methodology, efficiency, time requirements, and processing capacity, as well as whether they are designed to run on-board or off-board computer systems (Tourani et al., 2022). Additionally, we combine various graphical representations to create a single and comprehensive visual representation of each method's workflow, as shown in Figure 5.

3.1 Only visual SLAM

It is a SLAM system designed to map the environment around the sensors while simultaneously determining the precise location and orientation of those sensors within their surroundings. It relies entirely on visual data for estimating sensor motion and reconstructing environmental structures (Taketomi et al., 2017). It uses monocular, RGB-D, and stereo cameras to scan the environment, helping robots map unfamiliar areas easily. This approach has attracted attention in the literature because it is cost-effective, easy to calibrate, and has low power consumption in monocular cameras while also allowing depth estimation and high accuracy in RGB-D and stereo cameras (Macario Barros et al., 2022; Abbad et al., 2023). The methods used in this part can be listed herein.

3.1.1 PTAM-SLAM

PTAM-SLAM, which stands for parallel tracking and mapping (PTAM), is a monocular SLAM used for real-time tracking systems.




FIGURE 4
Illustration of visual SLAM types: only-visual SLAM, visual-inertial SLAM, and RGB-D SLAM.

It has 6-DoF camera tracking, which can be used in small scenes (Klein and Murray, 2007). This methodology demonstrates remarkable efficiency in dynamic operational settings, consistently providing high performance even in conditions of frequent and unstable lighting variations (Soliman et al., 2023); see Table 2.

The system workflow consists of four sequential stages (Klein and Murray, 2007; Fernández-Moral et al., 2013). Input preparation and system initialization involve processes such as monocular camera translation and rotation to improve image efficiency and clarity (De Croce et al., 2019). The tracking process is carried out, where tasks related to image and video processing are performed to prepare data for subsequent mapping procedures. Following that, the optimization and mapping processes are carried out to prepare the map and reveal the outputs, which include the camera pose and the 3D map used in SLAM operations (Klein and Murray, 2007; Servières et al., 2021). All processes and steps are simplified and demonstrated in Figure 5, part 1.

FIGURE 5
Visual SLAM methods, illustrating the state-of-the-art method and workflow for select notable SLAM methods featured in this study, presented in a simplified view.

3.1.2 ORB-SLAM

ORB-SLAM stands for oriented FAST (features from accelerated segment test) and rotated BRIEF (binary robust independent elementary features) SLAM (Tourani et al., 2022). This feature-based detector is applicable in both small and large indoor or outdoor fields (Tourani et al., 2022). Due to its real-time capabilities and high-quality map reconstruction, it is widely used in applications such as human–robot interaction (HRI) (Mur-Artal et al., 2015), augmented reality, and autonomous navigation (Zhu et al., 2022; Yang et al., 2022). ORB-SLAM is designed to handle robust and unstable motion clutter, covering essential processes such as tracking, mapping, and loop closing (Campos et al., 2021). Compared to other advanced V-SLAM methods, ORB-SLAM outperforms by enhancing the dynamics, size, and traceability of the map. It achieves real-time global localization from wide baselines, performs camera re-localization from various viewpoints, and makes better selections for frames and points in the reconstruction process (Ragot et al., 2019; Mur-Artal and Tardós, 2014); see Table 2.

ORB-SLAM1 is categorized as only-visual (Mur-Artal et al., 2015; Mur-Artal and Tardós, 2014), while ORB-SLAM2 expands to both only-visual and RGB-D SLAM (Ragot et al., 2019; Mur-Artal and Tardós 2017a). Furthermore, ORB-SLAM3 extends its classification to include all three categories: only-visual, visual-inertial, and RGB-D SLAM. This expansion underscores the adaptability and versatility of ORB-SLAM in real-life applications (Zang et al., 2023; Ca et al., 2021; Campos et al., 2021).

The ORB-SLAM methodology process goes through four sequential phases (Mur-Artal et al., 2015; Mur-Artal and Tardós 2017a; Ca et al., 2021). The initial phase involves the sensor input and the tracking process (Joo et al., 2020). Across all ORB-SLAM versions, this phase shares a common approach, focusing on pose preparation and frame generation to facilitate decision-making (Sun et al., 2017). However, the difference lies in input usage; for example, ORB-SLAM1 uses one input, ORB-SLAM2 uses three, and ORB-SLAM3 uses four (Campos et al., 2021). Therefore, the quality and efficiency of the next operation depend on the input in the first stage. In the next phase, local mapping is done by adding new keyframes and creating map points with the localization process simultaneously (Ca et al., 2021). This part remains consistent across all versions, but version 3 enhances its functionality by incorporating additional bundle adjustment for improved feature detection and matching (Dai et al., 2021). The subsequent phase involves loop closing, process optimization, and selecting similar candidate data in all versions. However, versions 2 and 3 include additional steps such as bundle adjustment welding and map merging (Mur-Artal and Tardós, 2017a; Zang et al., 2023). The last stage is preparing the output, focusing on creating the final map that includes essential information such as graphs, lines, point mapping, and 2D and 3D maps for use in the SLAM process (Acosta-Amaya et al., 2023). Figure 5, parts 4, 5, and 6, gives a detailed observation about the methods of ORB-SLAM versions 1, 2, and 3, respectively, showcasing their features and functionalities for a better understanding.

3.1.3 LSD-SLAM

LSD-SLAM, which stands for large-scale direct monocular SLAM, is an advanced technique made for real-time mapping and positioning. It can utilize various camera setups. It is designed for large-scale mapping jobs where it can create a very accurate and detailed map of the working fields. In addition, it stays accurate even with a lower image resolution (Engel et al., 2015; Fernández-Moral et al., 2013). This flexibility makes it a better choice for operating in complex, wide-ranging and dynamic environments, and it is used in various applications such as robotics and self-driving cars (Mur-Artal et al., 2015; Engel et al., 2014); see Table 2.

LSD-SLAM distinguishes itself from the DTAM-SLAM approach by focusing on areas with strong intensity changes, leaving out regions with little or no texture details. This choice comes from the challenge of figuring out how far things are in areas where there is not much texture inside images. As a result, LSD-SLAM goes beyond what DTAM can do by concentrating on places with strong changes in brightness and ignoring areas with very little texture (Acosta-Amaya et al., 2023; Khoyani and Amini, 2023).

LSD and DVO-SLAM processes can function similarly, and their workflow is structured in five stages (Macario Barros et al., 2022; Luo et al., 2021; Schöps et al., 2014; Engel et al., 2015). The first stage includes inputting mono and stereo data and preparing them for the next processing step. The second stage is designed for tracking and estimating the initial pose by aligning images from both mono and stereo cameras. The third stage is dedicated to loop closure processes, involving keyframe preparation, regularization, and data updates to prepare frames for subsequent stages. The fourth stage carries out map optimization, including two critical phases,




TABLE 2 Comparative scenarios for actively used visual SLAM methods.

PTAM (Klein and Murray) — Sensors: M ✓, S ×, IMU ×, O ×; W-S: M-H; Output: pose estimation and 3D mapping; Application field: robotics, AR, and VR; RoLI: +++; T2D: ++++; Hardware deployment: ODROID-XU4, Intel Quad-Core; S.M: GPL (2023)

DTAM (Newcombe et al.) — Sensors: M ✓, S ✓, IMU ×, O RGB-D; W-S: S-I; Output: textured depth map; Application field: robotics, AR, VR, AGV, and simulators; RoLI: ++; T2D: +++; Hardware deployment: nvidia.gtx.480.gpu, gpgpu processors; S.M: Rintar (2023)

RTAB-Map (Labbé) — Sensors: M ✓, S ✓, IMU ✓, O LiDAR; W-S: L-H; Output: 2D and 3D mapping and 3D reconstruction; Application field: robotics, VR, and AR; RoLI: +++; T2D: +++; Hardware deployment: Jetson Nano, Intel Core.i5.8th.gen; S.M: Introlab (2023)

ORB-SLAM (Mur-Artal) — Sensors: M ✓, S ×, IMU ×, O ×; W-S: M-H; Output: tree-spanning and pose estimating; Application field: robotics mapping and indoor navigation; RoLI: ++++; T2D: +++; Hardware deployment: Intel Core.i7.4700MQ; S.M: raulmur (2023a)

ORB-SLAM2 (Leut et al.) — Sensors: M ✓, S ✓, IMU ×, O RGB-D; W-S: M-H; Output: point mapping and keyframe selection; Application field: mobile mapping, robotics, VR, and UAVs; RoLI: ++++; T2D: ++++; Hardware deployment: Intel Core-i7.4790 and RealSense-D435; S.M: raulmur (2023b)

ORB-SLAM3 (Campos et al.) — Sensors: M ✓, S ✓, IMU ✓, O fish-eye; W-S: L-H; Output: 2D and 3D map and tree-spanning; Application field: robotics, security, and 3D reconstruction; RoLI: +++++; T2D: +++++; Hardware deployment: Jetson-tx2, pi.3B + nvidia.geforce; S.M: uz.slaml (2023)

RGBD-SLAM (Endres et al.) — Sensors: M ×, S ×, IMU ✓, O RGB-D; W-S: L-H; Output: maps, trajectories, and 3D point cloud; Application field: 3D scanning, robotics, and UAVs; RoLI: +++; T2D: ++++; Hardware deployment: Intel Core.i9.9900k and Quad Core.cpu.8.GB; S.M: felix (2023)

SCE-SLAM (Son et al.) — Sensors: M ×, S ✓, IMU ×, O RGB-D; W-S: M-I; Output: camera pose and semantic map; Application field: robotics, AR, and AGV; RoLI: ++++; T2D: +++; Hardware deployment: nvidia.Jetson.AGX, 512.core.Volta.GPU; S.M: none

OKVIS (Leutenegger et al.) — Sensors: M ✓, S ✓, IMU ✓, O ×; W-S: M-H; Output: graph estimation and feature tracking; Application field: robotics, UAVs, and VR; RoLI: ++++; T2D: ++++; Hardware deployment: Up-Board, ODROID.xu4, and Intel® Core M.i7; S.M: eth.a (2023a)

ROVIO (Bloesch et al.) — Sensors: M ✓, S ✓, IMU ✓, O fish-eye; W-S: L-H; Output: position and orientation depth map; Application field: robotics, AR, and self-driving cars; RoLI: +++; T2D: +++; Hardware deployment: ODROID-xu4 and Intel i7-2760QM; S.M: eth.a (2023b)

VINS-Mono (Qin et al.) — Sensors: M ✓, S ×, IMU ✓, O ×; W-S: L-H; Output: keyframe database and pose estimation; Application field: robotics, AR, and VR; RoLI: +++; T2D: +++; Hardware deployment: Intel Pentium, Intel Core i7-4790 CPU; S.M: hkust.a (2023)

LSD-SLAM (Engel et al.) — Sensors: M ✓, S ✓, IMU ×, O RGB-D; W-S: L-H; Output: keyframe selection and 3D mapping; Application field: robotics and self-driving cars; RoLI: ++++; T2D: +++++; Hardware deployment: fpga.zynq.7020.soc, Intel® NUC6i3SYH; S.M: CVG, T. U. of M. (2023)

DVO-SLAM (Kerl et al.) — Sensors: M ×, S ✓, IMU ×, O RGB-D; W-S: S-I; Output: 3D mapping and image alignment; Application field: robotics and AR perception; RoLI: +++; T2D: +++; Hardware deployment: Sony Xperia.z1, Intel Xeon E5520; S.M: tum.v (2023)

Kimera-SLAM (Rosinol et al.) — Sensors: M ✓, S ✓, IMU ✓, O LiDAR; W-S: M-H; Output: trajectory estimate and semantic mesh; Application field: robotics, UAV, VR, and AGV; RoLI: ++++; T2D: +++++; Hardware deployment: not mentioned; S.M: MIT.S (2023)

ILR, illumination and light robustness — evaluates how well each SLAM method responds to varying environmental lighting.
RoLI, range of light intensity — measures the robot's ability to operate effectively across a broad spectrum of light intensities, from very dark to very bright.
T2D, tolerance to directionality — assesses the robot's capability to function in environments with strong directional light sources, such as spotlights and windows.
W-S — defines the operational scale and application field of the robot (M, medium; L, large; S, small; H, hybrid; I, indoor).
S.M, sources and materials — provides links to the source codes used in the method.
VINS.M.S, VINS-Mono SLAM; M, monocular camera; S, stereo camera; IMU, inertial measurement unit; O, other sensors; fish-eye, fish-eye camera; RGB-D, RGB-D camera.
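As a hedged illustration of how the comparison criteria in Table 2 can support method selection, the snippet below encodes a few of the table's rows as Python dictionaries and filters them by simple requirements (IMU support and a minimum light-robustness score). The dictionary encoding and thresholds are assumptions made for the example, not an official selection tool.

```python
# Sketch: filtering a few Table 2 entries by selection criteria.
# The dictionary encoding and thresholds are illustrative assumptions.
methods = [
    {"name": "ORB-SLAM3", "imu": True,  "sensors": ["mono", "stereo", "fisheye"], "roli": 5, "scale": "L-H"},
    {"name": "VINS-Mono", "imu": True,  "sensors": ["mono"],                      "roli": 3, "scale": "L-H"},
    {"name": "RTAB-Map",  "imu": True,  "sensors": ["mono", "stereo", "lidar"],   "roli": 3, "scale": "L-H"},
    {"name": "PTAM",      "imu": False, "sensors": ["mono"],                      "roli": 3, "scale": "M-H"},
]

def select(methods, need_imu=True, min_roli=4):
    # Keep only methods that satisfy the hardware and lighting requirements.
    return [m["name"] for m in methods
            if (m["imu"] or not need_imu) and m["roli"] >= min_roli]

print(select(methods))              # -> ['ORB-SLAM3']
print(select(methods, min_roli=3))  # -> ['ORB-SLAM3', 'VINS-Mono', 'RTAB-Map']
```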

which are direct mapping and feature-based mapping. It also covers processes such as activation, marginalization, and direct bundle adjustment. These operations shape the necessary map and manage its points with semi-dense adjustments for use in the output stage. In the final stage, the estimated camera trajectory and pose with the dense 3D map are




prepared or application in robotics’ SLAM unctions; see Figure 5, We have structured the OKVIS-SLAM workow into three key
part 14 or a detailed workow. phases (Leutenegger, 2022; Kasyanov et al., 2017; Wang et al., 2023).
Te rst phase ocuses on receiving initial sensor inputs, including
3.1.4 DVO-SLAM IMU and visual data. It initializes the system, conducts IMU
DVO-SLAM, which stands or dense visual odometry SLAM, is integration, and employs tracking techniques to prepare the data or
designed to acilitate real-time motion estimation and map creation subsequent processing. Te second phase is the real-time estimator
using depth-sensing devices, such as stereo and mono cameras and odometry ltering phase, covering various operations, such
(Schöps et al., 2014). It stands out or its ability to generate detailed as landmark triangulation and status updating. Te triangulation
and accurate environment maps while tracking the position and process is used or estimation used to generate the 3D position
orientation (Luo et al., 2021; Zhu et al., 2022). DVO-SLAM uses o visual landmarks to enhance SLAM operation (Yousi et al.,
point-to-plane metrics in photo metric bundle adjustment (PBA), 2015). In the last phase, optimization and ull graph estimation
enhancing the navigation o robotic systems, especially in situations are perormed. Tis includes loop closure detection, window
with less textured points. Te point-to-plane metric is a cost unction sliding, and marginalization. Te phase selects relevant rames and
and optimization tool that is used to optimize the depth sensor optimizes the overall graph structure, ultimately providing essential
poses and plane parameters or 3D reconstruction (Alismail et al., outputs or the SLAM system; see Figure 5, part 11.
2017; Zhou et al., 2020; Newcombe et al., 2011). Tese eatures make
DVO-SLAM suitable or more accurate applications such as in 3.2.2 ROVIO-SLAM
robotics and augmented reality (AR), and it is robust or operating in ROVIO-SLAM, which stands or robust visual-inertial
slightly unstable light sources (Khoyani and Amini, 2023; Kerl et al., odometry SLAM, is a cutting-edge sensor usion method that
2013); see able 2. smoothly combines visual and inertial data. Tis integration
signicantly enhances navigation accuracy, leading to improved
work eciency in robotics systems (Blo et al., 2015; Wang et al.,
3.2 Visual-inertial SLAM 2023). It brings valuable attributes or robotics, excelling in robust
perormance in challenging environments, and presents a smooth
VI-SLAM is a technique that combines the capabilities o interaction between the robot and its surroundings (Li et al., 2023a).
visual sensors, such as stereo cameras, and inertial measurement It eciently handles extensive mapping processes, making it suitable
sensors (IMUs) to achieve its SLAM objectives and operations or large-scale applications (Kasyanov et al., 2017). Moreover, it
(Servières et al., 2021; Leut et al., 2015). Tis hybrid approach allows operates with low computational demands and high robustness to
a comprehensive modeling o the environment, where robots light, making it ideal or cost-eective robotic platorms designed
operate (Zhang et al., 2023). It can be applied to various real-world or sustained, long-term operations (Leutenegger, 2022).
applications, such as drones and mobile robotics (aketomi et al., ROVIO-SLAM workow is divided into three stages
2017). Te integration o IMU data enhances and augments (Picard et al., 2023; Nguyen et al., 2020; Schneider et al., 2018).
the inormation available or environment modeling, resulting First, data rom visual cameras and IMU are obtained and prepared
in improved accuracy and reduced errors within the system’s or processing. In the next stage, eature detection, tracking, and
unctioning (Macario Barros et al., 2022; Mur-Artal and ardós semantic segmentation are done or visual data, while IMU data are
2017b). Te methods and algorithms used in this approach, while prepared or integration rom the other side. Te processing stage
implemented in real-lie applications, can be listed as shown in the involves loop closure operations, new keyrames insertion, and state
ollowing section. transition, along with data ltering. State transitions lead to the
generation o the key output, which is then transerred to the nal
3.2.1 OKVIS-SLAM stage, providing estimated position, orientation, and 3D landmarks;
OKVIS-SLAM, which stands or open keyrame-based see Figure 5, part 8.
visual-inertial SLAM, is designed or robotics and computer
vision applications that require real-time 3D reconstruction, 3.2.3 VINS Mono-SLAM
object tracking, and position estimation (Kasyanov et al., 2017). VINS Mono-SLAM, which stands or the visual-inertial
It combines visual and inertial measurements to accurately navigation system, is an advanced sensor usion technology that
predict the position and orientation o a robot simultaneously precisely tracks the motion and position o a robot or sensor in
(Leut et al., 2015). real-time. Utilizing only a single camera and an IMU, it combines
It accurately tracks the camera’s position and orientation in real- visual and inertial data to enhance accuracy and ensure precise
time control during a robot’s motion (Leutenegger, 2022). It uses unctionality o robot operations (Mur-Artal and ardós, 2017b).
image retrieval to connect keyrames in the SLAM pose-graph, aided Known or its eciency in creating maps and minimizing drif
by the pose estimator or locations beyond the optimization window errors, VINS-Mono excels in navigating challenging environments
o visual–inertial odometry (Kasyanov et al., 2017; Wang et al., with dynamic obstacles (Bruno and Colombini, 2021). Its smooth
2023). For portability, a lightweight semantic segmentation CNN perormance in dicult lighting conditions highlights its reliability,
is used to remove dynamic objects during navigation (Leutenegger, ensuring optimal unctionality or mobile robots operating in
2022). OKVIS’s real-time precision and resilience make it suitable unstable lighting conditions (Song et al., 2022; Kuang et al., 2022).
or various applications, including robotics and unmanned aerial Tis power-ecient, real-time monocular VIO method is suitable
vehicles (UAVs). It can operate eectively in complex and unstable or visual SLAM applications in robotics, virtual reality, and
illumination environments (Wang et al., 2023); see able 2. augmented reality (Gu et al., 2022); see able 2.
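To illustrate the inertial side of the visual-inertial fusion that OKVIS, ROVIO, and VINS-Mono build on, the sketch below dead-reckons position, velocity, and yaw from gyroscope and accelerometer samples between two camera frames; a visual update would then correct the drift of this prediction. The sample data, the neglect of gravity, and the planar (yaw-only) rotation model are simplifying assumptions made for illustration only.

```python
# Sketch: propagating a planar IMU state between two camera frames.
# A real VI-SLAM system integrates full 3D rotations and fuses the result
# with visual measurements; the values below are made-up sample data.
import numpy as np

dt = 0.005                      # 200 Hz IMU rate, as an assumption
gyro_z = np.full(40, 0.10)      # yaw rate in rad/s over 0.2 s between frames
accel_body = np.tile([0.5, 0.0], (40, 1))  # forward acceleration in m/s^2

yaw = 0.0
vel = np.zeros(2)               # world-frame velocity (x, y)
pos = np.zeros(2)               # world-frame position (x, y)

for w, a_b in zip(gyro_z, accel_body):
    yaw += w * dt                               # integrate yaw rate
    c, s = np.cos(yaw), np.sin(yaw)
    a_world = np.array([c * a_b[0] - s * a_b[1],
                        s * a_b[0] + c * a_b[1]])  # rotate accel into world frame
    vel += a_world * dt                         # integrate acceleration
    pos += vel * dt                             # integrate velocity

print(f"predicted pose after 0.2 s: x={pos[0]:.3f} m, y={pos[1]:.3f} m, yaw={yaw:.3f} rad")
```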




The VINS-Mono SLAM workflow is organized into four stages (Qin et al., 2018; Xu et al., 2021). In the first stage, visual and inertial data are gathered and prepared for acquisition and measurement processing, including feature extraction, matching, and IMU data preparation, and are then sent for visual and inertial alignment. The second stage handles loop closure operations and re-localization to adjust old states with additional feature retrieval for the next step. The third stage focuses on process optimization, incorporating bundle adjustments and additional propagation for efficiency. The final stage outputs the system's estimated pose and a keyframe database, applicable to SLAM; see Figure 5, part 13.

3.2.4 Kimera-SLAM

Kimera-SLAM is an open-source SLAM technique applied for real-time metric-semantic purposes. Its framework is highly dependent on previous methodologies such as ORB-SLAM, VINS-Mono SLAM, OKVIS, and ROVIO-SLAM (Rosinol et al., 2020). Exhibiting robustness in dynamic scenes, particularly in the presence of moving objects (Wang et al., 2022), Kimera-SLAM showcases resilience to variations in lighting conditions. It operates effectively in both indoor and outdoor settings, making it highly compatible with integration into interactive robotic systems (Rosinol et al., 2021). In summary, Kimera-SLAM provides a thorough and efficient solution for real-time metric-semantic SLAM, prioritizing accuracy, modality, and robustness in its operations (Rosinol et al., 2021); see Table 2.

The procedural workflow of this technique can be summarized in five stages (Rosinol et al., 2020). First, the input pre-processing includes dense 2D semantics, dense stereo, and Kimera-VIO. It also includes front-end and back-end operations such as tracking, feature extraction, and matching, which yield an accurate state estimation. The second stage involves robust pose graph optimization (Kimera-RPGO), tasked with optimization and the formulation of a global trajectory. Subsequently, the third stage features the per-frame and multi-frame 3D mesh generator (Kimera-Mesher), responsible for the execution and generation of 3D meshes representing the environment. The fourth stage introduces semantically annotated 3D meshes (Kimera-Semantics), dedicated to generating 3D meshes with semantic annotations. This stage sets the groundwork for the subsequent and final stage, where the generated 3D meshes are utilized for output visualization, ultimately serving SLAM purposes, as illustrated in Figure 5, part 9.

3.3 RGB-D SLAM

RGB-D SLAM is an innovative approach that integrates RGB-D cameras with depth sensors to estimate and build models of the environment (Ji et al., 2021; Macario Barros et al., 2022). This technique has found applications in various domains, including robotic navigation and perception (Luo et al., 2021). It demonstrates efficient performance, particularly in well-lit indoor environments, providing valuable insights into the spatial landscape (Dai et al., 2021).

The incorporation of RGB-D cameras and depth sensors enables the system to capture both color and depth information simultaneously. This capability is advantageous in indoor applications, addressing the challenge of dense reconstruction in areas with low-textured surfaces (Zhang et al., 2021b). The objective of RGB-D SLAM is to generate a precise 3D reconstruction of the system surroundings, with a focus on the acquisition of geometric data to build a comprehensive 3D model (Chang et al., 2023). The methods used in this section are listed as follows:

3.3.1 RTAB-Map SLAM

RTAB-Map SLAM, which stands for real-time appearance-based mapping, is a visual SLAM technique that works with RGB-D and stereo cameras (Ragot et al., 2019). It is a versatile algorithm that can handle 2D and 3D mapping tasks depending on the sensor and data that are given (Peter et al., 2023; Acosta-Amaya et al., 2023). It integrates RGB-D and stereo data for 3D mapping, enabling the detection of static and dynamic 3D objects in the robot's environment (Ragot et al., 2019). It is applicable in large outdoor environments where LiDAR rays cannot reflect and manage the field around the robot (Gurel, 2018). Variable lighting and environmental interactions can cause robotic localization and mapping errors. Therefore, RTAB-Map's robustness and adaptability to changing illumination and scenes enable accurate operation in challenging environments. It can handle large, complex environments and is quickly adaptable to work with multiple cameras or laser rangefinders (Li et al., 2018; Peter et al., 2023). Additionally, the integration of the T265 (Intel RealSense camera) and implementation of ultra-wideband (UWB) (Lin and Yeh, 2022) address robot wheel slippage with drifting error handling, enhancing system efficiency with precise tracking and 3D point cloud generation, as done in Persson et al. (2023); see Table 2.

The RTAB-Map SLAM method involves a series of steps that enable it to function (Gurel, 2018; Labbé and Michaud, 2019). Initially, the hardware and front-end stage is responsible for tasks such as obtaining data from stereo and RGB-D cameras, generating frames, and integrating sensors. This stage prepares the frames that will be used in the subsequent stage. After the frames have been processed simultaneously with the tracking process, the loop closure is activated to generate the necessary odometry. Subsequently, the keyframe equalization and optimization processes are initiated to improve the quality of the 2D and 3D maps generated for SLAM applications, as shown in Figure 5, part 7.

3.3.2 DTAM-SLAM

DTAM-SLAM, which stands for dense tracking and mapping, is a V-SLAM algorithm specified for real-time camera tracking. It provides robust six degrees of freedom (6 DoF) tracking and facilitates efficient environmental modeling for robotic systems (Newcombe et al., 2011; Macario Barros et al., 2022). This approach plays a fundamental role in advancing applications such as robotics, augmented reality, and autonomous navigation, delivering precise tracking and high-quality map reconstruction. Furthermore, it is slightly dynamic with light; thus, it is accurate when operating in strong illumination fields (Zhu et al., 2022; Yang et al., 2022); see Table 2.

The DTAM-SLAM workflow is divided into a series of steps, each with its own purpose (Newcombe et al., 2011; Macario Barros et al., 2022). It begins with the input, such as the RGB-D camera, which helps initialize the system. In the camera tracking




and reconstruction stage, the system selects rames and estimates and makes it useul with greater accuracy and robustness in dynamic
textures on the image. It then accurately tracks the 6DoF situations with the help o merging semantic and geometric data and
camera motion, determining its exact position and orientation. leveraging YOLOv7 or quick object recognition (Wu et al., 2022).
Furthermore, the optimization ramework is activated and uses Tanks to these improvements, the SLAM algorithms can be well-
techniques such as spatially regularized energy minimization to suited or dynamic scenarios which allows in greater adaptability
enhance data terms, thereby improving the image quality that and comprehension o system surroundings. Tis enables robotic
is captured rom video streaming. As a result, the advanced systems to operate in more complex circumstances with the ewer
process tuning carries out operations that improve the method’s mistakes or slippage errors (Liu and Miura, 2021). Moreover, robots
perormance and producing precise outputs such as dense models, equipped with SCE-SLAM are empowered to operate in a more
surace patchwork, and texture depth maps (see Figure 5, part 2). exible and error-reduced manner, and it can operate in challenging
light environments (Son et al., 2023; Ren et al., 2022); see able 2.
3.3.3 RGBD-SLAM Te SCE-SLAM workow is divided into three key stages
RGDB-SLAM, which stands or simultaneous localization and (Son et al., 2023). Te rst stage involves the semantic module.
mapping using red–green–blue and depth data, is an important Tis module processes camera input data and employs Yolov2 to
method that creates a comprehensive 3D map containing both remove noise rom the input. Te second stage is the geometry
static and dynamic elements (Ji et al., 2021). Tis method involves module, where depth image analysis and spatial coordinate recovery
the tracking o trajectories and mapping o points associated with are perormed, preparing the system or integration with ORB-
moving objects (Steinbrücker et al., 2011; Niu et al., 2019). Using SLAM3. Te nal stage is dedicated to the integration o ORB-
these data types enhances and provides precise SLAM results SLAM3. Tis integration acilitates the execution o processes within
(End et al., 2012; Li Q. et al., 2022a). It has the ability to create ORB-SLAM3. Te process works in parallel with the loop closure
registered point clouds or OctoMaps or the purpose that can be technique, which results in a more accurate and precise system
used or robotic systems (Zhang and Li 2023; Ren et al., 2022). In output; see Figure 5, Part 12.
robotics applications, RGB-D SLAM, specically V-SLAM, excels
in both robustness and accuracy. It eectively addresses challenges
such as working in a dynamic environment (Steinbrücker et al., 4 Visual SLAM evolution and datasets
2011; Niu et al., 2019). Te implementation o RGB-D SLAM aced a
challenge in balancing segmentation accuracy, system load, and the Te roots o SLAM can be traced back to nearly three decades
number o detected classes rom images. Tis challenge was tackled ago, when it was rst introduced by Smith et al. Picard et al. (2023);
using ensorR, optimized by YOLOX or high-precision real- Khoyani and Amini (2023). Recently, visual SLAM has changed a lot
time object recognition (Chang et al., 2023; Martínez-Otzeta et al., and made a big impact on robotics and computer vision (Khoyani
2022). It has versatile applications in real-world robotics scenarios, and Amini, 2023). Along this journey, dierent V-SLAM methods
including autonomous driving cars, mobile robotics, and augmented have been created to tackle specic challenges in robot navigation,
reality (Zhang and Li, 2023; Bahraini et al., 2018); see able 2. mapping, and understanding the surroundings (Aloui et al., 2022;
Te RGB-D SLAM workow can be organized into ve essential Sun et al., 2017). o veriy and compare these V-SLAM methods,
stages, each playing a crucial role in the SLAM process (Ji et al., important datasets have been created which played a crucial role
2021; Hastürk and Erkmen, 2021; End et al., 2012). Te initial stage in the eld (Pal et al., 2022; ian et al., 2023a). In this section, we
involves data acquisition, where RGB-D and depth camera data are explore the evolution o V-SLAM methods over time and how they
collected as the oundational input or subsequent stages. Moving have advanced with the help o using the suitable datasets.
on to the second stage, processing o RGB-D details was activated. o oer a more comprehensible perspective, we provide an
During this phase, tasks include eature extraction and pairwise illustrative timeline depicting the evolution o the most well-
matching while simultaneously addressing depth-related activities, known V-SLAM methods, as shown in Figure 6. Tis graphical
such as storing point clouds, and aligning lines or shapes. In the third representation illustrates the development o the V-SLAM
stage, activities such as noise removal and semantic segmentation methodologies rom 2007 to 2021. Tese methods have been
(SS), in addition to loop closure detection, are perormed to lay the applied in various elds, including agriculture, healthcare, and
groundwork or map construction. Te ourth stage is dedicated industrial sectors, with a specic ocus on interactive mobile
to ocus on pose estimation and optimization techniques, leading robots. Additionally, we highlight several signicant and widely
to improvement in the accuracy o the system output. Te nal recognized benchmark datasets crucial to V-SLAM, as shown in the
stage involves generating trajectory estimation and maps, rening ollowing section.
the outputs or use in SLAM applications in robotic systems; see
Figure 5, part 3.
4.1 TUM RGB-D dataset
3.3.4 SCE-SLAM
SCE-SLAM, which stands or spatial coordinate errors SLAM, Te UM RGB-D dataset is a widely used resource in the
represents an innovative real-time semantic RGB-D SLAM eld o V-SLAM, which helps demonstrate the eectiveness and
technique. It has been developed to tackle the constraints posed by practicality o V-SLAM techniques. Tis dataset provides both
traditional SLAM systems when operating in dynamic environments RGB images and depth maps, with the RGB images saved in a
(Li et al., 2020). Te method was improved to increase the 640 × 480 8-bit ormat and the depth maps in a 640 × 480 16-
perormance o existing V-SLAM methods such as ORB-SLAM3 bit monochrome (Chu et al., 2018). It oers RGB-D data, making

Frontiers in Robotics and AI 11 rontiersin.org


Al-Tawil et al. 10.3389/robt.2024.1347985

FIGURE 6
Timeline illustrates the evolutionary journey o SLAM techniques, accompanied by the datasets that have played a pivotal role in their development. It
showcases the dynamic progression o SLAM technologies over time, refecting the symbiotic relationship between innovative methods and the rich
variety o datasets they have been tested and rened with.

it appropriate or both depth-based and V-SLAM techniques. providing comprehensive resources or algorithm development.
Its useulness extends to essential tasks such as mapping and Its comprehensive data structure makes it highly suitable or
odometry, providing researchers with a considerable volume o data thoroughly testing and validating algorithms tailored or MAV
or testing SLAM algorithms across diverse robotic applications purposes (Burri et al., 2016). For more details, reer to the EuRoC
(Ji et al., 2021; End et al., 2012). Te adaptability o these datasets is MAV dataset.
remarkable, as they nd application in mobile robotics and handheld
platorms, demonstrating eectiveness in both indoor and outdoor
environments (Martínez-Otzeta et al., 2022; Son et al., 2023). 4.3 KITTI dataset
Some o the recent studies used UM datasets, such as in Li et al.
(2023c). Tey have leveraged the UM RGB-D dataset to establish Te KII dataset is a widely utilized resource in robotics
benchmarks customized to their specic research objectives. Te navigation and SLAM, with a particular emphasis on V-SLAM.
study initiated its investigations with RGB-D images and ground Designed or outdoor SLAM applications in urban environments,
truth poses provided by the UM datasets, utilizing them to KII integrates data rom multiple sensors, including depth
construct 3D scenes characterized with real space eatures. Te cameras, lidar, GPS, and inertial measurement unit (IMU),
integrative role assumed by the UM RGB-D dataset in this context contributing to the delivery o precise results or robotic applications
attains proound signicance as a undamental resource within the (Geiger et al., 2013). Its versatility extends to supporting diverse
domain o V-SLAM research. For more details, reer to the UM research objectives such as 3D object detection, semantic
RGB-D SLAM dataset. segmentation, moving object detection, visual odometry, and
road-detection algorithms (Wang et al., 2023; Raikwar et al., 2023).
As a valuable asset, researchers routinely rely on the KII
4.2 EuRoC MAV benchmark dataset dataset to evaluate the eectiveness o V-SLAM techniques in real-
time tracking scenarios. In addition, it serves as an essential tool or
Te EuRoC MAV benchmark dataset is specically designed or researchers and developers engaged in the domains o sel-driving
micro aerial vehicles (MAVs) and contributes a valuable resource cars and mobile robotics (Geiger et al., 2012; Ortega-Gomez et al.,
in the domain o MAV-SLAM research since it includes sensor data 2023). Furthermore, its adaptability acilitates the evaluation o
such as IMU and visual data such as stereo images. Tese datasets, sensor congurations, thereby contributing to the renement and
published in early 2016, are made accessible or research purposes assessment o algorithms crucial to these elds Geiger et al. (2013).
and oer a diverse usability in indoor and outdoor applications. For more details, reer to the KII Vision Benchmark Suite.
Consequently, it serves as a relevant choice or evaluating MAV
navigation and mapping algorithms, particularly in conjunction
with various visual V-SLAM methodologies (Sharautdinov et al., 4.4 Bonn RGB-D dynamic dataset
2023; Leutenegger, 2022; Burri et al., 2016).
Te EuRoC MAV benchmark dataset, o notable benets to Te Bonn dataset is purposeully designed or RGB-D SLAM,
robotics, is particularly valuable or researchers working on visual- containing dynamic sequences o objects. It showcases RGB-D
inertial localization algorithms like OpenVINS (Geneva et al., data accompanied by a 3D point cloud representing the dynamic
4.4 Bonn RGB-D dynamic dataset

The Bonn dataset is purposefully designed for RGB-D SLAM and contains dynamic sequences of objects. It showcases RGB-D data accompanied by a 3D point cloud representing the dynamic environment, and it uses the same format as the TUM RGB-D datasets (Palazzolo et al., 2019). It covers both indoor and outdoor scenarios, extending beyond the boundaries of controlled environments.
proves valuable or developing and evaluating algorithms related collectively enhance the algorithm’s reliability in challenging real-
to tasks such as robot navigation, object recognition, and scene world situations, making them crucial actors or successul mobile
understanding. Signicantly, this dataset is versatile enough to robotic applications.
address the complexities o applications used in light-challenging
areas (Soares et al., 2021; Ji et al., 2021). In addition, it proves
to be an important resource or evaluating V-SLAM techniques 5.2 Computational efciency and real-time
characterized by high dynamism and crowds where the robot might requirements
ace the challenge o object detection and interaction with the
surrounding environment (Dai et al., 2021; Yan et al., 2022). For In the application o mobile robotics, the selection o the SLAM
more details, reer to the Bonn RGB-D dynamic dataset. algorithm is extremely important, ocusing on the eciency o the
4.5 ICL-NUIM dataset

The ICL-NUIM benchmark dataset is designed for RGB-D applications and serves as a valuable tool for evaluating RGB-D visual odometry and V-SLAM algorithms, particularly in indoor settings (Handa et al., 2014). It includes 3D sensor data and ground-truth poses, facilitating the benchmarking of techniques related to mapping, localization, and object detection in robotic systems. Its pre-rendered sequences, scripts for generating test data, and standardized data formats help researchers evaluate and improve their SLAM algorithms (Chen et al., 2020). A unique aspect of the ICL-NUIM dataset is its inclusion of a three-dimensional model. This feature empowers researchers to explore and devise new scenarios for robotic systems that operate in unknown environments. Moreover, it promotes improvements in V-SLAM that make it possible to generate semantic maps, which improve a robot's flexibility and adaptability when integrating into such environments (Zhang et al., 2021a). For more details, refer to the ICL-NUIM dataset.
5 Guidelines for evaluating and selecting visual SLAM methods

Choosing the right visual SLAM algorithm is crucial for building an effective SLAM system. With V-SLAM methodologies continuously advancing in response to diverse challenges, structured selection criteria are essential for deploying and implementing precise solutions (Placed et al., 2023; Sousa et al., 2023). In the context of robotic systems, we outline the most important parameters below, with concise explanations of the selection criteria that guide the choice of suitable SLAM methods for field applications.
5.1 Robustness and accuracy

When choosing among V-SLAM methods, a key consideration is the robustness and accuracy of the method (Zhu et al., 2022). In particular, a robust algorithm can handle sensor noise, obstacles, and changing environments to ensure continuous and reliable operation (Bongard, 2008). Additionally, accuracy is equally important for creating precise maps and localization, allowing the robot to make informed decisions and move through the environment without errors (Kucner et al., 2023; Nakamura et al., 2023). These qualities collectively enhance the algorithm's reliability in challenging real-world situations, making them crucial factors for successful mobile robotic applications.
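Accuracy is commonly quantified against a benchmark ground truth with metrics such as the absolute trajectory error (ATE). As a rough illustration, the sketch below rigidly aligns an estimated trajectory to the ground truth with a closed-form least-squares fit and reports the RMSE of the remaining position differences; it assumes that the two trajectories have already been associated frame by frame.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rigid alignment (rotation + translation) of est onto gt, both (N, 3)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE of position differences) after rigid alignment."""
    R, t = align_rigid(est, gt)
    aligned = (R @ est.T).T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt((err ** 2).mean()))

# Example usage with synthetic data:
# gt = np.cumsum(np.random.randn(500, 3) * 0.01, axis=0)
# est = gt + np.random.randn(500, 3) * 0.005      # noisy estimate of the same path
# print(f"ATE RMSE: {ate_rmse(est, gt):.4f} m")
```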
5.2 Computational efficiency and real-time requirements

In mobile robotics applications, the selection of the SLAM algorithm is extremely important, with a focus on the efficiency of the processes running inside the robot's computational architecture (Macario Barros et al., 2022). Therefore, the chosen V-SLAM algorithm must be carefully tailored to meet the computational demands imposed by the real-time constraints of the robot. This entails a delicate balancing act: the selected algorithm should integrate seamlessly with the available processing power and hardware resources while still satisfying the stringent real-time requirements of the application. The critical consideration at this step is the quality of the sensors, processors, and/or on-board computers, so that they can generate a quick response and accurate localization within a very limited time (Henein et al., 2020).
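A straightforward way to check whether a candidate method meets the real-time budget is to measure per-frame processing time on the target hardware and compare it with the camera's frame period. The sketch below wraps an arbitrary tracking callback with such a measurement; process_frame is a placeholder for whichever V-SLAM front end is being evaluated.

```python
import time
import statistics

def profile_tracking(frames, process_frame, camera_fps=30.0):
    """Measure per-frame latency of a SLAM front end and report the real-time factor."""
    budget = 1.0 / camera_fps            # time available per frame at the camera rate
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)             # placeholder: tracking + local mapping step
        latencies.append(time.perf_counter() - start)
    mean_latency = statistics.mean(latencies)
    print(f"mean {mean_latency * 1e3:.1f} ms, worst {max(latencies) * 1e3:.1f} ms, "
          f"real-time factor {budget / mean_latency:.2f}x")
    return latencies

# Example usage with a dummy workload standing in for a real tracker:
# profile_tracking(range(300), lambda f: sum(i * i for i in range(20000)))
```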
5.3 Flexible hardware integration

In robotic applications, it is important for researchers to choose a SLAM algorithm that works well with the robot's sensors. Integrating suitable hardware improves the speed and performance of SLAM systems through accelerators, method optimization, and energy-efficient designs (Eyvazpour et al., 2023). Various V-SLAM algorithms are designed for specific sensor types such as RGB-D, lidar, and stereo cameras, which facilitates seamless integration into the SLAM system and enhances the functionality of the integrated hardware (Wang et al., 2022). Moreover, the availability of ROS packages and open-source software for sensors and cameras provides additional modalities and flexibility during system installation, which in turn enhances adaptability and makes integration straightforward (Sharafutdinov et al., 2023; Roch et al., 2023). For example, the OAK-D camera, also known as the OpenCV AI Kit, is a smart camera well suited to indoor use. It can process data and run neural inference directly on the camera, without needing extra computing power from the robot; in other words, it can run neural network models without adding load to the robot's operating system (Han et al., 2023).
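In a ROS-based setup, much of this flexibility comes from the fact that supported camera drivers publish standard image topics that a V-SLAM node can subscribe to, so swapping sensors often only requires remapping topic names. The sketch below shows a minimal ROS 1 (rospy) node that listens to an RGB-D camera's color and depth streams; the topic names are common driver defaults used here as assumptions and would need to be adapted to the actual hardware.

```python
import rospy
from sensor_msgs.msg import Image

def color_callback(msg):
    # In a real system the image would be forwarded to the V-SLAM front end here.
    rospy.loginfo_throttle(5.0, f"color frame {msg.width}x{msg.height} at t={msg.header.stamp.to_sec():.3f}")

def depth_callback(msg):
    rospy.loginfo_throttle(5.0, f"depth frame encoding={msg.encoding}")

if __name__ == "__main__":
    rospy.init_node("vslam_input_check")
    # Topic names are assumptions based on common RGB-D driver defaults; remap as needed.
    rospy.Subscriber("/camera/color/image_raw", Image, color_callback)
    rospy.Subscriber("/camera/depth/image_rect_raw", Image, depth_callback)
    rospy.spin()
```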


5.4 System scalability

In SLAM algorithms for robotics, scalability is a vital factor to keep in mind when designing the system's Middleware architecture. It enables rapid situational awareness over large areas, supports flexible dense metric-semantic SLAM in multi-robot systems, and facilitates fast map learning in unknown environments (Castro, 2021). This parameter evaluates the algorithm's capability to adjust to different mapping sizes and environmental conditions, particularly considering light emission and video and/or image clarity. It should also provide versatility for various application needs, applicable to both indoor and outdoor scenarios (Laidlow et al., 2019; Zhang et al., 2023).
5.5 Adapting to dynamic environments

The ability of a SLAM algorithm to handle dynamic objects in the environment is an important consideration for robotic systems. This parameter assesses the algorithm's ability to detect, track, and incorporate dynamic objects and moving obstacles into the mapping process (Lopez et al., 2020). It focuses on the algorithm's capability to enable the robot to handle these objects effectively and respond quickly during the ongoing SLAM process (Wu et al., 2022). Robust handling of dynamic environments should ensure that the algorithm can adapt and respond in real-time applications. This is crucial for systems operating in environments where changes occur instantaneously, such as interactive robotics applications (Li et al., 2018).
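A common way to make tracking robust to moving objects is to discard features that fall inside regions flagged as dynamic, for example bounding boxes returned by an object detector, as done in several of the semantic SLAM systems cited above. The sketch below filters a list of keypoints against such boxes before they are passed to pose estimation; the detector itself is assumed to exist elsewhere and is not part of the sketch.

```python
def filter_dynamic_keypoints(keypoints, dynamic_boxes):
    """Drop keypoints that lie inside any detected dynamic-object bounding box.

    keypoints:      iterable of (x, y) pixel coordinates from the feature extractor
    dynamic_boxes:  iterable of (x_min, y_min, x_max, y_max) boxes for people, cars, etc.
    """
    def in_box(pt, box):
        x, y = pt
        x0, y0, x1, y1 = box
        return x0 <= x <= x1 and y0 <= y <= y1

    return [pt for pt in keypoints if not any(in_box(pt, b) for b in dynamic_boxes)]

# Example usage with made-up detections:
# kps = [(10, 20), (150, 200), (400, 300)]
# boxes = [(100, 150, 250, 350)]                       # e.g., a detected person
# static_kps = filter_dynamic_keypoints(kps, boxes)    # -> [(10, 20), (400, 300)]
```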

5.6 Open-source availability and community support

When choosing a SLAM algorithm for a project, it is important to check whether it is open source and has a community of active users. This matters because it makes it easier to customize and adapt the system to specific needs while benefiting from the experience of the user community (Khoyani and Amini, 2023; Xiao et al., 2019). Additionally, community support ensures that the algorithm receives updates, bug fixes, and improvements. This enhances the reliability and longevity of the algorithm, making it better equipped to handle challenges during system implementation (Persson et al., 2023).
5.7 Map data representation and storage

This parameter focuses on how a SLAM algorithm represents and manages its maps, allowing the researcher to determine its suitability for implementation on the system hardware. The evaluation considers the chosen method's map representation, whether grid-based, feature-based, or point-cloud-based, which helps in assessing how efficiently map information can be stored in the robotic system without encountering challenges (Persson et al., 2023; Acosta-Amaya et al., 2023). The selection of the map representation influences memory usage and computational demands, and it is a critical factor for robotic applications, especially those based on CNN and deep learning approaches (Duan et al., 2019).
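The memory impact of a candidate representation can be estimated before committing to it. The sketch below compares a rough footprint for a 2D occupancy grid and for a sparse landmark (feature) map under stated assumptions about resolution and per-landmark storage; the numbers are illustrative only and not taken from any of the surveyed systems.

```python
def occupancy_grid_bytes(width_m, height_m, resolution_m=0.05, bytes_per_cell=1):
    """Approximate memory of a 2D occupancy grid covering width_m x height_m."""
    cells = (width_m / resolution_m) * (height_m / resolution_m)
    return int(cells * bytes_per_cell)

def landmark_map_bytes(num_landmarks, bytes_per_landmark=3 * 8 + 32):
    """Approximate memory of a sparse feature map.

    Assumes a 3D position (three doubles) plus a 32-byte binary descriptor per landmark,
    which is a simplification of what feature-based maps actually store.
    """
    return int(num_landmarks * bytes_per_landmark)

# Example: a 100 m x 100 m area at 5 cm resolution vs. 200,000 landmarks
# print(occupancy_grid_bytes(100, 100) / 1e6, "MB for the grid")        # ~4 MB
# print(landmark_map_bytes(200_000) / 1e6, "MB for the landmarks")      # ~11 MB
```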
In conclusion, we have summarized the preceding details in Table 2, offering a comprehensive overview of various V-SLAM algorithms. This table serves as a valuable resource for informed algorithm selection, with comparative details for each method. It offers insights into sensor capabilities, examining the types of sensors most effectively used by each algorithm and their role in facilitating algorithmic functionality. Moreover, the table underscores the potential application domains of the methods, empowering researchers to align their research objectives with suitable V-SLAM methodologies. The table also classifies the algorithms by their mapping scale, distinguishing between small-scale (up to 100 m), medium-scale (up to 500 m), and large-scale (1 km and beyond) mapping capabilities (Tian et al., 2023b; Hong et al., 2021). It also assesses their performance under varying illumination conditions, classifying algorithms by their robustness in categories ranging from the lowest (+) to the highest (+++++). Additionally, the table categorizes the algorithms by their range of light intensity (RoLI), which reflects the robot's ability to operate effectively in diverse lighting conditions, spanning from very dim to extremely bright. Moreover, the tolerance-to-directionality (T2D) category assesses the algorithm's ability to function in environments with strong directional light sources, such as spotlights and windows. Collectively, these criteria furnish a valuable resource for researchers seeking to pick the most fitting SLAM approach for their specific research endeavors.


o this article. Tis work is unded and supported by the that could be construed as a potential conict o
Federal Ministry o Education and Research o Germany interest.
(BMBF) (AutoKoWA-3DMAt under grant No. 13N16336)
and German Research Foundation (DFG) under grants
Al 638/15-1. Publisher’s note
All claims expressed in this article are solely those o the
authors and do not necessarily represent those o their aliated
Conict o interest organizations, or those o the publisher, the editors, and the
reviewers. Any product that may be evaluated in this article, or claim
Te authors declare that the research was conducted in that may be made by its manuacturer, is not guaranteed or endorsed
the absence o any commercial or nancial relationships by the publisher.

Reerences
Abbad, A. M., Haouala, I., Raisov, A., and Benkredda, R. (2023). Low cost mobile Campos, C., Elvira, R., Rodríguez, J. J. G., Montiel, J. M., and ardós, J. D. (2021).
navigation using 2d-slam in complex environments Orb-slam3: an accurate open-source library or visual, visual–inertial, and multimap
slam. IEEE rans. Robotics 37, 1874–1890. doi:10.1109/tro.2021.3075644
Acosta-Amaya, G. A., Cadavid-Jimenez, J. M., and Jimenez-Builes, J. A. (2023).
Tree-dimensional location and mapping analysis in mobile robotics based on visual Castro, G. I. (2021). Scalability and consistency improvements in SLAM systems with
slam methods. J. Robotics 2023, 1–15. doi:10.1155/2023/6630038 applications in active multi-robot exploration. Ph.D. thesis (FACULY OF EXAC AND
NAURAL SCIENCES DEPARMEN OF COMPUAIONÓN Improvements).
Ai, Y.-b., Rui, ., Yang, X.-q., He, J.-l., Fu, L., Li, J.-b., et al. (2021). Visual slam
in dynamic environments based on object detection. De. echnol. 17, 1712–1721. Chang, Z., Wu, H., and Li, C. (2023). Yolov4-tiny-based robust rgb-d slam approach
doi:10.1016/j.dt.2020.09.012 with point and surace eature usion in complex indoor environments. J. Field Robotics
40, 521–534. doi:10.1002/rob.22145
Alismail, H., Browning, B., and Lucey, S. (2017). “Photometric bundle adjustment
or vision-based slam,” in Computer Vision–ACCV 2016: 13th Asian Conerence on Chen, H., Yang, Z., Zhao, X., Weng, G., Wan, H., Luo, J., et al. (2020).
Computer Vision, aipei, aiwan, November 20-24, 2016 (Springer), 324–341. Revised Advanced mapping robot and high-resolution dataset. Robotics Aut. Syst. 131, 103559.
Selected Papers, Part IV. doi:10.1016/j.robot.2020.103559
Aloui, K., Guizani, A., Hammadi, M., Haddar, M., and Soriano, . (2022). “Systematic Chou, C., Wang, D., Song, D., and Davis, . A. (2019). “On the tunable
literature review o collaborative slam applied to autonomous mobile robots,” in 2022 sparse graph solver or pose graph optimization in visual slam problems,” in 2019
IEEE Inormation echnologies and Smart Industrial Systems (ISIS), 1–5. IEEE/RSJ International Conerence on Intelligent Robots and Systems (IROS) (IEEE),
1300–1306.
Altawil, B., and Can, F. C. (2023). Design and analysis o a our do robotic arm with
two grippers used in agricultural operations. Int. J. Appl. Math. Electron. Comput. 11, Chu, P. M., Sung, Y., and Cho, K. (2018). Generative adversarial network-based
79–87. doi:10.18100/ijamec.1217072 method or transorming single rgb image into 3d point cloud. IEEE Access 7,
1021–1029. doi:10.1109/access.2018.2886213
Ara, E. (2022). Study and implementation o LiDAR-based SLAM algorithm and
map-based autonomous navigation or a telepresence robot to be used as a chaperon Chung, C.-M., seng, Y.-C., Hsu, Y.-C., Shi, X.-Q., Hua, Y.-H., Yeh, J.-F., et al. (2023).
or smart laboratory requirements. Master’s thesis. “Orbeez-slam: a real-time monocular visual slam with orb eatures and ner-realized
mapping,” in 2023 IEEE International Conerence on Robotics and Automation (ICRA)
Aslan, M. F., Durdu, A., Yuse, A., Sabanci, K., and Sungur, C. (2021). A tutorial:
(IEEE), 9400–9406.
mobile robotics, slam, bayesian lter, keyrame bundle adjustment and ros applications.
Robot Operating Syst. (ROS) Complete Reerence 6, 227–269. Civera, J., Gálvez-López, D., Riazuelo, L., ardós, J. D., and Montiel, J. M.
M. (2011). “owards semantic slam using a monocular camera,” in 2011
Awais, M., and Henrich, D. (2010). Human-robot collaboration by intention
IEEE/RSJ international conerence on intelligent robots and systems (IEEE),
recognition using probabilistic state machines , 75–80.
1277–1284.
Bahraini, M. S., Bozorg, M., and Rad, A. B. (2018). Slam in dynamic environments
Cui, Y., Chen, X., Zhang, Y., Dong, J., Wu, Q., and Zhu, F. (2022). Bow3d: bag o words
via ml-ransac. Mechatronics 49, 105–118. doi:10.1016/j.mechatronics.2017.12.002
or real-time loop closing in 3d lidar slam. IEEE Robotics Automation Lett. 8, 2828–2835.
Beghdadi, A., and Mallem, M. (2022). A comprehensive overview o dynamic visual doi:10.1109/lra.2022.3221336
slam and deep learning: concepts, methods and challenges. Mach. Vis. Appl. 33, 54.
CVG, . U. o. M. (2023). LSD-SLAM: large-scale direct monocular SLAM. Available
doi:10.1007/s00138-022-01306-w
at: https://cvg.cit.tum.de/research/vslam/lsdslam?redirect.
Blo, M., Omari, S., Hutter, M., and Siegwart, R. (2015). “Robust visual inertial
Dai, W., Zhang, Y., Zheng, Y., Sun, D., and Li, P. (2021). Rgb-d slam with moving
odometry using a direct ek-based approach,” in 2015 IEEE/RSJ international
object tracking in dynamic environments. IE Cyber-Systems Robotics 3, 281–291.
conerence on intelligent robots and systems (IROS) (IEEE), 298–304.
doi:10.1049/csy2.12019
Bongard, J. (2008). Probabilistic robotics. sebastian thrun, wolram burgard, and dieter
[Dataset] uz.slaml (2023). ORB-SLAM3. Available at: https://github.com/UZ-
ox. Cambridge, MA, United States: MI press, 647. 2005.
SLAMLab/ORB_SLAM3.
Bruno, H. M. S., and Colombini, E. L. (2021). Lif-slam: a deep-learning
Davison, A. J., Reid, I. D., Molton, N. D., and Stasse, O. (2007). Monoslam: real-
eature-based monocular visual slam method. Neurocomputing 455, 97–110.
time single camera slam. IEEE rans. pattern analysis Mach. Intell. 29, 1052–1067.
doi:10.1016/j.neucom.2021.05.027
doi:10.1109/tpami.2007.1049
Burri, M., Nikolic, J., Gohl, P., Schneider, ., Rehder, J., Omari, S., et al.
De Croce, M., Pire, ., and Bergero, F. (2019). Ds-ptam: distributed stereo
(2016). Te euroc micro aerial vehicle datasets. Int. J. Robotics Res. 35, 1157–1163.
parallel tracking and mapping slam system. J. Intelligent Robotic Syst. 95, 365–377.
doi:10.1177/0278364915620033
doi:10.1007/s10846-018-0913-6
Bustos, A. P., Chin, .-J., Eriksson, A., and Reid, I. (2019). “Visual slam: why bundle
Duan, C., Junginger, S., Huang, J., Jin, K., and Turow, K. (2019). Deep learning
adjust?,” in 2019 international conerence on robotics and automation (ICRA) (IEEE),
or visual slam in transportation robotics: a review. ransp. Sa. Environ. 1, 177–184.
2385–2391.
doi:10.1093/tse/tdz019
Buyval, A., Aanasyev, I., and Magid, E. (2017). “Comparative analysis o ros-based
Durrant-Whyte, H. F. (2012). Integration, coordination and control o multi-sensor
monocular slam methods or indoor navigation,” in Ninth International Conerence on
robot systems, 36. Springer Science and Business Media.
Machine Vision (ICMV 2016) (SPIE), 305–310.
El Bouazzaoui, I., Rodriguez, S., Vincke, B., and El Ouardi, A. (2021). Indoor
Ca, C., Elvira, R., Rodríguez, J. J. G., Montiel, J. M., and ardós, J. D. (2021). Orb-
visual slam dataset with various acquisition modalities. Data Brie 39, 107496.
slam3: an accurate open-source library or visual, visual–inertial, and multimap slam.
doi:10.1016/j.dib.2021.107496
IEEE rans. Robotics 37, 1874–1890. doi:10.1109/tro.2021.3075644


End, F., Hess, J., Engelhard, N., Sturm, J., Cremers, D., and Burgard, W. (2012). “An Heyer, C. (2010). “Human-robot interaction and uture industrial robotics
evaluation o the rgb-d slam system,” in 2012 IEEE international conerence on robotics applications,” in 2010 ieee/rsj international conerence on intelligent robots and systems
and automation (IEEE), 1691–1696. (IEEE), 4749–4754.
Eng, J., Schöps, ., and Cremers, D. (2014). “Lsd-slam: large-scale direct monocular hkust.a (2023). VINS-Mono. Available at: https://github.com/HKUS-Aerial-
slam,” in European conerence on computer vision (Springer), 834–849. Robotics/VINS-Mono.
Engel, J., Stückler, J., and Cremers, D. (2015). “Large-scale direct slam with stereo Hong, S., Bangunharcana, A., Park, J.-M., Choi, M., and Shin, H.-S. (2021). Visual
cameras,” in 2015 IEEE/RSJ international conerence on intelligent robots and systems slam-based robotic mapping method or planetary construction. Sensors 21, 7715.
(IROS) (IEEE), 1935–1942. doi:10.3390/s21227715
eth.a (2023a). OKVIS: open keyrame-based visual-inertial SLAM. Available at: Hsiao, M., Westman, E., Zhang, G., and Kaess, M. (2017). “Keyrame-based dense
https://github.com/ethz-asl/okvis. planar slam,” in 2017 IEEE International Conerence on Robotics and Automation
(ICRA) (Ieee), 5110–5117.
eth.a (2023b). Rovio: robust visual inertial odometry. Available at: https://github.
com/ethz-asl/rovio. Huang, L. (2021). “Review on lidar-based slam techniques,” in 2021 International
Conerence on Signal Processing and Machine Learning (CONF-SPML) (IEEE),
Eudes, A., Lhuillier, M., Naudet-Collette, S., and Dhome, M. (2010).
163–168.
“Fast odometry integration in local bundle adjustment-based visual
slam,” in 2010 20th International Conerence on Pattern Recognition Introlab (2023). RAB-Map. Available at: http://introlab.github.io/rtabmap/.
(IEEE), 290–293.
Ji, ., Wang, C., and Xie, L. (2021). “owards real-time semantic rgb-d slam in
Eyvazpour, R., Shoaran, M., and Karimian, G. (2023). Hardware dynamic environments,” in 2021 IEEE International Conerence on Robotics and
implementation o slam algorithms: a survey on implementation Automation (ICRA) (IEEE), 11175–11181.
approaches and platorms. Arti. Intell. Rev. 56, 6187–6239.
Joo, S.-H., Manzoor, S., Rocha, Y. G., Bae, S.-H., Lee, K.-H., Kuc, .-Y., et al.
doi:10.1007/s10462-022-10310-5
(2020). Autonomous navigation ramework or intelligent robots based on a semantic
Fan, ., Wang, H., Rubenstein, M., and Murphey, . (2020). Cpl-slam: ecient and environment modeling. Appl. Sci. 10, 3219. doi:10.3390/app10093219
certiably correct planar graph-based slam using the complex number representation.
Kasyanov, A., Engelmann, F., Stückler, J., and Leibe, B. (2017). Keyrame-based
IEEE rans. Robotics 36, 1719–1737. doi:10.1109/tro.2020.3006717
visual-inertial online slam with relocalization , 6662–6669.
elix (2023). RGB-D SLAM v2. Available at: https://github.
Kazerouni, I. A., Fitzgerald, L., Dooly, G., and oal, D. (2022). A survey o state-o-
com/elixendres/rgbdslam_v2.
the-art on visual slam. Expert Syst. Appl. 205, 117734. doi:10.1016/j.eswa.2022.117734
Fernández-Moral, E., Jiménez, J. G., and Arévalo, V. (2013). Creating metric-
Kerl, C., Sturm, J., and Cremers, D. (2013). “Dense visual slam or rgb-d cameras,”
topological maps or large-scale monocular slam. ICINCO (2), 39–47.
in 2013 IEEE/RSJ International Conerence on Intelligent Robots and Systems (IEEE),
Fiedler, M.-A., Werner, P., Khalia, A., and Al-Hamadi, A. (2021). Spd: simultaneous 2100–2106.
ace and person detection in real-time or human–robot interaction. Sensors 21, 5918.
Khoyani, A., and Amini, M. (2023). A survey on visual slam algorithms compatible
doi:10.3390/s21175918
or 3d space reconstruction and navigation , 01–06.
Fong, ., Nourbakhsh, I., and Dautenhahn, K. (2003). A survey o socially interactive
Klein, G., and Murray, D. (2007). “Parallel tracking and mapping or small ar
robots. Robotics Aut. Syst. 42, 143–166. doi:10.1016/s0921-8890(02)00372-x
workspaces,” in 2007 6th IEEE and ACM international symposium on mixed and
Gao, B., Lang, H., and Ren, J. (2020). “Stereo visual slam or autonomous vehicles: augmented reality (IEEE), 225–234.
a review,” in 2020 IEEE International Conerence on Systems, Man, and Cybernetics
Kuang, Z., Wei, W., Yan, Y., Li, J., Lu, G., Peng, Y., et al. (2022). A real-time and
(SMC) (IEEE), 1316–1322.
robust monocular visual inertial slam system based on point and line eatures or
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013). Vision meets robotics: the kitti mobile robots o smart cities toward 6g. IEEE Open J. Commun. Soc. 3, 1950–1962.
dataset. Int. J. Robotics Res. 32, 1231–1237. doi:10.1177/0278364913491297 doi:10.1109/ojcoms.2022.3217147
Geiger, A., Lenz, P., and Urtasun, R. (2012). “Are we ready or autonomous driving? Kucner, . P., Magnusson, M., Mghames, S., Palmieri, L., Verdoja, F., Swaminathan,
the kitti vision benchmark suite,” in 2012 IEEE conerence on computer vision and C. S., et al. (2023). Survey o maps o dynamics or mobile robots. Int. J. Robotics Res.,
pattern recognition (IEEE), 3354–3361. 02783649231190428.
Geneva, P., Eckenho, K., Lee, W., Yang, Y., and Huang, G. (2020). “OpenVINS: Labbé, F., and Michaud, M. (2019). Rtab-map as an open-source lidar and visual
a research platorm or visual-inertial estimation,” in Proc. o the IEEE International simultaneous localization and mapping library or large-scale and long-term online
Conerence on Robotics and Automation, Paris, France. operation. J. feld robotics 36, 416–446. doi:10.1002/rob.21831
GPL (2023). Available at: https://github.com/Oxord-PAM/PAM-GPL. Laidlow, ., Czarnowski, J., and Leutenegger, S. (2019). Deepusion: real-time dense
3d reconstruction or monocular slam using single-view depth and gradient predictions
Grisetti, G., Stachniss, C., and Burgard, W. (2007). Improved techniques or grid
, 4068–4074.
mapping with rao-blackwellized particle lters. IEEE rans. Robotics 23, 34–46.
doi:10.1109/tro.2006.889486 Lee, G., Moon, B.-C., Lee, S., and Han, D. (2020). Fusion o the slam with wi--based
positioning methods or mobile robot-based learning data collection, localization, and
Gu, P., Meng, Z., and Zhou, P. (2022). Real-time visual inertial odometry with a
tracking in indoor spaces. Sensors 20, 5182. doi:10.3390/s20185182
resource-ecient harris corner detection accelerator on pga platorm , 10542–10548.
Leut, S., Lynen, S., Bosse, M., Siegwart, R., and Furgale, P. (2015). Keyrame-based
Gurel, C. S. (2018). Real-time 2d and 3d slam using rtab-map, gmapping, and
visual–inertial odometry using nonlinear optimization. Int. J. Robotics Res. 34, 314–334.
cartographer packages. University o Maryland.
doi:10.1177/0278364914554813
Han, Y., Mokhtarzadeh, A. A., and Xiao, S. (2023). Novel cartographer using an oak-d
Leutenegger, S. (2022). Okvis2: realtime scalable visual-inertial slam with loop closure.
smart camera or indoor robots location and navigation. J. Phys. Con. Ser. 2467, 012029.
arXiv preprint arXiv:2202.09199.
doi:10.1088/1742-6596/2467/1/012029
Li, D., Shi, X., Long, Q., Liu, S., Yang, W., Wang, F., et al. (2020). “Dxslam: a robust
Handa, A., Whelan, ., McDonald, J., and Davison, A. J. (2014). “A benchmark
and ecient visual slam system with deep eatures,” in 2020 IEEE/RSJ International
or rgb-d visual odometry, 3d reconstruction and slam,” in 2014 IEEE international
conerence on intelligent robots and systems (IROS) (IEEE), 4958–4965.
conerence on Robotics and automation (ICRA) (IEEE), 1524–1531.
Li, G., Hou, J., Chen, Z., Yu, L., and Fei, S. (2023a). Robust stereo inertial odometry
Hastürk, Ö., and Erkmen, A. M. (2021). Dudmap: 3d rgb-d mapping or dense,
based on sel-supervised eature points. Appl. Intell. 53, 7093–7107. doi:10.1007/s10489-
unstructured, and dynamic environment. Int. J. Adv. Robotic Syst. 18, 172988142110161.
022-03278-w
doi:10.1177/17298814211016178
Li, P., Qin, ., and Shen, S. (2018). “Stereo vision-based semantic 3d object and ego-
Hempel, ., and Al-Hamadi, A. (2020). Pixel-wise motion segmentation
motion tracking or autonomous driving,” in Proceedings o the European Conerence
or slam in dynamic environments. IEEE Access 8, 164521–164528.
on Computer Vision (ECCV), 646–661.
doi:10.1109/access.2020.3022506
Li, Q., Wang, X., Wu, ., and Yang, H. (2022a). Point-line eature usion based eld
Hempel, ., Dinges, L., and Al-Hamadi, A. (2023). “Sentiment-based engagement
real-time rgb-d slam. Comput. Graph. 107, 10–19. doi:10.1016/j.cag.2022.06.013
strategies or intuitive human-robot interaction,” in Proceedings o the 18th
International Joint Conerence on Computer Vision, 680–686. Imaging and Computer Li, S., Zhang, D., Xian, Y., Li, B., Zhang, ., and Zhong, C. (2022b). Overview o deep
Graphics Teory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP. INSICC learning application on visual slam, 102298. Displays.
(SciePress). doi:10.5220/0011772900003417
Li, S., Zheng, P., Liu, S., Wang, Z., Wang, X. V., Zheng, L., et al. (2023b).
Henein, M., Zhang, J., Mahony, R., and Ila, V. (2020). Dynamic slam: the need or Proactive human–robot collaboration: mutual-cognitive, predictable, and
speed, 2123–2129. sel-organising perspectives. Robotics Computer-Integrated Manu. 81, 102510.
doi:10.1016/j.rcim.2022.102510
Hess, W., Kohler, D., Rapp, H., and Andor, D. (2016). “Real-time loop closure in 2d
lidar slam,” in 2016 IEEE international conerence on robotics and automation (ICRA) Li, Y., Guo, Z., Yang, Z., Sun, Y., Zhao, L., and ombari, F. (2023c). Open-structure: a
(IEEE), 1271–1278. structural benchmark dataset or slam algorithms. arXiv preprint arXiv:2310.10931.


Lin, H.-Y., and Yeh, M.-C. (2022). Drif-ree visual slam or mobile robot Palazzolo, E., Behley, J., Lottes, P., Giguere, P., and Stachniss, C. (2019). “Reusion:
localization by integrating uwb technology. IEEE Access 10, 93636–93645. 3d reconstruction in dynamic environments or rgb-d cameras exploiting residuals,”
doi:10.1109/access.2022.3203438 in 2019 IEEE/RSJ International Conerence on Intelligent Robots and Systems (IROS)
(IEEE), 7855–7862.
Liu, Y., and Miura, J. (2021). Rds-slam: real-time dynamic slam using semantic
segmentation methods. Ieee Access 9, 23772–23785. doi:10.1109/access.2021.3050617 Persson, N., Ekström, M. C., Ekström, M., and Papadopoulos, A. V. (2023). “On the
initialization problem or timed-elastic bands,” in Proceedings o the 22nd IFAC World
Lopez, J., Sanchez-Vilarino, P., Cacho, M. D., and Guillén, E. L. (2020). Obstacle
Congress (IFAC WC).
avoidance in dynamic environments based on velocity space optimization. Robotics Aut.
Syst. 131, 103569. doi:10.1016/j.robot.2020.103569 Peter, J., Tomas, M. J., and Mohan, S. (2023). Development o an autonomous
ground robot using a real-time appearance based (rtab) algorithm or enhanced spatial
Luo, H., Pape, C., and Reithmeier, E. (2021). Robust rgbd visual odometry
mapping
using windowed direct bundle adjustment and slanted support plane. IEEE Robotics
Automation Lett. 7, 350–357. doi:10.1109/lra.2021.3126347 Picard, Q., Chevobbe, S., Darouich, M., and Didier, J.-Y. (2023). A survey on real-
time 3d scene reconstruction with slam methods in embedded systems. arXiv preprint
Lynch, C., Wahid, A., ompson, J., Ding, ., Betker, J., Baruch, R., et al. (2023).
arXiv:2309.05349.
Interactive language: talking to robots in real time. IEEE Robotics Automation Lett., 1–8.
doi:10.1109/lra.2023.3295255 Placed, J. A., Strader, J., Carrillo, H., Atanasov, N., Indelman, V., Carlone, L., et al.
(2023). A survey on active simultaneous localization and mapping: state o the art and
Macario Barros, A., Michel, M., Moline, Y., Corre, G., and Carrel, F.
new rontiers. IEEE rans. Robotics 39, 1686–1705. doi:10.1109/tro.2023.3248510
(2022). A comprehensive survey o visual slam algorithms. Robotics 11, 24.
doi:10.3390/robotics11010024 Prati, E., Villani, V., Grandi, F., Peruzzini, M., and Sabattini, L. (2021).
Use o interaction design methodologies or human–robot collaboration
Mane, A. A., Parihar, M. N., Jadhav, S. P., and Gadre, R. (2016). “Data acquisition
in industrial scenarios. IEEE rans. Automation Sci. Eng. 19, 3126–3138.
analysis in slam applications,” in 2016 International Conerence on Automatic Control
doi:10.1109/tase.2021.3107583
and Dynamic Optimization echniques (ICACDO) (IEEE), 339–343.
Qin, ., Li, P., and Shen, S. (2018). Vins-mono: a robust and versatile
Martínez-Otzeta, J. M., Rodríguez-Moreno, I., Mendialdua, I., and Sierra, B. (2022).
monocular visual-inertial state estimator. IEEE rans. Robotics 34, 1004–1020.
Ransac or robotic applications: a survey. Sensors 23, 327. doi:10.3390/s23010327
doi:10.1109/tro.2018.2853729
Mazumdar, H., Chakraborty, C., Sathvik, M., Jayakumar, P., and Kaushik, A.
Ragot, N., Khemmar, R., Pokala, A., Rossi, R., and Ertaud, J.-Y. (2019). “Benchmark
(2023). Optimizing pix2pix gan with attention mechanisms or ai-driven polyp
o visual slam algorithms: orb-slam2 vs rtab-map,” in 2019 Eighth International
segmentation in iomt-enabled smart healthcare. IEEE J. Biomed. Health In., 1–8.
Conerence on Emerging Security echnologies (ES) (IEEE), 1–6.
doi:10.1109/jbhi.2023.3328962
Raikwar, S., Yu, H., and Herlitzius, . (2023). 2d lidar slam localization system or
Meng, X., Gao, W., and Hu, Z. (2018). Dense rgb-d slam with multiple cameras.
a mobile robotic platorm in gps denied environment. J. Biosyst. Eng. 48, 123–135.
Sensors 18, 2118. doi:10.3390/s18072118
doi:10.1007/s42853-023-00176-y
Meng, X., Li, B., Li, B., Li, B., and Li, B. (2022). “Prob-slam: real-time
raulmur (2023a). ORB-SLAM. Available at: https://github.com/raulmur/ORB_
visual slam based on probabilistic graph optimization,” in Proceedings o the
SLAM.
8th International Conerence on Robotics and Articial Intelligence, 39–45.
doi:10.1145/3573910.3573920 raulmur (2023b). ORB-SLAM2. Available at: https://github.com/raulmur/ORB_
SLAM2.
MI.S (2023). Kimera: an open-source library or real-time metric-semantic
localization and mapping. Available at: https://github.com/MI-SPARK/Kimera. Ren, G., Cao, Z., Liu, X., an, M., and Yu, J. (2022). Plj-slam: monocular visual slam
with points, lines, and junctions o coplanar lines. IEEE Sensors J. 22, 15465–15476.
Mohamed, N., Al-Jaroodi, J., and Jawhar, I. (2008). “Middleware or robotics: a
doi:10.1109/jsen.2022.3185122
survey,” in 2008 IEEE Conerence on Robotics, Automation and Mechatronics (Ieee),
736–742. Rintar (2023). dtam-1. Available at: https://github.com/Rintarooo/dtam-1.
Mur-A, J. D., and ars, R. (2014). “Orb-slam: tracking and mapping recognizable,” in Roch, J., Fayyad, J., and Najjaran, H. (2023). Dopeslam: high-precision ros-based
Proceedings o the Workshop on Multi View Geometry in Robotics (MVIGRO)-RSS. semantic 3d slam in a dynamic environment. Sensors 23, 4364. doi:10.3390/s23094364
Mur-Artal, R., Montiel, J. M. M., and ardos, J. D. (2015). Orb-slam: a Ros, A., Abate, M., Chang, Y., and Carlone, L. (2020). “Kimera: an open-source library
versatile and accurate monocular slam system. IEEE rans. robotics 31, 1147–1163. or real-time metric-semantic localization and mapping,” in 2020 IEEE International
doi:10.1109/tro.2015.2463671 Conerence on Robotics and Automation (ICRA) (IEEE), 1689–1696.
Mur-Artal, R., and ardós, J. D. (2017a). Orb-slam2: an open-source slam system Rosinol, A., Violette, A., Abate, M., Hughes, N., Chang, Y., Shi, J., et al. (2021). Kimera:
or monocular, stereo, and rgb-d cameras. IEEE rans. robotics 33, 1255–1262. rom slam to spatial perception with 3d dynamic scene graphs. Int. J. Robotics Res. 40,
doi:10.1109/tro.2017.2705103 1510–1546. doi:10.1177/02783649211056674
Mur-Artal, R., and ardós, J. D. (2017b). Visual-inertial monocular slam with map Scaradozzi, D., Zingaretti, S., and Ferrari, A. (2018). Simultaneous localization and
reuse. IEEE Robotics Automation Lett. 2, 796–803. doi:10.1109/lra.2017.2653359 mapping (slam) robotics techniques: a possible application in surgery. Shanghai Chest
2, 5. doi:10.21037/shc.2018.01.01
Nakamura, ., Kobayashi, M., and Motoi, N. (2023). Path planning or mobile
robot considering turnabouts on narrow road by deep q-network. IEEE Access 11, Schneider, ., Dymczyk, M., Fehr, M., Egger, K., Lynen, S., Gilitschenski,
19111–19121. doi:10.1109/access.2023.3247730 I., et al. (2018). maplab: an open ramework or research in visual-inertial
mapping and localization. IEEE Robotics Automation Lett. 3, 1418–1425.
Navvis (2023). Map orming. Available at: https://www.navvis.com/technology/slam
doi:10.1109/lra.2018.2800113
(Accessed on November 14, 2023).
Schöps, ., Engel, J., and Cremers, D. (2014). “Semi-dense visual odometry or ar on
Ne, R. A., Lovegrove, S. J., and Davison, A. J. (2011). “Dtam: dense tracking and
a smartphone,” in 2014 IEEE international symposium on mixed and augmented reality
mapping in real-time,” in 2011 international conerence on computer vision (IEEE),
(ISMAR) (IEEE), 145–150.
2320–2327.
Servières, M., Renaudin, V., Dupuis, A., and Antigny, N. (2021). Visual and visual-
Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., et al.
inertial slam: state o the art, classication, and experimental benchmarking. J. Sensors
(2011). “Kinectusion: real-time dense surace mapping and tracking,” in 2011 10th
2021, 1–26. doi:10.1155/2021/2054828
IEEE international symposium on mixed and augmented reality (Ieee), 127–136.
Sharautdinov, D., Griguletskii, M., Kopanev, P., Kurenkov, M., Ferrer, G., Burkov, A.,
Nguyen, Q. H., Johnson, P., and Latham, D. (2022). Perormance evaluation o ros-
et al. (2023). Comparison o modern open-source visual slam approaches. J. Intelligent
based slam algorithms or handheld indoor mapping and tracking systems. IEEE Sensors
Robotic Syst. 107, 43. doi:10.1007/s10846-023-01812-7
J. 23, 706–714. doi:10.1109/jsen.2022.3224224
Sheng, L., Xu, D., Ouyang, W., and Wang, X. (2019). “Unsupervised collaborative
Nguyen, ., Mann, G. K., Vardy, A., and Gosine, R. G. (2020). Ck-based
learning o keyrame detection and visual odometry towards monocular deep slam,”
visual inertial odometry or long-term trajectory operations. J. Robotics 2020, 1–14.
in Proceedings o the IEEE/CVF International Conerence on Computer Vision,
doi:10.1155/2020/7362952
4302–4311.
Niu, X., Liu, H., and Yuan, J. (2019). “Rgb-d indoor simultaneous location and
Sheridan, . B. (2016). Human–robot interaction: status and challenges. Hum. actors
mapping based on inliers tracking statistics,” in Journal o Physics: Conerence Series
58, 525–532. doi:10.1177/0018720816644364
(IOP Publishing), 1176, 062023.
Soares, J. C. V., Gattass, M., and Meggiolaro, M. A. (2021). Crowd-slam: visual slam
Ortega-Gomez, J. I., Morales-Hernandez, L. A., and Cruz-Albarran, I. A. (2023).
towards crowded environments using object detection. J. Intelligent Robotic Syst. 102,
A specialized database or autonomous vehicles based on the kitti vision benchmark.
50. doi:10.1007/s10846-021-01414-1
Electronics 12, 3165. doi:10.3390/electronics12143165
Soliman, A., Bonardi, F., Sidibé, D., and Bouchaa, S. (2023). Dh-ptam: a deep
Pal, S., Gupta, S., Das, N., and Ghosh, K. (2022). Evolution o simultaneous
hybrid stereo events-rames parallel tracking and mapping system. arXiv preprint
localization and mapping ramework or autonomous robotics—a comprehensive
arXiv:2306.01891
review. J. Aut. Veh. Syst. 2, 020801. doi:10.1115/1.4055161


Son, S., Chen, J., Zhong, Y., Zhang, W., Hou, W., and Zhang, L. (2023). Sce-slam: a Wang, Z., Pang, B., Song, Y., Yuan, X., Xu, Q., and Li, Y. (2023). Robust visual-inertial
real-time semantic rgbd slam system in dynamic scenes based on spatial coordinate odometry based on a kalman lter and actor graph. IEEE rans. Intelligent ransp. Syst.
error. Meas. Sci. echnol. 34, 125006. doi:10.1088/1361-6501/aceb7e 24, 7048–7060. doi:10.1109/tits.2023.3258526
Song, K., Li, J., Qiu, R., and Yang, G. (2022). Monocular visual-inertial Wu, W., Guo, L., Gao, H., You, Z., Liu, Y., and Chen, Z. (2022). Yolo-slam: a semantic
odometry or agricultural environments. IEEE Access 10, 103975–103986. slam system towards dynamic environment with geometric constraint. Neural Comput.
doi:10.1109/access.2022.3209186 Appl. 34, 6011–6026. doi:10.1007/s00521-021-06764-3
Song, Y., Zhang, Z., Wu, J., Wang, Y., Zhao, L., and Huang, S. (2021). A right Xiao, L., Wang, J., Qiu, X., Rong, Z., and Zou, X. (2019). Dynamic-slam: semantic
invariant extended kalman lter or object based slam. IEEE Robotics Automation Lett. monocular visual localization and mapping based on deep learning in dynamic
7, 1316–1323. doi:10.1109/lra.2021.3139370 environment. Robotics Aut. Syst. 117, 1–16. doi:10.1016/j.robot.2019.03.012
Sousa, R. B., Sobreira, H. M., and Moreira, A. P. (2023). A systematic literature Xu, C., Liu, Z., and Li, Z. (2021). Robust visual-inertial navigation system or low
review on long-term localization and mapping or mobile robots. J. Field Robotics 40, precision sensors under indoor and outdoor environments. Remote Sens. 13, 772.
1245–1322. doi:10.1002/rob.22170 doi:10.3390/rs13040772
Steinbrücker, F., Sturm, J., and Cremers, D. (2011). “Real-time visual odometry Yan, L., Hu, X., Zhao, L., Chen, Y., Wei, P., and Xie, H. (2022). Dgs-slam: a ast
rom dense rgb-d images,” in 2011 IEEE international conerence on computer vision and robust rgbd slam in dynamic environments combined by geometric and semantic
workshops (ICCV Workshops) (IEEE), 719–722. inormation. Remote Sens. 14, 795. doi:10.3390/rs14030795
Strazdas, D., Hintz, J., Felßberg, A.-M., and Al-Hamadi, A. (2020). Robots and Yang, X., Li, H., Zhai, H., Ming, Y., Liu, Y., and Zhang, G. (2022). “Vox-usion:
wizards: an investigation into natural human–robot interaction. IEEE Access 8, dense tracking and mapping with voxel-based neural implicit representation,” in 2022
207635–207642. doi:10.1109/access.2020.3037724 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (IEEE),
499–507.
Sumikura, S., Shibuya, M., and Sakurada, K. (2019). “Openvslam: a versatile visual
slam ramework,” in Proceedings o the 27th ACM International Conerence on Yousi, K., Bab-Hadiashar, A., and Hoseinnezhad, R. (2015). An overview to visual
Multimedia, 2292–2295. odometry and visual slam: applications to mobile robotics. Intell. Ind. Syst. 1, 289–311.
doi:10.1007/s40903-015-0032-7
Sun, Y., Liu, M., and Meng, M. Q.-H. (2017). Improving rgb-d slam in
dynamic environments: a motion removal approach. Robotics Aut. Syst. 89, 110–122. Zang, Q., Zhang, K., Wang, L., and Wu, L. (2023). An adaptive orb-slam3 system or
doi:10.1016/j.robot.2016.11.012 outdoor dynamic environments. Sensors 23, 1359. doi:10.3390/s23031359
aheri, H., and Xia, Z. C. (2021). Slam; denition and evolution. Eng. Appl. Arti. Zhang, J., Zhu, C., Zheng, L., and Xu, K. (2021a). Roseusion: random optimization
Intell. 97, 104032. doi:10.1016/j.engappai.2020.104032 or online dense reconstruction under ast camera motion. ACM rans. Graph. (OG)
40, 1–17. doi:10.1145/3476576.3476604
aketomi, ., Uchiyama, H., and Ikeda, S. (2017). Visual slam algorithms: a survey
rom 2010 to 2016. IPSJ rans. Comput. Vis. Appl. 9, 16–11. doi:10.1186/s41074-017- Zhang, Q., and Li, C. (2023). Semantic slam or mobile robots in dynamic
0027-2 environments based on visual camera sensors. Meas. Sci. echnol. 34, 085202.
doi:10.1088/1361-6501/acd1a4
Teodorou, C., Velisavljevic, V., Dyo, V., and Nonyelu, F. (2022). Visual slam
algorithms and their application or ar, mapping, localization and waynding. Array Zhang, S., Zheng, L., and ao, W. (2021b). Survey and evaluation o rgb-d slam. IEEE
15, 100222. doi:10.1016/j.array.2022.100222 Access 9, 21367–21387. doi:10.1109/access.2021.3053188
ian, Y., Chang, Y., Quang, L., Schang, A., Nieto-Granda, C., How, J. P., et al. (2023a). Zhang, W., Wang, S., Dong, X., Guo, R., and Haala, N. (2023). Bam-slam: bundle
Resilient and distributed multi-robot visual slam: datasets, experiments, and lessons adjusted multi-sheye visual-inertial slam using recurrent eld transorms. arXiv
learned. arXiv preprint arXiv:2304.04362. preprint arXiv:2306.01173
ian, Y., Chang, Y., Quang, L., Schang, A., Nieto-Granda, C., How, J. P., et al. (2023b). Zhang, X., Liu, Q., Zheng, B., Wang, H., and Wang, Q. (2020). A visual
Resilient and distributed multi-robot visual slam: datasets, experiments, and lessons simultaneous localization and mapping approach based on scene segmentation
learned. arXiv preprint arXiv:2304.04362. and incremental optimization. Int. J. Adv. Robotic Syst. 17, 172988142097766.
doi:10.1177/1729881420977669
ourani, A., Bavle, H., Sanchez-Lopez, J. L., and Voos, H. (2022). Visual slam: what
are the current trends and what to expect? Sensors 22, 9297. doi:10.3390/s22239297 Zhang, X., Su, Y., and Zhu, X. (2017). “Loop closure detection or visual slam
systems using convolutional neural network,” in 2017 23rd International Conerence
sintotas, K. A., Bampis, L., and Gasteratos, A. (2022). Te revisiting problem in
on Automation and Computing (ICAC) (IEEE), 1–6.
simultaneous localization and mapping: a survey on visual loop closure detection. IEEE
rans. Intelligent ransp. Syst. 23, 19929–19953. doi:10.1109/tits.2022.3175656 Zheng, P., Li, S., Xia, L., Wang, L., and Nassehi, A. (2022). A visual reasoning-based
approach or mutual-cognitive human-robot collaboration. CIRP Ann. 71, 377–380.
tum.v (2023). DVO-SLAM: direct visual odometry or monocular cameras. Available
doi:10.1016/j.cirp.2022.04.016
at: https://github.com/tum-vision/dvo_slam.
Zheng, S., Wang, J., Rizos, C., Ding, W., and El-Moway, A. (2023). Simultaneous
Ullah, I., Su, X., Zhang, X., and Choi, D. (2020). Simultaneous localization and
localization and mapping (slam) or autonomous driving: concept and analysis. Remote
mapping based on kalman lter and extended kalman lter. Wirel. Commun. Mob.
Sens. 15, 1156. doi:10.3390/rs15041156
Comput. 2020, 1–12. doi:10.1155/2020/2138643
Zhou, L., Koppel, D., Ju, H., Steinbruecker, F., and Kaess, M. (2020). “An ecient
Van Nam, D., and Gon-Woo, K. (2021). “Solid-state lidar based-slam: a concise
planar bundle adjustment algorithm,” in 2020 IEEE International Symposium on Mixed
review and application,” in 2021 IEEE International Conerence on Big Data and Smart
and Augmented Reality (ISMAR) (IEEE), 136–145.
Computing (BigComp) (IEEE), 302–305.
Zhu, Z., Peng, S., Larsson, V., Xu, W., Bao, H., Cui, Z., et al. (2022). Nice-slam: neural
Wang, H., Ko, J. Y., and Xie, L. (2022). Multi-modal semantic slam or complex dynamic
implicit scalable encoding or slam , 12786–12796.
environments. arXiv preprint arXiv:2205.04300.
