A review of visual SLAM for robotics: evolution, properties, and future applications
*CORRESPONDENCE
Basheer Al-Tawil, [email protected]

RECEIVED 01 December 2023
ACCEPTED 20 February 2024
PUBLISHED 10 April 2024

CITATION
Al-Tawil B, Hempel T, Abdelrahman A and Al-Hamadi A (2024), A review of visual SLAM for robotics: evolution, properties, and future applications. Front. Robot. AI 11:1347985. doi: 10.3389/frobt.2024.1347985

COPYRIGHT
© 2024 Al-Tawil, Hempel, Abdelrahman and Al-Hamadi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Visual simultaneous localization and mapping (V-SLAM) plays a crucial role in the field of robotic systems, especially for interactive and collaborative mobile robots. The growing reliance on robotics has increased the complexity of task execution in real-world applications. Consequently, several types of V-SLAM methods have been developed to facilitate and streamline the functions of robots. This work aims to showcase the latest V-SLAM methodologies, offering clear selection criteria for researchers and developers to choose the right approach for their robotic applications. It chronologically presents the evolution of SLAM methods, highlighting key principles and providing comparative analyses between them. The paper focuses on the integration of the robotic ecosystem with a robot operating system (ROS) as middleware, explores essential V-SLAM benchmark datasets, and presents demonstrative figures for each method's workflow.

KEYWORDS
V-SLAM, interactive mobile robots, ROS, benchmark, middleware, workflow, robotic applications, robotic ecosystem

1 Introduction
Robotics is an interdisciplinary field that involves the creation, design, and operation of robots that perform tasks using algorithms and programming (Bongard, 2008; Joo et al., 2020; Awais and Henrich 2010; Fong et al., 2003). Its impact extends to manufacturing, automation, optimization, transportation, medical applications, and even NASA's interplanetary exploration (Li et al., 2023b; Heyer, 2010; Sheridan, 2016; Mazumdar et al., 2023). Service robots, which interact with people, are becoming more common and useful in everyday life (Hempel et al., 2023; Lynch et al., 2023). The imperative of integrating automation with human cognitive abilities becomes evident in facilitating successful collaboration between humans and robots. This helps service robots be more effective in the different situations in which they interact with people (Prati et al., 2021; Strazdas et al., 2020; Zheng et al., 2023). Furthermore, using multiple robots together can help them handle complex tasks better (Zheng et al., 2022; Li et al., 2023b; Fiedler et al., 2021). To manage and coordinate the various processes involved, a robot operating system (ROS) plays a significant role (Buyval et al., 2017). It is an open-source framework that aids roboticists in implementing their research and projects with minimal complexity. ROS offers a multitude of features, including hardware integration, control mechanisms, and seamless device integration into the system, thus facilitating the development and operation of robotic systems (Altawil and Can 2023).
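As a minimal illustration of how this middleware is used in practice, the sketch below shows a ROS 1 node in Python that subscribes to a camera stream, the kind of input a V-SLAM front-end consumes; the node name and topic name are illustrative assumptions rather than details taken from any system reviewed here.

```python
# Minimal ROS 1 node sketch (Python); assumes a ROS installation with
# rospy available. The topic and node names below are placeholders.
import rospy
from sensor_msgs.msg import Image

def on_image(msg):
    # A V-SLAM front-end would extract and track features in frames like this.
    rospy.loginfo("frame %dx%d at t=%.3f", msg.width, msg.height,
                  msg.header.stamp.to_sec())

if __name__ == "__main__":
    rospy.init_node("vslam_frontend_stub")           # register with the ROS master
    rospy.Subscriber("/camera/image_raw", Image, on_image)
    rospy.spin()                                     # hand control to the callback loop
```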
FIGURE 1
Article organizational chart.
As shown in Figure 1, the paper is divided into six sections. Section 1 gives a brief introduction to robotics and SLAM. Section 2 presents an overview of the V-SLAM paradigm and delves into its fundamental concepts. Section 3 presents the state-of-the-art V-SLAM methods, offering insights into their latest advancements. Moving forward, Section 4 explores the evolution of V-SLAM and discusses the most commonly used datasets. Section 5 focuses on techniques for evaluating SLAM methods, aiding in the selection of appropriate methods. Finally, Section 6 provides the conclusion of the article, summarizing the key points we discovered while working on our review paper.

Robots are increasingly required to move around and work well in places they have never been before. In this regard, simultaneous localization and mapping (SLAM) emerges as a fundamental approach for these robots. The primary goal of SLAM is to autonomously explore and navigate unknown environments by simultaneously creating a map and determining the robot's own position within it (Durrant-Whyte, 2012; Mohamed et al., 2008). Furthermore, it provides real-time capabilities, allowing robots to make decisions on the fly without relying on pre-existing maps. Its utility extends to the extraction, organization, and comprehension of information, thereby enhancing the robot's capacity to interpret and interact effectively with its environment (Pal et al., 2022; Lee et al., 2020; Aslan et al., 2021). It is crucial for enabling these robots to autonomously navigate and interact in human environments, thus reducing human effort and enhancing overall productivity (Ara, 2022). The construction of maps is based on the utilization of sensor data, such as visual data, laser scanning data, and data from the inertial measurement unit (IMU), followed by rapid processing (Macario Barros et al., 2022).

Historically, prior to the advent of SLAM technology, localization and mapping were treated as distinct problems. However, there is a strong internal dependency between them: accurate localization depends on the map, while mapping depends on localization. Thus, the problem is known as the "chicken and egg" question (Taheri and Xia, 2021). In robotics, there are different tools to help robots obtain information from their surroundings and build their map. One way is to use sensors such as LiDAR, which uses light detection and ranging to make a 3D map (Huang, 2021; Van Nam and Gon-Woo, 2021). Another way is to use cameras, such as monocular and stereo cameras, which are applied in visual SLAM (V-SLAM). In this method, the robot uses images to figure out where it is and creates the required map (Davison et al., 2007). Given the paper's density of detail, we provide Table 1, which summarizes the abbreviations used in the article based on SLAM principles and fundamentals.

Due to the significance of visual techniques in interactive robotic applications, our research focuses on V-SLAM methodologies and their evaluation. V-SLAM can be applied to mobile robots that utilize cameras to create a map of their surroundings and easily locate themselves within their workspace (Li et al., 2020). It uses techniques such as computer vision to extract and match visual data for localization and mapping (Zhang et al., 2020; Chung et al., 2023). It allows robots to map complex environments while performing tasks such as navigation in dynamic fields (Placed et al., 2023; Khoyani and Amini 2023). It places a strong emphasis on accurate tracking of camera poses and estimating past trajectories of the robot during its work (Nguyen et al., 2022; Awais and Henrich 2010).

Figure 2 provides a basic understanding of V-SLAM. It takes an image from the environment as an input, processes it, and produces a map as an output. In V-SLAM, various types of cameras are used to capture images or videos. A commonly used camera is the monocular camera, which has a single lens and provides 2D visual information (Civera et al., 2011). However, due to its lack of depth information, researchers often turn to stereo cameras, which are equipped with two lenses set at a specific distance to capture images from different perspectives, enabling depth details. RGB-D cameras, such as the Kinect, and stereo cameras provide depth information directly, making them suitable for robust and accurate SLAM systems (Luo et al., 2021).

TABLE 1 List of abbreviations used in this article.

V-SLAM: Visual simultaneous localization and mapping
ROS: Robot Operating System
LiDAR: Light detection and ranging
BA: Bundle adjustment
BoW: Bag of words
PTAM: Parallel tracking and mapping
LSD: Large-scale direct
OKVIS: Open keyframe-based visual-inertial
DVO: Dense visual odometry
RPGO: Robust pose-graph optimization
IMU: Inertial measurement unit
GPS: Global positioning system

Previous research has demonstrated the effectiveness of V-SLAM methods, but they are often explained with very few details and separate figures (Khoyani and Amini, 2023; Fan et al., 2020), making it challenging to understand, compare, and make selections among them. As a result, our study focuses on simplifying the explanation of V-SLAM methodologies to enable readers to comprehend them easily. The main contributions of the study can be described as follows:

• Investigation into V-SLAM techniques to determine the most appropriate tools for use in robotics.
• Creation of a graphical and illustrative structural workflow for each method to enhance comprehension of the operational processes involved in V-SLAM.
• Presentation of significant factors for the evaluation and selection criteria among the V-SLAM methods.
• Compilation of a comparative table that lists essential parameters and features for each V-SLAM method.
• Presentation and discussion of relevant datasets employed within the domain of robotics applications.
FIGURE 2
Schematic representation of a robotic system's architecture, highlighting the incorporation of SLAM and its location within the system.
2.1 System data acquisition

The system gathers data, with a particular emphasis on crucial filtering details aimed at effectively eliminating any noise present in the input data (Mane et al., 2016; Grisetti et al., 2007). The refined data are then sent to the next stage for further processing to extract features from the input information (Ai et al., 2021). Consequently, progress in SLAM methods has led to the creation of numerous datasets accessible to researchers for evaluating V-SLAM algorithms (El Bouazzaoui et al., 2021).
2.2 System localization

In the second stage of V-SLAM, the system focuses on finding its location, which is an important part of the entire process (Scaradozzi et al., 2018). It involves the execution of various processes that are crucial for successfully determining where the robot is. Feature tracking plays a central role during this phase, with a primary focus on tasks such as feature extraction, matching, re-localization, and pose estimation (Picard et al., 2023). It aims to align and identify the frames that guide the estimation and creation of the initial keyframe for the input data (Ai et al., 2021). A keyframe is a set of video frames that includes a group of observed feature points and the camera's poses. It plays an important role in the tracking and localization process, helping to eliminate drift errors for camera poses attached to the robot (Sheng et al., 2019; Hsiao et al., 2017). Subsequently, this keyframe is sent for further processing in the next stage, where it will be shaped into a preliminary map, a crucial input for the third stage of the workflow (Aloui et al., 2022; Zhang et al., 2020).
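As a concrete illustration of this feature extraction and matching step, the following minimal sketch uses OpenCV's ORB detector with a brute-force matcher; the image file names are placeholders, and a real front-end would add outlier rejection and pose estimation on top of such raw matches.

```python
# Feature extraction and matching sketch using OpenCV; file names are
# placeholders for two consecutive camera frames.
import cv2

img1 = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)          # detect up to 1000 keypoints
kp1, des1 = orb.detectAndCompute(img1, None)  # keypoints + binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance suits ORB's binary descriptors; crossCheck keeps only
# mutually consistent matches, a cheap first outlier filter.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

print(f"{len(matches)} candidate correspondences for pose estimation")
```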
2.3 System map formation

The third stage of the V-SLAM workflow focuses on the crucial task of building the map, an essential element in V-SLAM processes. Various types of maps can be generated using SLAM, including topological maps, volumetric (3D) maps, such as point cloud and occupancy grid maps, and feature-based or landmark maps. The choice of map type is based on factors such as the sensors employed, application requirements, environmental assumptions, and the type of dataset used in robotic applications (Taheri and Xia, 2021; Fernández-Moral et al., 2013). In robotics, a grid map is a representation of a physical environment, with each cell representing a particular location and storing data comprising obstacles, topography, and occupancy. It functions as a fundamental data structure for several robotics navigation and localization techniques (Grisetti et al., 2007). A feature-based map is a representation that captures the features of the environment, such as landmarks or objects, to facilitate localization and navigation tasks (Li et al., 2022a). A point cloud map is a representation of a physical space or object made from many 3D points, showing how things are arranged in a place. It is created using special cameras or sensors and helps robots and computers understand what is around them (Chu et al., 2018).
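The occupancy-grid idea above can be made concrete in a few lines; the grid size, resolution, and coordinates below are illustrative assumptions rather than values from any particular system.

```python
# Toy occupancy grid: each cell stores an occupancy probability, with 0.5
# meaning unknown. Values and coordinates are illustrative only.
import numpy as np

resolution = 0.05                       # metres per cell
grid = np.full((100, 100), 0.5)         # a 5 m x 5 m map, initially unknown

def mark(x, y, value):
    """Write an occupancy value into the cell containing world point (x, y)."""
    grid[int(y / resolution), int(x / resolution)] = value

mark(1.00, 2.00, 1.0)                   # obstacle reported by a range sensor
mark(0.50, 0.50, 0.0)                   # free space traversed by the robot
print("occupied cells:", np.argwhere(grid == 1.0))
```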
After setting up keyframes during the localization stage, the workflow progresses to field modeling. Then, key points and feature lines are identified and detected, which is crucial for generating a map (Schneider et al., 2018). Mapping builds and updates the map of an unknown environment and is used to continuously track the robot's location (Chen et al., 2020). It is a two-way process that works together with the localization process; the two depend on each other to achieve SLAM. It gathers real-time data about the surroundings, creating both a geometric and a visual model. In addition, the process includes the implementation of bundle adjustment (BA) to improve the precision of the generated map before it is moved to the final stage (Acosta-Amaya et al., 2023). BA is a tool that simultaneously refines the parameters essential for estimating and reconstructing the locations of observed points in the available images. It plays a crucial role in feature-based SLAM (Bustos et al., 2019; Eudes et al., 2010).
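To make the role of BA concrete, the sketch below evaluates the reprojection residual that bundle adjustment minimizes for a single observation; the intrinsics, landmark, and detected pixel are illustrative values.

```python
# Reprojection residual for one landmark observation; BA sums squared
# residuals like this over all cameras and points and minimizes them
# jointly. All numbers here are illustrative.
import numpy as np

K = np.array([[525.0,   0.0, 319.5],    # pinhole intrinsics: fx, fy, cx, cy
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

def reproject(point_cam):
    """Project a 3D point in camera coordinates onto the image plane."""
    u = K @ point_cam
    return u[:2] / u[2]

landmark = np.array([0.2, -0.1, 2.0])   # current estimate of the 3D point
observed = np.array([372.1, 212.8])     # pixel where the feature was detected

residual = observed - reproject(landmark)
print("reprojection error (px):", residual)
```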
2.4 System loop closure and process tuning

The final stage in the V-SLAM workflow involves fine-tuning the process and closing loops, resulting in the optimization of the final map. In V-SLAM, the loop closure procedure examines and maintains previously visited places, fixing any errors that might have occurred during the robot's exploration of an unknown environment. These errors typically result from the estimation processes performed in earlier stages of the SLAM workflow (Tsintotas et al., 2022; Hess et al., 2016). Loop closure and process tuning can be done using different techniques, such as extended Kalman filter SLAM (EKF-SLAM). EKF-SLAM combines loop closure and landmark observation data to adjust the map in the Kalman filter's state estimate.
FIGURE 3
Visual SLAM architecture: an overview of the four core components necessary for visual SLAM: data acquisition, system localization, system mapping, and system loop closure and process tuning, enabling mobile robots to perceive, navigate, and interact with their environment.
This tool helps address uncertainties in the surrounding world (map) and localize the robot within it (Song et al., 2021; Ullah et al., 2020).
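As a rough illustration of the EKF correction just described, the sketch below performs a single Kalman update for a simplified two-dimensional state; the state layout, noise levels, and measurement are illustrative assumptions, and a real EKF-SLAM stacks the robot pose together with every landmark in one state vector.

```python
# One EKF measurement update on a toy 2D state (robot position only),
# correcting the estimate from an observation of a known landmark.
import numpy as np

x = np.array([0.0, 0.0])          # state estimate (robot position)
P = np.eye(2) * 1.0               # state covariance (uncertainty)

landmark = np.array([2.0, 1.0])   # mapped landmark position (illustrative)
z = np.array([1.9, 1.05])         # observed relative position, with noise
R = np.eye(2) * 0.1               # measurement noise covariance

H = -np.eye(2)                    # Jacobian of h(x) = landmark - x
z_pred = landmark - x             # predicted measurement

S = H @ P @ H.T + R               # innovation covariance
K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
x = x + K @ (z - z_pred)          # corrected state estimate
P = (np.eye(2) - K @ H) @ P       # covariance shrinks after the update

print("updated robot position:", x)
```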
The bag-of-words (BoW) approach is another technique used to enable robots to recognize and recall previously visited locations. This is similar to how humans remember places they have been to in the past, even after a long time, due to the activities that took place there. BoW works by taking the visual features of each image and converting them into a histogram of visual words. This histogram is then used to create a fixed-size BoW vector representation, which is stored for use in matching and loop-closing processes (Cui et al., 2022; Tsintotas et al., 2022).
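The following is a minimal sketch of this histogram construction; the random vocabulary and synthetic descriptors are stand-ins for the clustered binary or float descriptors a real place-recognition module would use.

```python
# Bag-of-words sketch: quantize image descriptors against a visual
# vocabulary and compare the resulting histograms. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
vocabulary = rng.normal(size=(50, 32))      # 50 "visual words" (cluster centers)

def bow_vector(descriptors, vocab):
    """Map each descriptor to its nearest word; return a normalized histogram."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(vocab)).astype(float)
    return hist / hist.sum()

img_a = rng.normal(size=(200, 32))          # descriptors from a stored keyframe
img_b = img_a + rng.normal(scale=0.05, size=img_a.shape)  # a revisited place

v_a, v_b = bow_vector(img_a, vocabulary), bow_vector(img_b, vocabulary)
similarity = v_a @ v_b / (np.linalg.norm(v_a) * np.linalg.norm(v_b))
print(f"cosine similarity: {similarity:.3f}")  # a high score suggests a loop closure
```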
Finally, graph optimization is used as a correction tool for loop closure processes. It refines the final map and the robot's trajectory by optimizing a graph based on landmarks. This technique involves a graph-based representation of the SLAM problem, where vertices represent robot poses and map characteristics, and edges represent constraints or measurements between the poses. It is commonly used as a correction tool in graph-based SLAM systems (Zhang et al., 2017; Chou et al., 2019; Meng et al., 2022).
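A toy example makes this concrete. The sketch below optimizes four one-dimensional poses connected by drifting odometry edges and one loop-closure edge; production systems solve the same kind of problem over 2D or 3D poses with solvers such as g2o or GTSAM.

```python
# 1D pose-graph toy: vertices are poses, edges are relative measurements.
# Odometry drifts by +0.1 per step; the loop closure says we are back home.
import numpy as np

edges = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1),  # odometry (i, j, x_j - x_i)
         (3, 0, -3.0)]                            # loop-closure constraint
n = 4

A = np.zeros((len(edges) + 1, n))   # linear least-squares system
b = np.zeros(len(edges) + 1)
for row, (i, j, meas) in enumerate(edges):
    A[row, j], A[row, i], b[row] = 1.0, -1.0, meas
A[-1, 0], b[-1] = 1.0, 0.0          # gauge constraint: anchor pose 0 at origin

x, *_ = np.linalg.lstsq(A, b, rcond=None)
print("optimized poses:", np.round(x, 3))   # drift is spread along the loop
```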
In conclusion, the comprehensive workflow processes outlined in Sections 2.1, 2.2, 2.3, and 2.4 play an important role in V-SLAM for robotics, as they facilitate the simultaneous creation of maps and real-time location tracking within the operational environment (Li et al., 2022b).
3 State-of-the-art visual SLAM methods

V-SLAM plays a significant role as a transformative topic within the robotics industry and research (Khoyani and Amini, 2023; Acosta-Amaya et al., 2023). The progress in this field can be attributed to tools such as machine learning, computer vision, deep learning, and state-of-the-art sensor technologies, which have collectively simplified and enhanced its strategy in real-life applications (Beghdadi and Mallem, 2022; Duan et al., 2019).

The landscape of V-SLAM is composed of a variety of methodologies, which can be divided into three categories, namely, only visual SLAM, visual-inertial SLAM, and RGB-D SLAM (Macario Barros et al., 2022; Theodorou et al., 2022), as shown in Figure 4. In this section, we provide a brief overview of the current state-of-the-art V-SLAM algorithms and techniques, including their methodology, efficiency, time requirements, and processing capacity, as well as whether they are designed to run on on-board or off-board computer systems (Tourani et al., 2022). Additionally, we combine various graphical representations to create a single, comprehensive visual representation of each method's workflow, as shown in Figure 5.

3.1 Only visual SLAM

Only visual SLAM is a SLAM system designed to map the environment around the sensors while simultaneously determining the precise location and orientation of those sensors within their surroundings. It relies entirely on visual data for estimating sensor motion and reconstructing environmental structures (Taketomi et al., 2017). It uses monocular, RGB-D, and stereo cameras to scan the environment, helping robots map unfamiliar areas easily. This approach has attracted attention in the literature because it is cost-effective, easy to calibrate, and has low power consumption with monocular cameras, while also allowing depth estimation and high accuracy with RGB-D and stereo cameras (Macario Barros et al., 2022; Abbad et al., 2023). The methods used in this category are listed herein.

3.1.1 PTAM-SLAM

PTAM-SLAM, which stands for parallel tracking and mapping (PTAM), is a monocular SLAM method used for real-time tracking systems.
FIGURE 5
Visual SLAM methods, illustrating the state-of-the-art method and workflow for select notable SLAM methods featured in this study, presented in a simplified view.
It is used in various applications such as robotics and self-driving cars (Mur-Artal et al., 2015; Engel et al., 2014); see Table 2.

LSD-SLAM distinguishes itself from the DTAM-SLAM approach by focusing on areas with strong intensity changes and leaving out regions with little or no texture detail. This choice stems from the difficulty of estimating depth in image regions with little texture. As a result, LSD-SLAM goes beyond what DTAM can do by concentrating on places with strong changes in brightness and ignoring areas with very little texture (Acosta-Amaya et al., 2023; Khoyani and Amini, 2023).

LSD and DVO-SLAM processes can function similarly, and their workflow is structured in five stages (Macario Barros et al., 2022; Luo et al., 2021; Schöps et al., 2014; Engel et al., 2015). The first stage includes inputting mono and stereo data and preparing them for the next processing step. The second stage is designed for tracking and estimating the initial pose by aligning images from both mono and stereo cameras. The third stage is dedicated to loop closure processes, involving keyframe preparation, regularization, and data updates to prepare frames for subsequent stages. The fourth stage carries out map optimization, including two critical phases: direct mapping and feature-based mapping. It also covers processes such as activation, marginalization, and direct bundle adjustment. These operations shape the necessary map and manage its points with semi-dense adjustments for use in the output stage. In the final stage, the estimated camera trajectory and pose, together with the dense 3D map, are prepared for application in robotics' SLAM functions; see Figure 5, part 14, for a detailed workflow.
TABLE 2 Comparative overview of the reviewed V-SLAM methods (sensors: M, monocular; S, stereo; IMU; O, other).

PTAM (Klein and Murray) | M ✓, S ×, IMU ×, O × | W-S: M-H | output: pose estimation and 3D mapping | applications: robotics, AR, and VR | ILR: +++ | RoLI: ++++ | hardware: ODROID-XU4, Intel Quad-Core | S.M: GPL (2023)
DTAM (Newcombe et al.) | M ✓, S ✓, IMU ×, O: RGB-D | W-S: S-I | output: textured depth map | applications: robotics, AR, VR, AGV, and simulators | ILR: ++ | RoLI: +++ | hardware: NVIDIA GTX 480 GPU, GPGPU processors | S.M: Rintar (2023)
RTAB-Map (Labbé) | M ✓, S ✓, IMU ✓, O: Lidar | W-S: L-H | output: 2D and 3D mapping and 3D reconstruction | applications: robotics, VR, and AR | ILR: +++ | RoLI: +++ | hardware: Jetson Nano, Intel Core i5 8th gen | S.M: Introlab (2023)
ORB-SLAM (Mur-Artal et al.) | M ✓, S ×, IMU ×, O × | W-S: M-H | output: tree-spanning and pose estimation | applications: robotics mapping and indoor navigation | ILR: ++++ | RoLI: +++ | hardware: Intel Core i7-4700MQ | S.M: raulmur (2023a)
ORB-SLAM2 (Mur-Artal and Tardós) | M ✓, S ✓, IMU ×, O: RGB-D | W-S: M-H | output: point mapping and keyframe selection | applications: mobile mapping, robotics, VR, and UAVs | ILR: ++++ | RoLI: ++++ | hardware: Intel Core i7-4790 and RealSense D435 | S.M: raulmur (2023b)
ORB-SLAM3 (Campos et al.) | M ✓, S ✓, IMU ✓, O: fish-eye | W-S: L-H | output: 2D and 3D map and tree-spanning | applications: robotics, security, and 3D reconstruction | ILR: +++++ | RoLI: +++++ | hardware: Jetson TX2, Pi 3B+, NVIDIA GeForce | S.M: uz.slaml (2023)
RGBD-SLAM (Endres et al.) | M ×, S ×, IMU ✓, O: RGB-D | W-S: L-H | output: maps, trajectories, and 3D point cloud | applications: 3D scanning, robotics, and UAVs | ILR: +++ | RoLI: ++++ | hardware: Intel Core i9-9900K and quad-core CPU, 8 GB | S.M: felix. (2023)
SCE-SLAM (Son et al.) | M ×, S ✓, IMU ×, O: RGB-D | W-S: M-I | output: camera pose and semantic map | applications: robotics, AR, and AGV | ILR: ++++ | RoLI: +++ | hardware: NVIDIA Jetson AGX, 512-core Volta GPU | S.M: none
OKVIS (Leutenegger et al.) | M ✓, S ✓, IMU ✓, O × | W-S: M-H | output: graph estimation and feature tracking | applications: robotics, UAVs, and VR | ILR: ++++ | RoLI: ++++ | hardware: Up-Board, ODROID-XU4, and Intel Core M i7 | S.M: eth.a (2023a)
ROVIO (Bloesch et al.) | M ✓, S ✓, IMU ✓, O: fish-eye | W-S: L-H | output: position and orientation depth map | applications: robotics, AR, and self-driving cars | ILR: +++ | RoLI: +++ | hardware: ODROID-XU4 and Intel i7-2760QM | S.M: eth.a (2023b)
VINS-Mono (Qin et al.) | M ✓, S ×, IMU ✓, O × | W-S: L-H | output: keyframe database and pose estimation | applications: robotics, AR, and VR | ILR: +++ | RoLI: +++ | hardware: Intel Pentium, Intel Core i7-4790 CPU | S.M: hkust.a (2023)
LSD-SLAM (Engel et al.) | M ✓, S ✓, IMU ×, O: RGB-D | W-S: L-H | output: keyframe selection and 3D mapping | applications: robotics and self-driving cars | ILR: ++++ | RoLI: +++++ | hardware: FPGA Zynq-7020 SoC, Intel NUC6i3SYH | S.M: CVG, T. U. of M. (2023)
DVO-SLAM (Kerl et al.) | M ×, S ✓, IMU ×, O: RGB-D | W-S: S-I | output: 3D mapping and image alignment | applications: robotics and AR perception | ILR: +++ | RoLI: +++ | hardware: Sony Xperia Z1, Intel Xeon E5520 | S.M: tum.v (2023)
Kimera-SLAM (Rosinol et al.) | M ✓, S ✓, IMU ✓, O: Lidar | W-S: M-H | output: trajectory estimate and semantic mesh | applications: robotics, UAV, VR, and AGV | ILR: ++++ | RoLI: +++++ | hardware: not mentioned | S.M: MIT.S (2023)

- ILR, illumination and light robustness: evaluates how well each SLAM method responds to varying environmental lighting.
- RoLI, range of light intensity: measures the robot's ability to operate effectively across a broad spectrum of light intensities, from very dark to very bright.
- 2D, tolerance to directionality: assesses the robot's capability to function in environments with strong directional light sources, such as spotlights and windows.
- W-S: defines the operational scale and application field of the robot (M, medium; L, large; S, small; H, hybrid; I, indoor).
- S.M, sources and materials: provides links to the source code used in each method.
- VINS.M.S, VINS-Mono SLAM; M, monocular camera; S, stereo camera; IMU, inertial measurement unit; O, other sensors; fish.e, fish-eye camera; rgbd, RGB-D camera.
3.1.4 DVO-SLAM

DVO-SLAM, which stands for dense visual odometry SLAM, is designed to facilitate real-time motion estimation and map creation using depth-sensing devices, such as stereo and mono cameras (Schöps et al., 2014). It stands out for its ability to generate detailed and accurate environment maps while tracking the position and orientation (Luo et al., 2021; Zhu et al., 2022). DVO-SLAM uses point-to-plane metrics in photometric bundle adjustment (PBA), enhancing the navigation of robotic systems, especially in situations with fewer textured points. The point-to-plane metric is a cost function and optimization tool used to optimize the depth sensor poses and plane parameters for 3D reconstruction (Alismail et al., 2017; Zhou et al., 2020; Newcombe et al., 2011). These features make DVO-SLAM suitable for more accuracy-demanding applications such as robotics and augmented reality (AR), and it is robust when operating under slightly unstable light sources (Khoyani and Amini, 2023; Kerl et al., 2013); see Table 2.
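As a rough illustration of the point-to-plane cost mentioned above, the sketch below evaluates the residual for a single correspondence; the points and normal are synthetic, and a full system sums this term over thousands of matched pairs inside a pose optimizer.

```python
# Point-to-plane residual for one correspondence: the distance from a
# transformed point to the tangent plane of its match. Values are synthetic.
import numpy as np

src = np.array([1.02, 0.48, 2.01])     # point from the current depth frame
dst = np.array([1.00, 0.50, 2.00])     # matched point in the model/map
normal = np.array([0.0, 0.0, 1.0])     # surface normal at the model point

def point_to_plane_error(p, q, n):
    """Signed distance from p to the plane through q with normal n."""
    return np.dot(p - q, n)

print("residual:", point_to_plane_error(src, dst, normal))
# The optimizer adjusts the sensor pose so these plane distances shrink,
# which tends to converge faster than point-to-point costs on flat,
# low-texture surfaces.
```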
3.2 Visual-inertial SLAM

VI-SLAM is a technique that combines the capabilities of visual sensors, such as stereo cameras, and inertial measurement units (IMUs) to achieve its SLAM objectives and operations (Servières et al., 2021; Leutenegger et al., 2015). This hybrid approach allows comprehensive modeling of the environment in which robots operate (Zhang et al., 2023). It can be applied to various real-world applications, such as drones and mobile robotics (Taketomi et al., 2017). The integration of IMU data enhances and augments the information available for environment modeling, resulting in improved accuracy and reduced errors within the system's functioning (Macario Barros et al., 2022; Mur-Artal and Tardós 2017b). The methods and algorithms used in this approach, as implemented in real-life applications, are listed in the following section.
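As a rough illustration of the inertial side of this fusion, the toy sketch below dead-reckons motion from accelerometer samples between two camera frames; the readings are synthetic, and real VI-SLAM pipelines use bias- and gravity-aware IMU preintegration rather than this naive Euler scheme.

```python
# Naive integration of synthetic IMU accelerometer samples between two
# camera frames; bias, gravity, and rotation handling are omitted.
import numpy as np

dt = 0.01                                   # 100 Hz IMU
accel = [np.array([0.1, 0.0, 0.0])] * 10    # body-frame acceleration (m/s^2)

v = np.zeros(3)                             # velocity at the last keyframe
p = np.zeros(3)                             # position at the last keyframe
for a in accel:                             # simple Euler integration
    v = v + a * dt
    p = p + v * dt

print("predicted inter-frame motion:", p)   # a prior for the visual tracker
```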
3.2.1 OKVIS-SLAM

OKVIS-SLAM, which stands for open keyframe-based visual-inertial SLAM, is designed for robotics and computer vision applications that require real-time 3D reconstruction, object tracking, and position estimation (Kasyanov et al., 2017). It combines visual and inertial measurements to accurately predict the position and orientation of a robot simultaneously (Leutenegger et al., 2015).

It accurately tracks the camera's position and orientation in real time during a robot's motion (Leutenegger, 2022). It uses image retrieval to connect keyframes in the SLAM pose-graph, aided by the pose estimator for locations beyond the optimization window of visual-inertial odometry (Kasyanov et al., 2017; Wang et al., 2023). For portability, a lightweight semantic segmentation CNN is used to remove dynamic objects during navigation (Leutenegger, 2022). OKVIS's real-time precision and resilience make it suitable for various applications, including robotics and unmanned aerial vehicles (UAVs). It can operate effectively in complex and unstable illumination environments (Wang et al., 2023); see Table 2.

We have structured the OKVIS-SLAM workflow into three key phases (Leutenegger, 2022; Kasyanov et al., 2017; Wang et al., 2023). The first phase focuses on receiving initial sensor inputs, including IMU and visual data. It initializes the system, conducts IMU integration, and employs tracking techniques to prepare the data for subsequent processing. The second phase is the real-time estimator and odometry filtering phase, covering various operations such as landmark triangulation and status updating. The triangulation process is used to estimate the 3D positions of visual landmarks to enhance SLAM operation (Yousif et al., 2015). In the last phase, optimization and full graph estimation are performed. This includes loop closure detection, window sliding, and marginalization. This phase selects relevant frames and optimizes the overall graph structure, ultimately providing essential outputs for the SLAM system; see Figure 5, part 11.

3.2.2 ROVIO-SLAM

ROVIO-SLAM, which stands for robust visual-inertial odometry SLAM, is a cutting-edge sensor fusion method that smoothly combines visual and inertial data. This integration significantly enhances navigation accuracy, leading to improved work efficiency in robotic systems (Bloesch et al., 2015; Wang et al., 2023). It brings valuable attributes to robotics, excelling in robust performance in challenging environments, and presents a smooth interaction between the robot and its surroundings (Li et al., 2023a). It efficiently handles extensive mapping processes, making it suitable for large-scale applications (Kasyanov et al., 2017). Moreover, it operates with low computational demands and high robustness to light, making it ideal for cost-effective robotic platforms designed for sustained, long-term operations (Leutenegger, 2022).

The ROVIO-SLAM workflow is divided into three stages (Picard et al., 2023; Nguyen et al., 2020; Schneider et al., 2018). First, data from the visual cameras and the IMU are obtained and prepared for processing. In the next stage, feature detection, tracking, and semantic segmentation are performed on the visual data, while the IMU data are prepared for integration on the other side. The processing stage involves loop closure operations, new keyframe insertion, and state transition, along with data filtering. State transitions lead to the generation of the key output, which is then transferred to the final stage, providing estimated position, orientation, and 3D landmarks; see Figure 5, part 8.

3.2.3 VINS Mono-SLAM

VINS Mono-SLAM, which stands for the visual-inertial navigation system, is an advanced sensor fusion technology that precisely tracks the motion and position of a robot or sensor in real time. Utilizing only a single camera and an IMU, it combines visual and inertial data to enhance accuracy and ensure precise functionality of robot operations (Mur-Artal and Tardós, 2017b). Known for its efficiency in creating maps and minimizing drift errors, VINS-Mono excels in navigating challenging environments with dynamic obstacles (Bruno and Colombini, 2021). Its smooth performance in difficult lighting conditions highlights its reliability, ensuring optimal functionality for mobile robots operating in unstable lighting conditions (Song et al., 2022; Kuang et al., 2022). This power-efficient, real-time monocular VIO method is suitable for visual SLAM applications in robotics, virtual reality, and augmented reality (Gu et al., 2022); see Table 2.
The VINS-Mono SLAM workflow is organized into four stages (Qin et al., 2018; Xu et al., 2021). In the first stage, visual and inertial data are gathered and prepared for acquisition and measurement processing, including feature extraction, matching, and IMU data preparation, and are then sent for visual and inertial alignment. The second stage handles loop closure operations and re-localization to adjust old states, with additional feature retrieval for the next step. The third stage focuses on process optimization, incorporating bundle adjustments and additional propagation for efficiency. The final stage outputs the system's estimated pose and a keyframe database, applicable to SLAM; see Figure 5, part 13.
3.2.4 Kimera-SLAM

Kimera-SLAM is an open-source SLAM technique applied for real-time metric-semantic purposes. Its framework draws heavily on previous methodologies such as ORB-SLAM, VINS-Mono SLAM, OKVIS, and ROVIO-SLAM (Rosinol et al., 2020). Exhibiting robustness in dynamic scenes, particularly in the presence of moving objects (Wang et al., 2022), Kimera-SLAM showcases resilience to variations in lighting conditions. It operates effectively in both indoor and outdoor settings, making it highly compatible with integration into interactive robotic systems (Rosinol et al., 2021). In summary, Kimera-SLAM provides a thorough and efficient solution for real-time metric-semantic SLAM, prioritizing accuracy, modularity, and robustness in its operations (Rosinol et al., 2021); see Table 2.

The procedural workflow of this technique can be summarized in five stages (Rosinol et al., 2020). First, the input pre-processing includes dense 2D semantics, dense stereo, and Kimera-VIO. It also includes front-end and back-end operations such as tracking, feature extraction, and matching, which yield an accurate state estimation. The second stage involves robust pose-graph optimization (Kimera-RPGO), tasked with optimization and the formulation of a global trajectory. Subsequently, the third stage features the per-frame and multi-frame 3D mesh generator (Kimera-Mesher), responsible for the execution and generation of 3D meshes representing the environment. The fourth stage introduces semantically annotated 3D meshes (Kimera-Semantics), dedicated to generating 3D meshes with semantic annotations. This stage sets the groundwork for the subsequent and final stage, where the generated 3D meshes are utilized for output visualization, ultimately serving SLAM purposes, as illustrated in Figure 5, part 9.
3.3 RGB-D SLAM

RGB-D SLAM is an approach that integrates RGB-D cameras with depth sensors to estimate and build models of the environment (Ji et al., 2021; Macario Barros et al., 2022). This technique has found applications in various domains, including robotic navigation and perception (Luo et al., 2021). It demonstrates efficient performance, particularly in well-lit indoor environments, providing valuable insights into the spatial landscape (Dai et al., 2021).

The incorporation of RGB-D cameras and depth sensors enables the system to capture both color and depth information simultaneously. This capability is advantageous in indoor applications, addressing the challenge of dense reconstruction in areas with low-textured surfaces (Zhang et al., 2021b). The objective of RGB-D SLAM is to generate a precise 3D reconstruction of the system's surroundings, with a focus on the acquisition of geometric data to build a comprehensive 3D model (Chang et al., 2023). The methods used in this section are listed as follows:

3.3.1 RTAB-Map SLAM

RTAB-Map SLAM, which stands for real-time appearance-based mapping, is a visual SLAM technique that works with RGB-D and stereo cameras (Ragot et al., 2019). It is a versatile algorithm that can handle 2D and 3D mapping tasks depending on the sensor and data that are given (Peter et al., 2023; Acosta-Amaya et al., 2023). It integrates RGB-D and stereo data for 3D mapping, enabling the detection of static and dynamic 3D objects in the robot's environment (Ragot et al., 2019). It is applicable in large outdoor environments where LiDAR rays cannot reflect and manage the field around the robot (Gurel, 2018). Variable lighting and environmental interactions can cause robotic localization and mapping errors. Therefore, RTAB's robustness and adaptability to changing illumination and scenes enable accurate operation in challenging environments. It can handle large, complex environments and is quickly adaptable to work with multiple cameras or laser rangefinders (Li et al., 2018; Peter et al., 2023). Additionally, the integration of the T265 (Intel RealSense camera) and implementation of ultra-wideband (UWB) (Lin and Yeh, 2022) address robot wheel slippage with drifting-error handling, enhancing system efficiency with precise tracking and 3D point cloud generation, as done in Persson et al. (2023); see Table 2.

The RTAB-Map SLAM method involves a series of steps that enable it to function (Gurel, 2018; Labbé and Michaud, 2019). Initially, the hardware and front-end stage is responsible for tasks such as obtaining data from stereo and RGB-D cameras, generating frames, and integrating sensors. This stage prepares the frames that will be used in the subsequent stage. After the frames have been processed simultaneously with the tracking process, the loop closure is activated to generate the necessary odometry. Subsequently, the keyframe equalization and optimization processes are initiated to improve the quality of the 2D and 3D maps generated for SLAM applications, as shown in Figure 5, part 7.

3.3.2 DTAM-SLAM

DTAM-SLAM, which stands for dense tracking and mapping, is a V-SLAM algorithm specified for real-time camera tracking. It provides robust six degrees of freedom (6 DoF) tracking and facilitates efficient environmental modeling for robotic systems (Newcombe et al., 2011; Macario Barros et al., 2022). This approach plays a fundamental role in advancing applications such as robotics, augmented reality, and autonomous navigation, delivering precise tracking and high-quality map reconstruction. Furthermore, it is only slightly sensitive to lighting dynamics; thus, it remains accurate when operating in fields with high and strong illumination (Zhu et al., 2022; Yang et al., 2022); see Table 2.

The DTAM-SLAM workflow is divided into a series of steps, each with its own purpose (Newcombe et al., 2011; Macario Barros et al., 2022). It begins with the input, such as the RGB-D camera, which helps initialize the system's work. In the camera tracking and reconstruction stage, the system selects frames and estimates textures in the image. It then accurately tracks the 6 DoF camera motion, determining its exact position and orientation. Furthermore, the optimization framework is activated and uses techniques such as spatially regularized energy minimization to enhance data terms, thereby improving the image quality captured from video streaming. As a result, the advanced process tuning carries out operations that improve the method's performance, producing precise outputs such as dense models, surface patchwork, and textured depth maps (see Figure 5, part 2).

3.3.3 RGBD-SLAM

RGBD-SLAM, which stands for simultaneous localization and mapping using red-green-blue and depth data, is an important method that creates a comprehensive 3D map containing both static and dynamic elements (Ji et al., 2021). This method involves the tracking of trajectories and mapping of points associated with moving objects (Steinbrücker et al., 2011; Niu et al., 2019). Using these data types enhances and provides precise SLAM results (Endres et al., 2012; Li Q. et al., 2022a). It has the ability to create registered point clouds or OctoMaps that can be used for robotic systems (Zhang and Li 2023; Ren et al., 2022). In robotics applications, RGB-D SLAM, specifically V-SLAM, excels in both robustness and accuracy. It effectively addresses challenges such as working in a dynamic environment (Steinbrücker et al., 2011; Niu et al., 2019). The implementation of RGB-D SLAM faced a challenge in balancing segmentation accuracy, system load, and the number of detected classes from images. This challenge was tackled using TensorRT, optimized by YOLOX, for high-precision real-time object recognition (Chang et al., 2023; Martínez-Otzeta et al., 2022). It has versatile applications in real-world robotics scenarios, including autonomous driving cars, mobile robotics, and augmented reality (Zhang and Li, 2023; Bahraini et al., 2018); see Table 2.

The RGB-D SLAM workflow can be organized into five essential stages, each playing a crucial role in the SLAM process (Ji et al., 2021; Hastürk and Erkmen, 2021; Endres et al., 2012). The initial stage involves data acquisition, where RGB-D and depth camera data are collected as the foundational input for subsequent stages. In the second stage, processing of the RGB-D details is activated. During this phase, tasks include feature extraction and pairwise matching, while simultaneously addressing depth-related activities, such as storing point clouds and aligning lines or shapes. In the third stage, activities such as noise removal and semantic segmentation (SS), in addition to loop closure detection, are performed to lay the groundwork for map construction. The fourth stage is dedicated to pose estimation and optimization techniques, leading to improvement in the accuracy of the system output. The final stage involves generating trajectory estimates and maps, refining the outputs for use in SLAM applications in robotic systems; see Figure 5, part 3.

3.3.4 SCE-SLAM

SCE-SLAM, which stands for spatial coordinate errors SLAM, represents an innovative real-time semantic RGB-D SLAM technique. It has been developed to tackle the constraints posed by traditional SLAM systems when operating in dynamic environments (Li et al., 2020). The method was designed to increase the performance of existing V-SLAM methods, such as ORB-SLAM3, and to make them useful, with greater accuracy and robustness in dynamic situations, by merging semantic and geometric data and leveraging YOLOv7 for quick object recognition (Wu et al., 2022). Thanks to these improvements, the SLAM algorithms can be well suited to dynamic scenarios, which allows greater adaptability and comprehension of the system's surroundings. This enables robotic systems to operate in more complex circumstances with fewer mistakes or slippage errors (Liu and Miura, 2021). Moreover, robots equipped with SCE-SLAM are empowered to operate in a more flexible and error-reduced manner, and it can operate in challenging light environments (Son et al., 2023; Ren et al., 2022); see Table 2.

The SCE-SLAM workflow is divided into three key stages (Son et al., 2023). The first stage involves the semantic module. This module processes camera input data and employs YOLOv2 to remove noise from the input. The second stage is the geometry module, where depth image analysis and spatial coordinate recovery are performed, preparing the system for integration with ORB-SLAM3. The final stage is dedicated to the integration of ORB-SLAM3. This integration facilitates the execution of processes within ORB-SLAM3. The process works in parallel with the loop closure technique, which results in a more accurate and precise system output; see Figure 5, part 12.

4 Visual SLAM evolution and datasets

The roots of SLAM can be traced back to nearly three decades ago, when it was first introduced by Smith et al. (Picard et al., 2023; Khoyani and Amini, 2023). Recently, visual SLAM has changed considerably and made a big impact on robotics and computer vision (Khoyani and Amini, 2023). Along this journey, different V-SLAM methods have been created to tackle specific challenges in robot navigation, mapping, and understanding the surroundings (Aloui et al., 2022; Sun et al., 2017). To verify and compare these V-SLAM methods, important datasets have been created, which have played a crucial role in the field (Pal et al., 2022; Tian et al., 2023a). In this section, we explore the evolution of V-SLAM methods over time and how they have advanced with the help of suitable datasets.

To offer a more comprehensible perspective, we provide an illustrative timeline depicting the evolution of the most well-known V-SLAM methods, as shown in Figure 6. This graphical representation illustrates the development of the V-SLAM methodologies from 2007 to 2021. These methods have been applied in various fields, including agriculture, healthcare, and industrial sectors, with a specific focus on interactive mobile robots. Additionally, we highlight several significant and widely recognized benchmark datasets crucial to V-SLAM, as shown in the following section.
4.1 TUM RGB-D dataset

The TUM RGB-D dataset is a widely used resource in the field of V-SLAM, which helps demonstrate the effectiveness and practicality of V-SLAM techniques. This dataset provides both RGB images and depth maps, with the RGB images saved in a 640 × 480 8-bit format and the depth maps in a 640 × 480 16-bit monochrome format (Chu et al., 2018).
FIGURE 6
Timeline illustrating the evolutionary journey of SLAM techniques, accompanied by the datasets that have played a pivotal role in their development. It showcases the dynamic progression of SLAM technologies over time, reflecting the symbiotic relationship between innovative methods and the rich variety of datasets on which they have been tested and refined.
It offers RGB-D data, making it appropriate for both depth-based and V-SLAM techniques. Its usefulness extends to essential tasks such as mapping and odometry, providing researchers with a considerable volume of data for testing SLAM algorithms across diverse robotic applications (Ji et al., 2021; Endres et al., 2012). The adaptability of these datasets is remarkable, as they find application in mobile robotics and handheld platforms, demonstrating effectiveness in both indoor and outdoor environments (Martínez-Otzeta et al., 2022; Son et al., 2023).

Some recent studies have used TUM datasets, such as Li et al. (2023c), who leveraged the TUM RGB-D dataset to establish benchmarks customized to their specific research objectives. The study initiated its investigations with RGB-D images and ground-truth poses provided by the TUM datasets, utilizing them to construct 3D scenes characterized by real-space features. The integrative role assumed by the TUM RGB-D dataset in this context attains profound significance as a fundamental resource within the domain of V-SLAM research. For more details, refer to the TUM RGB-D SLAM dataset.
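Since the dataset's color and depth images carry separate timestamps, evaluations typically begin by pairing them by nearest timestamp, in the spirit of the dataset's association tooling; the sketch below is a minimal version of that idea with synthetic timestamps and placeholder file names.

```python
# Pair RGB and depth frames by nearest timestamp; timestamps and file
# names below are illustrative stand-ins for the dataset's index files.
rgb = {1.000: "rgb/1.000.png", 1.033: "rgb/1.033.png"}
depth = {1.002: "depth/1.002.png", 1.035: "depth/1.035.png"}

max_dt = 0.02  # maximum allowed offset (seconds) to accept a pair
pairs = []
for t_rgb, rgb_file in sorted(rgb.items()):
    t_depth = min(depth, key=lambda t: abs(t - t_rgb))  # nearest depth stamp
    if abs(t_depth - t_rgb) <= max_dt:
        pairs.append((rgb_file, depth[t_depth]))

print(pairs)  # aligned (color, depth) inputs for an RGB-D SLAM front-end
```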
4.2 EuRoC MAV benchmark dataset

The EuRoC MAV benchmark dataset is specifically designed for micro aerial vehicles (MAVs) and contributes a valuable resource to the domain of MAV-SLAM research, since it includes sensor data such as IMU readings and visual data such as stereo images. These datasets, published in early 2016, are made accessible for research purposes and offer diverse usability in indoor and outdoor applications. Consequently, it serves as a relevant choice for evaluating MAV navigation and mapping algorithms, particularly in conjunction with various V-SLAM methodologies (Sharafutdinov et al., 2023; Leutenegger, 2022; Burri et al., 2016).

The EuRoC MAV benchmark dataset, of notable benefit to robotics, is particularly valuable for researchers working on visual-inertial localization algorithms like OpenVINS (Geneva et al., 2020; Sumikura et al., 2019) and ORB-SLAM2 (Mur-Artal and Tardós, 2017a). This dataset incorporates synchronized stereo images, IMU measurements, and precise ground-truth data, providing comprehensive resources for algorithm development. Its comprehensive data structure makes it highly suitable for thoroughly testing and validating algorithms tailored for MAV purposes (Burri et al., 2016). For more details, refer to the EuRoC MAV dataset.

4.3 KITTI dataset

The KITTI dataset is a widely utilized resource in robotics navigation and SLAM, with a particular emphasis on V-SLAM. Designed for outdoor SLAM applications in urban environments, KITTI integrates data from multiple sensors, including depth cameras, lidar, GPS, and an inertial measurement unit (IMU), contributing to the delivery of precise results for robotic applications (Geiger et al., 2013). Its versatility extends to supporting diverse research objectives such as 3D object detection, semantic segmentation, moving-object detection, visual odometry, and road-detection algorithms (Wang et al., 2023; Raikwar et al., 2023).

As a valuable asset, researchers routinely rely on the KITTI dataset to evaluate the effectiveness of V-SLAM techniques in real-time tracking scenarios. In addition, it serves as an essential tool for researchers and developers engaged in the domains of self-driving cars and mobile robotics (Geiger et al., 2012; Ortega-Gomez et al., 2023). Furthermore, its adaptability facilitates the evaluation of sensor configurations, thereby contributing to the refinement and assessment of algorithms crucial to these fields (Geiger et al., 2013). For more details, refer to the KITTI Vision Benchmark Suite.

4.4 Bonn RGB-D dynamic dataset

The Bonn dataset is purposefully designed for RGB-D SLAM, containing dynamic sequences of objects. It showcases RGB-D data accompanied by a 3D point cloud representing the dynamic environment, in the same format as the TUM RGB-D datasets (Palazzolo et al., 2019). It covers both indoor and outdoor scenarios, extending beyond the boundaries of controlled environments. It proves valuable for developing and evaluating algorithms related to tasks such as robot navigation, object recognition, and scene understanding. Significantly, this dataset is versatile enough to address the complexities of applications used in light-challenged areas (Soares et al., 2021; Ji et al., 2021). In addition, it proves to be an important resource for evaluating V-SLAM techniques under high dynamism and crowds, where the robot might face the challenge of object detection and interaction with the surrounding environment (Dai et al., 2021; Yan et al., 2022). For more details, refer to the Bonn RGB-D dynamic dataset.
4.5 ICL-NUIM dataset

The ICL-NUIM dataset is a benchmark dataset designed for RGB-D applications, serving as a valuable tool for evaluating RGB-D, visual odometry, and V-SLAM algorithms, particularly in indoor situations (Handa et al., 2014). It includes 3D sensor data and ground-truth poses, facilitating the benchmarking of techniques related to mapping, localization, and object detection in the domain of robotic systems. Its pre-rendered sequences, scripts for generating test data, and standardized data formats are beneficial for researchers in evaluating and improving their SLAM algorithms (Chen et al., 2020). A unique aspect of the ICL-NUIM dataset is its inclusion of a three-dimensional model. This feature empowers researchers to explore and devise new scenarios for robotic systems that operate in unknown environments. Moreover, it promotes improvements in V-SLAM, making it possible to generate semantic maps that improve robots' flexibility and adaptability so that they can integrate into an environment easily and flexibly (Zhang et al., 2021a). For more details, refer to the ICL-NUIM dataset.
5 Guidelines for evaluating and selecting visual SLAM methods

Choosing the right visual SLAM algorithm is crucial for building an effective SLAM system. With the continuous advancements in V-SLAM methodologies responding to diverse challenges, it is essential to navigate structured criteria to deploy and implement precise solutions (Placed et al., 2023; Sousa et al., 2023). In the context of robotic systems, we provide important parameters. We outline them by offering concise explanations of the selection criteria that guide how to choose suitable SLAM methods for field applications. These parameters are listed below.
5.1 Robustness and accuracy

When choosing among V-SLAM methods, a key consideration is the robustness and accuracy of the method (Zhu et al., 2022). In particular, a robust algorithm can handle sensor noise, obstacles, and changing environments to ensure continuous and reliable operation (Bongard, 2008). Additionally, accuracy is equally important for creating precise maps and localization, allowing the robot to make informed decisions and move through the environment without errors (Kucner et al., 2023; Nakamura et al., 2023). These qualities collectively enhance the algorithm's reliability in challenging real-world situations, making them crucial factors for successful mobile robotic applications.
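This accuracy is commonly quantified with trajectory-error metrics such as the absolute trajectory error (ATE), the RMSE between estimated and ground-truth positions. The sketch below uses synthetic 2D trajectories and assumes they are already time-aligned and expressed in a common frame.

```python
# Absolute trajectory error (ATE) on synthetic, pre-aligned 2D trajectories.
import numpy as np

ground_truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.1], [3.0, 0.1]])
estimated    = np.array([[0.0, 0.0], [1.0, 0.1], [2.1, 0.2], [3.1, 0.2]])

errors = np.linalg.norm(estimated - ground_truth, axis=1)  # per-pose error
ate_rmse = np.sqrt(np.mean(errors ** 2))
print(f"ATE RMSE: {ate_rmse:.3f} m")   # lower means a more accurate trajectory
```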
5.2 Computational efficiency and real-time requirements

In mobile robotics applications, the selection of the SLAM algorithm is extremely important, focusing on the efficiency of the processes happening inside the robot's computational architecture (Macario Barros et al., 2022). Therefore, the chosen V-SLAM algorithm must be carefully tailored to meet the computational demands imposed by the real-time constraints of the robot. This entails a delicate balancing act, as the selected algorithm should be seamlessly integrated with the available processing power and hardware resources, all while satisfying the stringent real-time requirements of the application. The critical consideration for this step is the quality of the sensors, the processors, and/or computers, so that they can generate a quick response and accurate localization in a very limited time (Henein et al., 2020).

5.3 Flexible hardware integration

In robotic applications, it is important for researchers to choose a SLAM algorithm that works well with the robot's sensors. Integrating suitable hardware improves speed and performance in SLAM systems through accelerators, method optimization, and energy-efficient designs (Eyvazpour et al., 2023). Various V-SLAM algorithms are designed for specific sensor types, such as RGB-D, lidar, and stereo cameras. This facilitates seamless integration into the SLAM system, enhancing the functionality of utilizing integrated hardware (Wang et al., 2022). Moreover, the availability of ROS packages and open-source software for sensors and cameras provides increased modularity and flexibility during system installation. This, in turn, enhances adaptability and makes integration easy and free of challenges (Sharafutdinov et al., 2023; Roch et al., 2023). For example, the OAK-D camera, also known as the OpenCV AI Kit, is a smart camera well suited for indoor use. It can automatically process data files and run neural inference right inside the camera, without needing extra computing power from the robot. This means it can run neural network models without making the robot's operating system work harder (Han et al., 2023).

5.4 System scalability

In SLAM algorithms for robotics, scalability is a vital factor to keep in mind during the design of the system's middleware architecture. It enables rapid situational awareness over large areas, supports flexible dense metric-semantic SLAM in multi-robot systems, and facilitates fast map learning in unknown environments (Castro, 2021). This parameter requires evaluating the algorithm's capability to adjust to different mapping sizes and environmental conditions, particularly considering light emission and video and/or image clarity. It should also provide versatility for various application needs, applicable to both indoor and outdoor scenarios (Laidlow et al., 2019; Zhang et al., 2023).

5.5 Adapting to dynamic environments

The ability of a SLAM algorithm to handle dynamic objects in the environment is an important consideration for robotic systems. This parameter assesses the algorithm's ability to detect, track, and incorporate dynamic objects and moving obstacles into the mapping process (Lopez et al., 2020). It focuses on the algorithm's capability to enable the robot to handle these objects effectively and respond quickly during the ongoing SLAM process (Wu et al., 2022). A robust dynamic-environment capability should ensure the algorithm's ability to adapt and respond in real-time applications. This is crucial for systems operating in environments where changes occur instantaneously, such as in interactive robotics applications (Li et al., 2018).
6 Conclusion
5.6 Open-source availability and
Te study simplies the evaluation o V-SLAM methods,
community support
making it easy to understand their behavior and suitability or
robotics applications. It covers various active V-SLAM methods,
When choosing a SLAM algorithm or our project, it is
each with unique strengths, limitations, specialized use cases,
important to observe whether it is open-source and has a
and special workows. It has served as a solid oundation or
community o active users. It is important because it makes it
the proposed research methodology or selection among V-
easier to customize and adapt the system according to our needs,
SLAM methods. Troughout the research, it becomes evident
beneting rom the experiences o the user community (Khoyani
that V-SLAM’s evolution is importantly linked to the availability
and Amini 2023; Xiao et al., 2019). Additionally, having community
o benchmark datasets, serving as a ground base or method
support ensures that the algorithm receives updates, bug xes, and
validation. Consequently, the work has laid a strong oundation
improvements. Tis enhances the reliability and longevity o the
or understanding the system behavior o the working V-SLAM
algorithm, making it better equipped to handle challenges during
methods. It explores SLAM techniques that operate in the ROS
system implementation (Persson et al., 2023).
environment, oering exibility in simpliying the architecture
o robotic systems. Te study includes the identication o
suitable algorithms and sensor usion approaches relevant to
5.7 Map data representation and storage researchers’ work.
By examining previous studies, we identied the potential
Tis parameter ocuses on how a SLAM algorithm is represented benets o incorporating V-SLAM sofware tools into the system
and manages maps, allowing the researcher to determine its architecture. Additionally, the integration o hardware tools
suitability or system hardware implementation. Te evaluation such as the 265 camera and OAK-D camera emerged as a
includes the chosen method’s map representation, whether it is valuable strategy. Tis integration has a signicant potential in
grid-based, eature-based, or point cloud, helping in assessing the reducing errors during robot navigation, thereby enhancing overall
eciency o storing map inormation in the robotic system without system robustness.
encountering challenges (Persson et al., 2023; Acosta-Amaya et al.,
2023). Te selection o map representation inuences memory
usage and computational demands. It is a critical actor or robotic Author contributions
applications, especially those based on CNN and deep learning
approaches (Duan et al., 2019). BA: investigation, sofware, supervision, and writing–review
In conclusion, we have summarized the preceding details in Table 2, offering a comprehensive overview of various V-SLAM algorithms. This table serves as a valuable resource for informed algorithm selection, with comparative details for each method. It offers insights into sensor capabilities, examining the types of sensors most effectively used by each algorithm and their role in facilitating algorithmic functionality. Moreover, the table underscores the potential application domains of the methods, empowering researchers to align their research objectives with suitable V-SLAM methodologies. The table also classifies algorithms by mapping scale, distinguishing between small-scale (up to 100 m), medium-scale (up to 500 m), and large-scale (1 km and beyond) mapping capabilities (Tian et al., 2023b; Hong et al., 2021), and it assesses each algorithm's ability to function in environments with strong directional light sources, such as spotlights and windows. Collectively, these criteria furnish a valuable resource for researchers seeking to pick the most fitting SLAM approach for their specific research endeavors.
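The same comparison can also be applied programmatically. The sketch below is our own illustration of this selection process; the method names and attribute values are placeholders, not the entries of Table 2. It shows how the criteria of sensor support, mapping scale, and open-source availability can be combined to shortlist candidate methods.

```python
# Placeholder attribute table mirroring the comparison fields of Table 2.
# The values are illustrative, not the reported properties of any method.
CANDIDATES = {
    "method_a": {"sensors": {"monocular", "imu"}, "scale_m": 500,  "open_source": True},
    "method_b": {"sensors": {"rgbd"},             "scale_m": 100,  "open_source": True},
    "method_c": {"sensors": {"stereo", "imu"},    "scale_m": 1000, "open_source": False},
}

def shortlist(required_sensor, min_scale_m, need_open_source=True):
    """Return method names whose attributes satisfy all requirements."""
    return [name for name, attrs in CANDIDATES.items()
            if required_sensor in attrs["sensors"]
            and attrs["scale_m"] >= min_scale_m
            and (attrs["open_source"] or not need_open_source)]

# e.g., an indoor service robot with an RGB-D camera mapping up to 100 m:
print(shortlist("rgbd", 100))  # -> ['method_b']
```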
6 Conclusion

The study simplifies the evaluation of V-SLAM methods, making it easy to understand their behavior and suitability for robotics applications. It covers various active V-SLAM methods, each with unique strengths, limitations, specialized use cases, and distinct workflows, and it has served as a solid foundation for the proposed methodology for selecting among V-SLAM methods. Throughout the research, it becomes evident that V-SLAM's evolution is closely linked to the availability of benchmark datasets, which serve as the basis for method validation. Consequently, the work has laid a strong foundation for understanding the behavior of the surveyed V-SLAM methods. It explores SLAM techniques that operate in the ROS environment, offering flexibility in simplifying the architecture of robotic systems, and it identifies suitable algorithms and sensor fusion approaches relevant to researchers' work.

By examining previous studies, we identified the potential benefits of incorporating V-SLAM software tools into the system architecture. Additionally, the integration of hardware tools such as the T265 camera and the OAK-D camera emerged as a valuable strategy. This integration has significant potential to reduce errors during robot navigation, thereby enhancing overall system robustness.
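As a schematic illustration of why such a hardware pairing helps (our own sketch, not an implementation from the surveyed work), the snippet below fuses a position estimate from a tracking camera's visual-inertial odometry with one from a depth camera's SLAM by inverse-variance weighting, so the lower-uncertainty source dominates the combined estimate.

```python
# Schematic sketch: inverse-variance weighted fusion of two position
# estimates. The variance values in the example are assumed, not measured.
import numpy as np

def fuse_positions(p_vio, var_vio, p_slam, var_slam):
    """Fuse two 3D position estimates, weighting each by 1/variance."""
    w_vio, w_slam = 1.0 / var_vio, 1.0 / var_slam
    fused = (w_vio * np.asarray(p_vio) + w_slam * np.asarray(p_slam)) / (w_vio + w_slam)
    fused_var = 1.0 / (w_vio + w_slam)  # fused variance is always smaller
    return fused, fused_var

# e.g., locally accurate VIO (small variance) vs. globally corrected SLAM:
p, v = fuse_positions([1.02, 0.48, 0.0], 0.01, [1.10, 0.52, 0.0], 0.04)
print(p, v)  # the fused estimate lies closer to the lower-variance source
```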
Author contributions

BA: investigation, software, supervision, and writing–review and editing. TH: data curation, methodology, conceptualization, validation, investigation, resources, visualization, writing–review and editing. AA: methodology, formal analysis, validation, investigation, visualization, software, writing–review and editing. AA-H: methodology, supervision, project administration, validation, funding acquisition, resources, writing–review and editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work is funded and supported by the Federal Ministry of Education and Research of Germany (BMBF) (AutoKoWA-3DMAt under grant No. 13N16336) and the German Research Foundation (DFG) under grant Al 638/15-1.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References

Abbad, A. M., Haouala, I., Raisov, A., and Benkredda, R. (2023). Low cost mobile navigation using 2d-slam in complex environments.
Acosta-Amaya, G. A., Cadavid-Jimenez, J. M., and Jimenez-Builes, J. A. (2023). Three-dimensional location and mapping analysis in mobile robotics based on visual slam methods. J. Robotics 2023, 1–15. doi:10.1155/2023/6630038
Ai, Y.-b., Rui, T., Yang, X.-q., He, J.-l., Fu, L., Li, J.-b., et al. (2021). Visual slam in dynamic environments based on object detection. Def. Technol. 17, 1712–1721. doi:10.1016/j.dt.2020.09.012
Alismail, H., Browning, B., and Lucey, S. (2017). "Photometric bundle adjustment for vision-based slam," in Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part IV (Springer), 324–341.
Aloui, K., Guizani, A., Hammadi, M., Haddar, M., and Soriano, T. (2022). "Systematic literature review of collaborative slam applied to autonomous mobile robots," in 2022 IEEE Information Technologies and Smart Industrial Systems (ITSIS), 1–5.
Altawil, B., and Can, F. C. (2023). Design and analysis of a four dof robotic arm with two grippers used in agricultural operations. Int. J. Appl. Math. Electron. Comput. 11, 79–87. doi:10.18100/ijamec.1217072
Ara, E. (2022). Study and implementation of LiDAR-based SLAM algorithm and map-based autonomous navigation for a telepresence robot to be used as a chaperon for smart laboratory requirements. Master's thesis.
Aslan, M. F., Durdu, A., Yusefi, A., Sabanci, K., and Sungur, C. (2021). A tutorial: mobile robotics, slam, bayesian filter, keyframe bundle adjustment and ros applications. Robot Operating Syst. (ROS) Complete Reference 6, 227–269.
Awais, M., and Henrich, D. (2010). "Human-robot collaboration by intention recognition using probabilistic state machines," 75–80.
Bahraini, M. S., Bozorg, M., and Rad, A. B. (2018). Slam in dynamic environments via ml-ransac. Mechatronics 49, 105–118. doi:10.1016/j.mechatronics.2017.12.002
Beghdadi, A., and Mallem, M. (2022). A comprehensive overview of dynamic visual slam and deep learning: concepts, methods and challenges. Mach. Vis. Appl. 33, 54. doi:10.1007/s00138-022-01306-w
Bloesch, M., Omari, S., Hutter, M., and Siegwart, R. (2015). "Robust visual inertial odometry using a direct ekf-based approach," in 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE), 298–304.
Bongard, J. (2008). Probabilistic robotics. Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Cambridge, MA, United States: MIT Press, 647. 2005.
Bruno, H. M. S., and Colombini, E. L. (2021). Lift-slam: a deep-learning feature-based monocular visual slam method. Neurocomputing 455, 97–110. doi:10.1016/j.neucom.2021.05.027
Burri, M., Nikolic, J., Gohl, P., Schneider, T., Rehder, J., Omari, S., et al. (2016). The euroc micro aerial vehicle datasets. Int. J. Robotics Res. 35, 1157–1163. doi:10.1177/0278364915620033
Bustos, A. P., Chin, T.-J., Eriksson, A., and Reid, I. (2019). "Visual slam: why bundle adjust?," in 2019 international conference on robotics and automation (ICRA) (IEEE), 2385–2391.
Buyval, A., Afanasyev, I., and Magid, E. (2017). "Comparative analysis of ros-based monocular slam methods for indoor navigation," in Ninth International Conference on Machine Vision (ICMV 2016) (SPIE), 305–310.
Campos, C., Elvira, R., Rodríguez, J. J. G., Montiel, J. M., and Tardós, J. D. (2021). Orb-slam3: an accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Trans. Robotics 37, 1874–1890. doi:10.1109/tro.2021.3075644
Castro, G. I. (2021). Scalability and consistency improvements in SLAM systems with applications in active multi-robot exploration. Ph.D. thesis (Faculty of Exact and Natural Sciences, Department of Computación).
Chang, Z., Wu, H., and Li, C. (2023). Yolov4-tiny-based robust rgb-d slam approach with point and surface feature fusion in complex indoor environments. J. Field Robotics 40, 521–534. doi:10.1002/rob.22145
Chen, H., Yang, Z., Zhao, X., Weng, G., Wan, H., Luo, J., et al. (2020). Advanced mapping robot and high-resolution dataset. Robotics Aut. Syst. 131, 103559. doi:10.1016/j.robot.2020.103559
Chou, C., Wang, D., Song, D., and Davis, T. A. (2019). "On the tunable sparse graph solver for pose graph optimization in visual slam problems," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE), 1300–1306.
Chu, P. M., Sung, Y., and Cho, K. (2018). Generative adversarial network-based method for transforming single rgb image into 3d point cloud. IEEE Access 7, 1021–1029. doi:10.1109/access.2018.2886213
Chung, C.-M., Tseng, Y.-C., Hsu, Y.-C., Shi, X.-Q., Hua, Y.-H., Yeh, J.-F., et al. (2023). "Orbeez-slam: a real-time monocular visual slam with orb features and nerf-realized mapping," in 2023 IEEE International Conference on Robotics and Automation (ICRA) (IEEE), 9400–9406.
Civera, J., Gálvez-López, D., Riazuelo, L., Tardós, J. D., and Montiel, J. M. M. (2011). "Towards semantic slam using a monocular camera," in 2011 IEEE/RSJ international conference on intelligent robots and systems (IEEE), 1277–1284.
Cui, Y., Chen, X., Zhang, Y., Dong, J., Wu, Q., and Zhu, F. (2022). Bow3d: bag of words for real-time loop closing in 3d lidar slam. IEEE Robotics Automation Lett. 8, 2828–2835. doi:10.1109/lra.2022.3221336
CVG, T. U. o. M. (2023). LSD-SLAM: large-scale direct monocular SLAM. Available at: https://cvg.cit.tum.de/research/vslam/lsdslam?redirect.
Dai, W., Zhang, Y., Zheng, Y., Sun, D., and Li, P. (2021). Rgb-d slam with moving object tracking in dynamic environments. IET Cyber-Systems Robotics 3, 281–291. doi:10.1049/csy2.12019
[Dataset] uz.slaml (2023). ORB-SLAM3. Available at: https://github.com/UZ-SLAMLab/ORB_SLAM3.
Davison, A. J., Reid, I. D., Molton, N. D., and Stasse, O. (2007). Monoslam: real-time single camera slam. IEEE Trans. Pattern Analysis Mach. Intell. 29, 1052–1067. doi:10.1109/tpami.2007.1049
De Croce, M., Pire, T., and Bergero, F. (2019). Ds-ptam: distributed stereo parallel tracking and mapping slam system. J. Intelligent Robotic Syst. 95, 365–377. doi:10.1007/s10846-018-0913-6
Duan, C., Junginger, S., Huang, J., Jin, K., and Thurow, K. (2019). Deep learning for visual slam in transportation robotics: a review. Transp. Saf. Environ. 1, 177–184. doi:10.1093/tse/tdz019
Durrant-Whyte, H. F. (2012). Integration, coordination and control of multi-sensor robot systems, 36. Springer Science and Business Media.
El Bouazzaoui, I., Rodriguez, S., Vincke, B., and El Ouardi, A. (2021). Indoor visual slam dataset with various acquisition modalities. Data Brief 39, 107496. doi:10.1016/j.dib.2021.107496
Endres, F., Hess, J., Engelhard, N., Sturm, J., Cremers, D., and Burgard, W. (2012). "An evaluation of the rgb-d slam system," in 2012 IEEE international conference on robotics and automation (IEEE), 1691–1696.
Engel, J., Schöps, T., and Cremers, D. (2014). "Lsd-slam: large-scale direct monocular slam," in European conference on computer vision (Springer), 834–849.
Engel, J., Stückler, J., and Cremers, D. (2015). "Large-scale direct slam with stereo cameras," in 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE), 1935–1942.
eth.a (2023a). OKVIS: open keyframe-based visual-inertial SLAM. Available at: https://github.com/ethz-asl/okvis.
eth.a (2023b). Rovio: robust visual inertial odometry. Available at: https://github.com/ethz-asl/rovio.
Eudes, A., Lhuillier, M., Naudet-Collette, S., and Dhome, M. (2010). "Fast odometry integration in local bundle adjustment-based visual slam," in 2010 20th International Conference on Pattern Recognition (IEEE), 290–293.
Eyvazpour, R., Shoaran, M., and Karimian, G. (2023). Hardware implementation of slam algorithms: a survey on implementation approaches and platforms. Artif. Intell. Rev. 56, 6187–6239. doi:10.1007/s10462-022-10310-5
Fan, T., Wang, H., Rubenstein, M., and Murphey, T. (2020). Cpl-slam: efficient and certifiably correct planar graph-based slam using the complex number representation. IEEE Trans. Robotics 36, 1719–1737. doi:10.1109/tro.2020.3006717
felix (2023). RGB-D SLAM v2. Available at: https://github.com/felixendres/rgbdslam_v2.
Fernández-Moral, E., Jiménez, J. G., and Arévalo, V. (2013). Creating metric-topological maps for large-scale monocular slam. ICINCO (2), 39–47.
Fiedler, M.-A., Werner, P., Khalifa, A., and Al-Hamadi, A. (2021). Sfpd: simultaneous face and person detection in real-time for human–robot interaction. Sensors 21, 5918. doi:10.3390/s21175918
Fong, T., Nourbakhsh, I., and Dautenhahn, K. (2003). A survey of socially interactive robots. Robotics Aut. Syst. 42, 143–166. doi:10.1016/s0921-8890(02)00372-x
Gao, B., Lang, H., and Ren, J. (2020). "Stereo visual slam for autonomous vehicles: a review," in 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (IEEE), 1316–1322.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013). Vision meets robotics: the kitti dataset. Int. J. Robotics Res. 32, 1231–1237. doi:10.1177/0278364913491297
Geiger, A., Lenz, P., and Urtasun, R. (2012). "Are we ready for autonomous driving? the kitti vision benchmark suite," in 2012 IEEE conference on computer vision and pattern recognition (IEEE), 3354–3361.
Geneva, P., Eckenhoff, K., Lee, W., Yang, Y., and Huang, G. (2020). "OpenVINS: a research platform for visual-inertial estimation," in Proc. of the IEEE International Conference on Robotics and Automation, Paris, France.
GPL (2023). Available at: https://github.com/Oxford-PTAM/PTAM-GPL.
Grisetti, G., Stachniss, C., and Burgard, W. (2007). Improved techniques for grid mapping with rao-blackwellized particle filters. IEEE Trans. Robotics 23, 34–46. doi:10.1109/tro.2006.889486
Gu, P., Meng, Z., and Zhou, P. (2022). Real-time visual inertial odometry with a resource-efficient harris corner detection accelerator on fpga platform, 10542–10548.
Gurel, C. S. (2018). Real-time 2d and 3d slam using rtab-map, gmapping, and cartographer packages. University of Maryland.
Han, Y., Mokhtarzadeh, A. A., and Xiao, S. (2023). Novel cartographer using an oak-d smart camera for indoor robots location and navigation. J. Phys. Conf. Ser. 2467, 012029. doi:10.1088/1742-6596/2467/1/012029
Handa, A., Whelan, T., McDonald, J., and Davison, A. J. (2014). "A benchmark for rgb-d visual odometry, 3d reconstruction and slam," in 2014 IEEE international conference on Robotics and automation (ICRA) (IEEE), 1524–1531.
Hastürk, Ö., and Erkmen, A. M. (2021). Dudmap: 3d rgb-d mapping for dense, unstructured, and dynamic environment. Int. J. Adv. Robotic Syst. 18, 172988142110161. doi:10.1177/17298814211016178
Hempel, T., and Al-Hamadi, A. (2020). Pixel-wise motion segmentation for slam in dynamic environments. IEEE Access 8, 164521–164528. doi:10.1109/access.2020.3022506
Hempel, T., Dinges, L., and Al-Hamadi, A. (2023). "Sentiment-based engagement strategies for intuitive human-robot interaction," in Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP. INSTICC (SciTePress), 680–686. doi:10.5220/0011772900003417
Henein, M., Zhang, J., Mahony, R., and Ila, V. (2020). Dynamic slam: the need for speed, 2123–2129.
Hess, W., Kohler, D., Rapp, H., and Andor, D. (2016). "Real-time loop closure in 2d lidar slam," in 2016 IEEE international conference on robotics and automation (ICRA) (IEEE), 1271–1278.
Heyer, C. (2010). "Human-robot interaction and future industrial robotics applications," in 2010 IEEE/RSJ international conference on intelligent robots and systems (IEEE), 4749–4754.
hkust.a (2023). VINS-Mono. Available at: https://github.com/HKUST-Aerial-Robotics/VINS-Mono.
Hong, S., Bangunharcana, A., Park, J.-M., Choi, M., and Shin, H.-S. (2021). Visual slam-based robotic mapping method for planetary construction. Sensors 21, 7715. doi:10.3390/s21227715
Hsiao, M., Westman, E., Zhang, G., and Kaess, M. (2017). "Keyframe-based dense planar slam," in 2017 IEEE International Conference on Robotics and Automation (ICRA) (IEEE), 5110–5117.
Huang, L. (2021). "Review on lidar-based slam techniques," in 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML) (IEEE), 163–168.
Introlab (2023). RTAB-Map. Available at: http://introlab.github.io/rtabmap/.
Ji, T., Wang, C., and Xie, L. (2021). "Towards real-time semantic rgb-d slam in dynamic environments," in 2021 IEEE International Conference on Robotics and Automation (ICRA) (IEEE), 11175–11181.
Joo, S.-H., Manzoor, S., Rocha, Y. G., Bae, S.-H., Lee, K.-H., Kuc, T.-Y., et al. (2020). Autonomous navigation framework for intelligent robots based on a semantic environment modeling. Appl. Sci. 10, 3219. doi:10.3390/app10093219
Kasyanov, A., Engelmann, F., Stückler, J., and Leibe, B. (2017). Keyframe-based visual-inertial online slam with relocalization, 6662–6669.
Kazerouni, I. A., Fitzgerald, L., Dooly, G., and Toal, D. (2022). A survey of state-of-the-art on visual slam. Expert Syst. Appl. 205, 117734. doi:10.1016/j.eswa.2022.117734
Kerl, C., Sturm, J., and Cremers, D. (2013). "Dense visual slam for rgb-d cameras," in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE), 2100–2106.
Khoyani, A., and Amini, M. (2023). A survey on visual slam algorithms compatible for 3d space reconstruction and navigation, 01–06.
Klein, G., and Murray, D. (2007). "Parallel tracking and mapping for small ar workspaces," in 2007 6th IEEE and ACM international symposium on mixed and augmented reality (IEEE), 225–234.
Kuang, Z., Wei, W., Yan, Y., Li, J., Lu, G., Peng, Y., et al. (2022). A real-time and robust monocular visual inertial slam system based on point and line features for mobile robots of smart cities toward 6g. IEEE Open J. Commun. Soc. 3, 1950–1962. doi:10.1109/ojcoms.2022.3217147
Kucner, T. P., Magnusson, M., Mghames, S., Palmieri, L., Verdoja, F., Swaminathan, C. S., et al. (2023). Survey of maps of dynamics for mobile robots. Int. J. Robotics Res., 02783649231190428.
Labbé, M., and Michaud, F. (2019). Rtab-map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. J. Field Robotics 36, 416–446. doi:10.1002/rob.21831
Laidlow, T., Czarnowski, J., and Leutenegger, S. (2019). Deepfusion: real-time dense 3d reconstruction for monocular slam using single-view depth and gradient predictions, 4068–4074.
Lee, G., Moon, B.-C., Lee, S., and Han, D. (2020). Fusion of the slam with wi-fi-based positioning methods for mobile robot-based learning data collection, localization, and tracking in indoor spaces. Sensors 20, 5182. doi:10.3390/s20185182
Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., and Furgale, P. (2015). Keyframe-based visual–inertial odometry using nonlinear optimization. Int. J. Robotics Res. 34, 314–334. doi:10.1177/0278364914554813
Leutenegger, S. (2022). Okvis2: realtime scalable visual-inertial slam with loop closure. arXiv preprint arXiv:2202.09199.
Li, D., Shi, X., Long, Q., Liu, S., Yang, W., Wang, F., et al. (2020). "Dxslam: a robust and efficient visual slam system with deep features," in 2020 IEEE/RSJ International conference on intelligent robots and systems (IROS) (IEEE), 4958–4965.
Li, G., Hou, J., Chen, Z., Yu, L., and Fei, S. (2023a). Robust stereo inertial odometry based on self-supervised feature points. Appl. Intell. 53, 7093–7107. doi:10.1007/s10489-022-03278-w
Li, P., Qin, T., and Shen, S. (2018). "Stereo vision-based semantic 3d object and ego-motion tracking for autonomous driving," in Proceedings of the European Conference on Computer Vision (ECCV), 646–661.
Li, Q., Wang, X., Wu, T., and Yang, H. (2022a). Point-line feature fusion based field real-time rgb-d slam. Comput. Graph. 107, 10–19. doi:10.1016/j.cag.2022.06.013
Li, S., Zhang, D., Xian, Y., Li, B., Zhang, T., and Zhong, C. (2022b). Overview of deep learning application on visual slam. Displays, 102298.
Li, S., Zheng, P., Liu, S., Wang, Z., Wang, X. V., Zheng, L., et al. (2023b). Proactive human–robot collaboration: mutual-cognitive, predictable, and self-organising perspectives. Robotics Computer-Integrated Manuf. 81, 102510. doi:10.1016/j.rcim.2022.102510
Li, Y., Guo, Z., Yang, Z., Sun, Y., Zhao, L., and Tombari, F. (2023c). Open-structure: a structural benchmark dataset for slam algorithms. arXiv preprint arXiv:2310.10931.
Lin, H.-Y., and Yeh, M.-C. (2022). Drift-free visual slam for mobile robot localization by integrating uwb technology. IEEE Access 10, 93636–93645. doi:10.1109/access.2022.3203438
Liu, Y., and Miura, J. (2021). Rds-slam: real-time dynamic slam using semantic segmentation methods. IEEE Access 9, 23772–23785. doi:10.1109/access.2021.3050617
Lopez, J., Sanchez-Vilarino, P., Cacho, M. D., and Guillén, E. L. (2020). Obstacle avoidance in dynamic environments based on velocity space optimization. Robotics Aut. Syst. 131, 103569. doi:10.1016/j.robot.2020.103569
Luo, H., Pape, C., and Reithmeier, E. (2021). Robust rgbd visual odometry using windowed direct bundle adjustment and slanted support plane. IEEE Robotics Automation Lett. 7, 350–357. doi:10.1109/lra.2021.3126347
Lynch, C., Wahid, A., Tompson, J., Ding, T., Betker, J., Baruch, R., et al. (2023). Interactive language: talking to robots in real time. IEEE Robotics Automation Lett., 1–8. doi:10.1109/lra.2023.3295255
Macario Barros, A., Michel, M., Moline, Y., Corre, G., and Carrel, F. (2022). A comprehensive survey of visual slam algorithms. Robotics 11, 24. doi:10.3390/robotics11010024
Mane, A. A., Parihar, M. N., Jadhav, S. P., and Gadre, R. (2016). "Data acquisition analysis in slam applications," in 2016 International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT) (IEEE), 339–343.
Martínez-Otzeta, J. M., Rodríguez-Moreno, I., Mendialdua, I., and Sierra, B. (2022). Ransac for robotic applications: a survey. Sensors 23, 327. doi:10.3390/s23010327
Mazumdar, H., Chakraborty, C., Sathvik, M., Jayakumar, P., and Kaushik, A. (2023). Optimizing pix2pix gan with attention mechanisms for ai-driven polyp segmentation in iomt-enabled smart healthcare. IEEE J. Biomed. Health Inf., 1–8. doi:10.1109/jbhi.2023.3328962
Meng, X., Gao, W., and Hu, Z. (2018). Dense rgb-d slam with multiple cameras. Sensors 18, 2118. doi:10.3390/s18072118
Meng, X., Li, B., Li, B., Li, B., and Li, B. (2022). "Prob-slam: real-time visual slam based on probabilistic graph optimization," in Proceedings of the 8th International Conference on Robotics and Artificial Intelligence, 39–45. doi:10.1145/3573910.3573920
MIT.S (2023). Kimera: an open-source library for real-time metric-semantic localization and mapping. Available at: https://github.com/MIT-SPARK/Kimera.
Mohamed, N., Al-Jaroodi, J., and Jawhar, I. (2008). "Middleware for robotics: a survey," in 2008 IEEE Conference on Robotics, Automation and Mechatronics (IEEE), 736–742.
Mur-Artal, R., and Tardós, J. D. (2014). "Orb-slam: tracking and mapping recognizable features," in Proceedings of the Workshop on Multi View Geometry in Robotics (MVIGRO)-RSS.
Mur-Artal, R., Montiel, J. M. M., and Tardós, J. D. (2015). Orb-slam: a versatile and accurate monocular slam system. IEEE Trans. Robotics 31, 1147–1163. doi:10.1109/tro.2015.2463671
Mur-Artal, R., and Tardós, J. D. (2017a). Orb-slam2: an open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Trans. Robotics 33, 1255–1262. doi:10.1109/tro.2017.2705103
Mur-Artal, R., and Tardós, J. D. (2017b). Visual-inertial monocular slam with map reuse. IEEE Robotics Automation Lett. 2, 796–803. doi:10.1109/lra.2017.2653359
Nakamura, T., Kobayashi, M., and Motoi, N. (2023). Path planning for mobile robot considering turnabouts on narrow road by deep q-network. IEEE Access 11, 19111–19121. doi:10.1109/access.2023.3247730
Navvis (2023). Map forming. Available at: https://www.navvis.com/technology/slam (Accessed November 14, 2023).
Newcombe, R. A., Lovegrove, S. J., and Davison, A. J. (2011). "Dtam: dense tracking and mapping in real-time," in 2011 international conference on computer vision (IEEE), 2320–2327.
Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., et al. (2011). "Kinectfusion: real-time dense surface mapping and tracking," in 2011 10th IEEE international symposium on mixed and augmented reality (IEEE), 127–136.
Nguyen, Q. H., Johnson, P., and Latham, D. (2022). Performance evaluation of ros-based slam algorithms for handheld indoor mapping and tracking systems. IEEE Sensors J. 23, 706–714. doi:10.1109/jsen.2022.3224224
Nguyen, T., Mann, G. K., Vardy, A., and Gosine, R. G. (2020). Ckf-based visual inertial odometry for long-term trajectory operations. J. Robotics 2020, 1–14. doi:10.1155/2020/7362952
Niu, X., Liu, H., and Yuan, J. (2019). "Rgb-d indoor simultaneous location and mapping based on inliers tracking statistics," in Journal of Physics: Conference Series (IOP Publishing), 1176, 062023.
Ortega-Gomez, J. I., Morales-Hernandez, L. A., and Cruz-Albarran, I. A. (2023). A specialized database for autonomous vehicles based on the kitti vision benchmark. Electronics 12, 3165. doi:10.3390/electronics12143165
Pal, S., Gupta, S., Das, N., and Ghosh, K. (2022). Evolution of simultaneous localization and mapping framework for autonomous robotics—a comprehensive review. J. Aut. Veh. Syst. 2, 020801. doi:10.1115/1.4055161
Palazzolo, E., Behley, J., Lottes, P., Giguere, P., and Stachniss, C. (2019). "Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE), 7855–7862.
Persson, N., Ekström, M. C., Ekström, M., and Papadopoulos, A. V. (2023). "On the initialization problem for timed-elastic bands," in Proceedings of the 22nd IFAC World Congress (IFAC WC).
Peter, J., Thomas, M. J., and Mohan, S. (2023). Development of an autonomous ground robot using a real-time appearance based (rtab) algorithm for enhanced spatial mapping.
Picard, Q., Chevobbe, S., Darouich, M., and Didier, J.-Y. (2023). A survey on real-time 3d scene reconstruction with slam methods in embedded systems. arXiv preprint arXiv:2309.05349.
Placed, J. A., Strader, J., Carrillo, H., Atanasov, N., Indelman, V., Carlone, L., et al. (2023). A survey on active simultaneous localization and mapping: state of the art and new frontiers. IEEE Trans. Robotics 39, 1686–1705. doi:10.1109/tro.2023.3248510
Prati, E., Villani, V., Grandi, F., Peruzzini, M., and Sabattini, L. (2021). Use of interaction design methodologies for human–robot collaboration in industrial scenarios. IEEE Trans. Automation Sci. Eng. 19, 3126–3138. doi:10.1109/tase.2021.3107583
Qin, T., Li, P., and Shen, S. (2018). Vins-mono: a robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robotics 34, 1004–1020. doi:10.1109/tro.2018.2853729
Ragot, N., Khemmar, R., Pokala, A., Rossi, R., and Ertaud, J.-Y. (2019). "Benchmark of visual slam algorithms: orb-slam2 vs rtab-map," in 2019 Eighth International Conference on Emerging Security Technologies (EST) (IEEE), 1–6.
Raikwar, S., Yu, H., and Herlitzius, T. (2023). 2d lidar slam localization system for a mobile robotic platform in gps denied environment. J. Biosyst. Eng. 48, 123–135. doi:10.1007/s42853-023-00176-y
raulmur (2023a). ORB-SLAM. Available at: https://github.com/raulmur/ORB_SLAM.
raulmur (2023b). ORB-SLAM2. Available at: https://github.com/raulmur/ORB_SLAM2.
Ren, G., Cao, Z., Liu, X., Tan, M., and Yu, J. (2022). Plj-slam: monocular visual slam with points, lines, and junctions of coplanar lines. IEEE Sensors J. 22, 15465–15476. doi:10.1109/jsen.2022.3185122
Rintar (2023). dtam-1. Available at: https://github.com/Rintarooo/dtam-1.
Roch, J., Fayyad, J., and Najjaran, H. (2023). Dopeslam: high-precision ros-based semantic 3d slam in a dynamic environment. Sensors 23, 4364. doi:10.3390/s23094364
Rosinol, A., Abate, M., Chang, Y., and Carlone, L. (2020). "Kimera: an open-source library for real-time metric-semantic localization and mapping," in 2020 IEEE International Conference on Robotics and Automation (ICRA) (IEEE), 1689–1696.
Rosinol, A., Violette, A., Abate, M., Hughes, N., Chang, Y., Shi, J., et al. (2021). Kimera: from slam to spatial perception with 3d dynamic scene graphs. Int. J. Robotics Res. 40, 1510–1546. doi:10.1177/02783649211056674
Scaradozzi, D., Zingaretti, S., and Ferrari, A. (2018). Simultaneous localization and mapping (slam) robotics techniques: a possible application in surgery. Shanghai Chest 2, 5. doi:10.21037/shc.2018.01.01
Schneider, T., Dymczyk, M., Fehr, M., Egger, K., Lynen, S., Gilitschenski, I., et al. (2018). maplab: an open framework for research in visual-inertial mapping and localization. IEEE Robotics Automation Lett. 3, 1418–1425. doi:10.1109/lra.2018.2800113
Schöps, T., Engel, J., and Cremers, D. (2014). "Semi-dense visual odometry for ar on a smartphone," in 2014 IEEE international symposium on mixed and augmented reality (ISMAR) (IEEE), 145–150.
Servières, M., Renaudin, V., Dupuis, A., and Antigny, N. (2021). Visual and visual-inertial slam: state of the art, classification, and experimental benchmarking. J. Sensors 2021, 1–26. doi:10.1155/2021/2054828
Sharafutdinov, D., Griguletskii, M., Kopanev, P., Kurenkov, M., Ferrer, G., Burkov, A., et al. (2023). Comparison of modern open-source visual slam approaches. J. Intelligent Robotic Syst. 107, 43. doi:10.1007/s10846-023-01812-7
Sheng, L., Xu, D., Ouyang, W., and Wang, X. (2019). "Unsupervised collaborative learning of keyframe detection and visual odometry towards monocular deep slam," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 4302–4311.
Sheridan, T. B. (2016). Human–robot interaction: status and challenges. Hum. Factors 58, 525–532. doi:10.1177/0018720816644364
Soares, J. C. V., Gattass, M., and Meggiolaro, M. A. (2021). Crowd-slam: visual slam towards crowded environments using object detection. J. Intelligent Robotic Syst. 102, 50. doi:10.1007/s10846-021-01414-1
Soliman, A., Bonardi, F., Sidibé, D., and Bouchafa, S. (2023). Dh-ptam: a deep hybrid stereo events-frames parallel tracking and mapping system. arXiv preprint arXiv:2306.01891.
Son, S., Chen, J., Zhong, Y., Zhang, W., Hou, W., and Zhang, L. (2023). Sce-slam: a real-time semantic rgbd slam system in dynamic scenes based on spatial coordinate error. Meas. Sci. Technol. 34, 125006. doi:10.1088/1361-6501/aceb7e
Song, K., Li, J., Qiu, R., and Yang, G. (2022). Monocular visual-inertial odometry for agricultural environments. IEEE Access 10, 103975–103986. doi:10.1109/access.2022.3209186
Song, Y., Zhang, Z., Wu, J., Wang, Y., Zhao, L., and Huang, S. (2021). A right invariant extended kalman filter for object based slam. IEEE Robotics Automation Lett. 7, 1316–1323. doi:10.1109/lra.2021.3139370
Sousa, R. B., Sobreira, H. M., and Moreira, A. P. (2023). A systematic literature review on long-term localization and mapping for mobile robots. J. Field Robotics 40, 1245–1322. doi:10.1002/rob.22170
Steinbrücker, F., Sturm, J., and Cremers, D. (2011). "Real-time visual odometry from dense rgb-d images," in 2011 IEEE international conference on computer vision workshops (ICCV Workshops) (IEEE), 719–722.
Strazdas, D., Hintz, J., Felßberg, A.-M., and Al-Hamadi, A. (2020). Robots and wizards: an investigation into natural human–robot interaction. IEEE Access 8, 207635–207642. doi:10.1109/access.2020.3037724
Sumikura, S., Shibuya, M., and Sakurada, K. (2019). "Openvslam: a versatile visual slam framework," in Proceedings of the 27th ACM International Conference on Multimedia, 2292–2295.
Sun, Y., Liu, M., and Meng, M. Q.-H. (2017). Improving rgb-d slam in dynamic environments: a motion removal approach. Robotics Aut. Syst. 89, 110–122. doi:10.1016/j.robot.2016.11.012
Taheri, H., and Xia, Z. C. (2021). Slam; definition and evolution. Eng. Appl. Artif. Intell. 97, 104032. doi:10.1016/j.engappai.2020.104032
Taketomi, T., Uchiyama, H., and Ikeda, S. (2017). Visual slam algorithms: a survey from 2010 to 2016. IPSJ Trans. Comput. Vis. Appl. 9, 16. doi:10.1186/s41074-017-0027-2
Theodorou, C., Velisavljevic, V., Dyo, V., and Nonyelu, F. (2022). Visual slam algorithms and their application for ar, mapping, localization and wayfinding. Array 15, 100222. doi:10.1016/j.array.2022.100222
Tian, Y., Chang, Y., Quang, L., Schang, A., Nieto-Granda, C., How, J. P., et al. (2023a). Resilient and distributed multi-robot visual slam: datasets, experiments, and lessons learned. arXiv preprint arXiv:2304.04362.
Tian, Y., Chang, Y., Quang, L., Schang, A., Nieto-Granda, C., How, J. P., et al. (2023b). Resilient and distributed multi-robot visual slam: datasets, experiments, and lessons learned. arXiv preprint arXiv:2304.04362.
Tourani, A., Bavle, H., Sanchez-Lopez, J. L., and Voos, H. (2022). Visual slam: what are the current trends and what to expect? Sensors 22, 9297. doi:10.3390/s22239297
Tsintotas, K. A., Bampis, L., and Gasteratos, A. (2022). The revisiting problem in simultaneous localization and mapping: a survey on visual loop closure detection. IEEE Trans. Intelligent Transp. Syst. 23, 19929–19953. doi:10.1109/tits.2022.3175656
tum.v (2023). DVO-SLAM: direct visual odometry for monocular cameras. Available at: https://github.com/tum-vision/dvo_slam.
Ullah, I., Su, X., Zhang, X., and Choi, D. (2020). Simultaneous localization and mapping based on kalman filter and extended kalman filter. Wirel. Commun. Mob. Comput. 2020, 1–12. doi:10.1155/2020/2138643
Van Nam, D., and Gon-Woo, K. (2021). "Solid-state lidar based-slam: a concise review and application," in 2021 IEEE International Conference on Big Data and Smart Computing (BigComp) (IEEE), 302–305.
Wang, H., Ko, J. Y., and Xie, L. (2022). Multi-modal semantic slam for complex dynamic environments. arXiv preprint arXiv:2205.04300.
Wang, Z., Pang, B., Song, Y., Yuan, X., Xu, Q., and Li, Y. (2023). Robust visual-inertial odometry based on a kalman filter and factor graph. IEEE Trans. Intelligent Transp. Syst. 24, 7048–7060. doi:10.1109/tits.2023.3258526
Wu, W., Guo, L., Gao, H., You, Z., Liu, Y., and Chen, Z. (2022). Yolo-slam: a semantic slam system towards dynamic environment with geometric constraint. Neural Comput. Appl. 34, 6011–6026. doi:10.1007/s00521-021-06764-3
Xiao, L., Wang, J., Qiu, X., Rong, Z., and Zou, X. (2019). Dynamic-slam: semantic monocular visual localization and mapping based on deep learning in dynamic environment. Robotics Aut. Syst. 117, 1–16. doi:10.1016/j.robot.2019.03.012
Xu, C., Liu, Z., and Li, Z. (2021). Robust visual-inertial navigation system for low precision sensors under indoor and outdoor environments. Remote Sens. 13, 772. doi:10.3390/rs13040772
Yan, L., Hu, X., Zhao, L., Chen, Y., Wei, P., and Xie, H. (2022). Dgs-slam: a fast and robust rgbd slam in dynamic environments combined by geometric and semantic information. Remote Sens. 14, 795. doi:10.3390/rs14030795
Yang, X., Li, H., Zhai, H., Ming, Y., Liu, Y., and Zhang, G. (2022). "Vox-fusion: dense tracking and mapping with voxel-based neural implicit representation," in 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (IEEE), 499–507.
Yousif, K., Bab-Hadiashar, A., and Hoseinnezhad, R. (2015). An overview to visual odometry and visual slam: applications to mobile robotics. Intell. Ind. Syst. 1, 289–311. doi:10.1007/s40903-015-0032-7
Zang, Q., Zhang, K., Wang, L., and Wu, L. (2023). An adaptive orb-slam3 system for outdoor dynamic environments. Sensors 23, 1359. doi:10.3390/s23031359
Zhang, J., Zhu, C., Zheng, L., and Xu, K. (2021a). Rosefusion: random optimization for online dense reconstruction under fast camera motion. ACM Trans. Graph. (TOG) 40, 1–17. doi:10.1145/3476576.3476604
Zhang, Q., and Li, C. (2023). Semantic slam for mobile robots in dynamic environments based on visual camera sensors. Meas. Sci. Technol. 34, 085202. doi:10.1088/1361-6501/acd1a4
Zhang, S., Zheng, L., and Tao, W. (2021b). Survey and evaluation of rgb-d slam. IEEE Access 9, 21367–21387. doi:10.1109/access.2021.3053188
Zhang, W., Wang, S., Dong, X., Guo, R., and Haala, N. (2023). Bamf-slam: bundle adjusted multi-fisheye visual-inertial slam using recurrent field transforms. arXiv preprint arXiv:2306.01173.
Zhang, X., Liu, Q., Zheng, B., Wang, H., and Wang, Q. (2020). A visual simultaneous localization and mapping approach based on scene segmentation and incremental optimization. Int. J. Adv. Robotic Syst. 17, 172988142097766. doi:10.1177/1729881420977669
Zhang, X., Su, Y., and Zhu, X. (2017). "Loop closure detection for visual slam systems using convolutional neural network," in 2017 23rd International Conference on Automation and Computing (ICAC) (IEEE), 1–6.
Zheng, P., Li, S., Xia, L., Wang, L., and Nassehi, A. (2022). A visual reasoning-based approach for mutual-cognitive human-robot collaboration. CIRP Ann. 71, 377–380. doi:10.1016/j.cirp.2022.04.016
Zheng, S., Wang, J., Rizos, C., Ding, W., and El-Mowafy, A. (2023). Simultaneous localization and mapping (slam) for autonomous driving: concept and analysis. Remote Sens. 15, 1156. doi:10.3390/rs15041156
Zhou, L., Koppel, D., Ju, H., Steinbruecker, F., and Kaess, M. (2020). "An efficient planar bundle adjustment algorithm," in 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (IEEE), 136–145.
Zhu, Z., Peng, S., Larsson, V., Xu, W., Bao, H., Cui, Z., et al. (2022). Nice-slam: neural implicit scalable encoding for slam, 12786–12796.