
Article

LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments †

Mahammad Irfan 1,*, Sagar Dalai 1, Petar Trslic 2, James Riordan 3 and Gerard Dooly 2

1 CRIS Research Group, Department of Electronic and Computer Engineering, University of Limerick,
V94 T9PX Limerick, Ireland; [email protected]
2 School of Engineering, University of Limerick, The Lonsdale Building, V94 T9PX Limerick, Ireland;
[email protected] (P.T.); [email protected] (G.D.)
3 Drone Systems Lab, School of Computing Engineering and Physical Science, University of the West of
Scotland, Glasgow G72 0LH, UK; [email protected]
* Correspondence: [email protected]; Tel.: +353-899812440
† This paper is an extended version of our paper published in Irfan, M.; Dalai, S.; Vishwakarma, K.; Trslic, P.;
Riordan, J.; Dooly, G. Multi-Sensor Fusion for Efficient and Robust UAV State Estimation. In Proceedings of
the 2024 12th International Conference on Control, Mechatronics and Automation (ICCMA), London, UK,
11–13 November 2024.

Abstract: Unmanned aerial vehicle (UAV) state estimation is fundamental across applications like robot navigation, autonomous driving, virtual reality (VR), and augmented reality (AR). This research highlights the critical role of robust state estimation in ensuring safe and efficient autonomous UAV navigation, particularly in challenging environments. We propose a deep learning-based adaptive sensor fusion framework for UAV state estimation, integrating multi-sensor data from stereo cameras, an IMU, two 3D LiDARs, and GPS. The framework dynamically adjusts fusion weights in real time using a long short-term memory (LSTM) model, enhancing robustness under diverse conditions such as illumination changes, structureless environments, degraded GPS signals, or complete signal loss, where traditional single-sensor SLAM methods often fail. Validated on an in-house integrated UAV platform and evaluated against high-precision RTK ground truth, the algorithm incorporates deep learning-predicted fusion weights into an optimization-based odometry pipeline. The system delivers robust, consistent, and accurate state estimation, outperforming state-of-the-art techniques. Experimental results demonstrate its adaptability and effectiveness across challenging scenarios, showcasing significant advancements in UAV autonomy and reliability through the synergistic integration of deep learning and sensor fusion.

Keywords: ROS; LSTM; adaptive fusion; multi-sensor fusion; state estimation; UAV; LiDAR-visual-inertial odometry; MSCKF

Academic Editor: Xinli Du
Received: 15 January 2025; Revised: 31 January 2025; Accepted: 7 February 2025; Published: 9 February 2025

Citation: Irfan, M.; Dalai, S.; Trslic, P.; Riordan, J.; Dooly, G. LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments. Machines 2025, 13, 130. https://doi.org/10.3390/machines13020130
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Recent advances in computing power, sensor technologies, and machine learning have significantly fueled interest in autonomous unmanned aerial vehicles (UAVs), also known as drones. These systems have become indispensable across a wide range of applications, including robot navigation, autonomous driving, virtual reality (VR), augmented reality (AR), environmental monitoring, delivery services, and disaster response. In such contexts, navigation and positioning are essential to ensuring the UAV's operational accuracy, safety, and efficiency. Modern UAVs heavily rely on sensor fusion techniques to provide robust state estimation that enables them to operate autonomously, even in complex or dynamic
environments. Beyond UAVs, sensor fusion plays a vital role in the Internet of Vehicles
(IoV), autonomous robots, and other emerging technologies [1,2].
The field of state estimation in navigation and control systems for autonomous robots
has evolved significantly over the years, driven by technological advancements in sensor
hardware and computational algorithms. State estimation involves deriving accurate
information about a system’s position, velocity, and orientation based on sensor data. While
single-sensor solutions have been extensively studied, their limitations have increasingly
motivated research into multi-sensor fusion approaches. These approaches leverage the
complementary characteristics of diverse sensors to overcome the constraints of individual
sensors and enhance the accuracy, robustness, and resilience of state estimation systems [3].
Despite the progress made, achieving robust, accurate, and seamless navigation and
positioning solutions remains a major challenge when relying solely on single-sensor
systems. For example, the inertial navigation system (INS), which relies on accelerometers
and gyroscopes to compute relative positions, is highly accurate only for short durations.
Over time, the accumulation of sensor noise and integration errors causes significant drift.
Similarly, GPS, while offering absolute positioning data, is effective primarily in open
sky environments but is prone to signal blockage, multipath interference, and degraded
performance in urban canyons, dense forests, or indoor environments. These limitations
demand the integration of additional sensor types, such as cameras, LiDAR, and IMU, to
ensure robust state estimation with enhanced spatial and temporal coverage.
Visual inertial navigation systems (VINS) [4] have emerged as a cost effective and
practical solution for state estimation in UAVs, combining visual and inertial data to
achieve higher accuracy. However, VINS performance in complex environments is often
hindered by its susceptibility to changing illumination, low texture regions, and dynamic
obstacles. LiDAR, on the other hand, provides accurate distance measurements and
operates independently of lighting conditions. Its growing affordability and precision
have made it a popular choice for UAVs. Nonetheless, LiDAR systems face challenges
related to sparse data and difficulty in extracting semantic information. Similarly, vision-
based approaches using monocular or stereo cameras struggle with initialization, sensitivity
to illumination changes, and distance variability. These challenges highlight the need for
multi-sensor fusion, where the strengths of different sensors are combined to overcome
individual shortcomings.
In recent years, multi-sensor fusion approaches have advanced significantly, enabling
UAVs to achieve real-time, high-precision positioning and mapping. For example, integrat-
ing GPS with IMU data mitigates inertial navigation drift and improves noise filtering in
complex environments. Incorporating LiDAR and visual data further enhances accuracy
by providing rich spatial and semantic information. However, traditional sensor fusion
methods often rely on static weighting of sensor inputs, which can lead to suboptimal
performance in dynamic or degraded scenarios. These limitations have driven research
toward adaptive sensor fusion techniques that dynamically adjust sensor contributions
based on real-time environmental conditions and sensor reliability [5,6].
Recent advancements in deep learning have introduced a powerful paradigm for
adaptive sensor fusion. Deep learning models, such as long short-term memory (LSTM)
networks, can effectively learn temporal dependencies in sensor data and adaptively com-
pute fusion weights based on real-time input. This capability allows UAVs to dynamically
prioritize reliable sensors and minimize the impact of degraded or faulty sensor data. Such
adaptability is particularly valuable in scenarios involving sudden illumination changes,
feature-deprived environments, degraded GPS signals, or complete signal loss, where
traditional single-sensor systems and static-weight fusion approaches often fail.
This paper presents a novel, deep learning-based adaptive multi-sensor fusion frame-
work for UAV state estimation. The proposed framework integrates stereo cameras, IMU,
LiDAR sensors, and GPS-RTK data into a unified system, which is depicted in Figure 1.
A long short-term memory (LSTM) model is used to dynamically compute sensor fusion
weights in real time, ensuring robust, accurate, and consistent state estimation under diverse
conditions. Unlike conventional methods that rely on fixed sensor weights, our approach
leverages the real-time adaptability of deep learning to optimize sensor contributions based
on environmental and operational factors.
Our approach is validated on an in-house UAV platform equipped with an internally
integrated and calibrated sensor suite. The system is evaluated against high-precision
RTK ground truth, demonstrating its ability to maintain robust state estimation in both
GPS-enabled and GPS-denied scenarios. The algorithms autonomously determine relevant
sensor data, leveraging stereo inertial or LiDAR inertial odometry outputs to ensure global
positioning in the absence of GPS.

Figure 1. Proposed architecture for LSTM-based self-adaptive multi-sensor fusion (LSAF).

The major contributions of this research are as follows:


• We propose an innovative multi-sensor fusion system integrating a VGA stereo camera,
two 3D LiDAR sensors, a nine-degree-of-freedom IMU, and optimized GPS-RTK
networking to achieve precise UAV state estimation.
• A deep learning-based adaptive weighting mechanism is implemented using LSTM
to dynamically adjust sensor contributions, ensuring robust state estimation across
diverse and challenging environments.
• A commercial UAV equipped with an internally integrated and calibrated sensor
platform is used to collect complex datasets, enabling robust evaluation of the pro-
posed method.
• Extensive evaluations confirm the efficacy and performance of the stereo-visual-LiDAR
fusion framework, demonstrating high efficiency, robustness, consistency, and accu-
racy in challenging scenarios.
By addressing the limitations of traditional methods and introducing dynamic adapt-
ability through deep learning, this work significantly advances the field of UAV state
estimation, paving the way for more reliable autonomous navigation systems.

2. Related Work
In recent decades, many innovative approaches for UAV state estimation have been
proposed, leveraging different types of sensors. Among these, vision-based and LiDAR-
based methods have gained substantial attention due to their ability to provide rich en-
vironmental data for accurate localization and mapping. Researchers have extensively
explored the fusion of visual and inertial sensors, given their complementary properties in
addressing UAV navigation challenges [7].
For state estimation, sensors such as IMUs are frequently used in fusion designs
that can be broadly categorized into loosely coupled and tightly coupled approaches. In
loosely coupled systems, sensor outputs are independently processed and subsequently
fused, offering simplicity and flexibility when integrating diverse sensors. However, tightly
coupled systems have gained increasing preference due to their ability to process raw
sensor data directly, such as utilizing raw IMU measurements in pose estimation. This
allows for more accurate state estimation, especially in scenarios with high dynamic motion
or challenging environmental conditions. Papers [8,9] propose tightly coupled methods
that integrate visual and inertial data for efficient and robust state estimation. By exploiting
the raw data from IMU and cameras, these methods address issues like drift and improve
system robustness compared to loosely coupled alternatives.

2.1. Multi-Sensor Fusion Approaches


Current multi-sensor fusion methods can be broadly classified into filtering-based,
optimization-based, and deep learning-based approaches [10].

2.1.1. Filtering-Based Methods


Filtering-based methods, such as the extended Kalman filter (EKF) and unscented
Kalman filter (UKF), have been widely adopted for sensor fusion due to their computational
efficiency and ability to handle real-time applications. These methods assume Gaussian
noise and rely on linearization techniques to model system dynamics. However, their
performance deteriorates in the presence of nonlinear models or non-Gaussian noise distri-
butions. Furthermore, their reliance on static sensor weightings can result in suboptimal
performance in dynamic and unpredictable environments.

2.1.2. Optimization-Based Methods


Optimization-based approaches address the limitations of filtering methods by for-
mulating the state estimation problem as an optimization task. These methods, such as
bundle adjustment (BA) and factor graph optimization (FGO), are well suited for handling
nonlinearities and non-Gaussian noise. Although optimization methods are computa-
tionally more demanding, they provide higher precision and robustness, making them
popular for applications requiring high accuracy, such as simultaneous localization and
mapping (SLAM). For example, techniques that combine visual, inertial, and LiDAR data in
optimization frameworks have demonstrated significant improvements in state estimation
accuracy in diverse scenarios.

2.1.3. Deep Learning-Based Methods


With the rapid advancements in deep learning, researchers have increasingly explored
neural network-based algorithms for sensor fusion and state estimation [11]. These methods
leverage the ability of neural networks to learn complex, nonlinear relationships directly
from data. For instance, networks designed for depth estimation and motion representation
from image sequences have shown promise in improving pose estimation accuracy and
robustness. Furthermore, neural networks can dynamically adapt sensor fusion weights
based on real-time sensor reliability, enabling more robust state estimation in dynamic
environments. However, the high computational cost and the need for extensive training
data remain significant challenges for deploying deep learning-based methods in real-time
UAV applications.
2.2. Sensor-Specific Contributions


2.2.1. Vision-Based SLAM
Vision-based approaches, such as monocular or stereo visual SLAM, utilize cameras to
map the environment and estimate the UAV's pose. These methods offer a cost-effective
solution but are highly sensitive to illumination changes, feature-poor environments, and
dynamic objects. Moreover, challenges such as scale ambiguity in monocular systems and
computational overhead in stereo systems limit their widespread application.

2.2.2. LiDAR-Based SLAM


LiDAR systems generate dense 3D point clouds of the environment, providing high-
precision spatial information that is resilient to lighting variations. Compared to vision-
based SLAM, LiDAR-based SLAM demonstrates superior performance in feature-poor
or dynamic environments [12]. However, LiDAR data are inherently sparse and lack
semantic information, necessitating integration with other sensors such as cameras and
IMUs for robust state estimation.

2.2.3. Multi-Sensor Fusion for SLAM


Recent studies highlight the importance of integrating complementary sensor types,
such as cameras, LiDAR, IMU, and GNSS, to achieve robust and efficient SLAM-based
navigation [13,14]. For instance, adding visual, LiDAR, or inertial factors enhances SLAM
systems by improving robustness and state estimation accuracy [15,16]. Combining LiDAR
and visual data mitigates the limitations of each sensor, while IMUs provide continuous
data for motion prediction and noise filtering. The integration of GPS and GNSS further en-
sures resilience against environmental variability and provides accurate global positioning
to address drift in large-scale environments.

2.3. Challenges and Opportunities


While current state estimation techniques show significant promise, challenges such as
accumulated drift, sensitivity to environmental factors, and limited adaptability in dynamic
scenarios persist. To address these issues, adaptive multi-sensor fusion techniques that
dynamically adjust sensor weights based on environmental and operational factors have
emerged as a promising solution. For example, learning-based frameworks leverage the
adaptability of neural networks to dynamically compute sensor fusion weights, improving
resilience and robustness in degraded conditions. Table 1 summarizes the comparison
between LSAF and other state-of-the-art methods.
The proposed (LSAF) LSTM-based dynamic weight adjustment differs from existing
methods by integrating LSTM-derived adaptive weights into MSCKF for real-time UAV
state estimation, rather than just optimizing fusion weights offline. Unlike prior works, our
approach employs an attention-based mechanism within LSTM to dynamically prioritize
sensor reliability at each time step, ensuring robustness in SLAM-based pose estimation.
Additionally, our hierarchical fusion strategy combines LSTM, SLAM, and MSCKF, making
it more adaptable to real-world UAV applications, especially in GPS-denied and dynamic
environments. These innovations differentiate our work from conventional LSTM-based
fusion techniques.
Table 1. Summary of the comparison between the proposed LSAF and state-of-the-art methods.

Method | Sensors | Fusion Strategy | Dynamic Weighting | Adaptability | Key Strengths | Key Limitations
LSAF (Proposed) | Stereo, IMU, GPS, LiDAR | LSTM-based Adaptive Fusion + MSCKF | Yes | High | Robust to sensor degradation and dynamic targets | Requires model training and tuning
VINS-Fusion (2018) [4] | Stereo, IMU, GPS | Predefined Weighting | No | Moderate | Efficient visual-inertial fusion | High drift in GPS-denied or dynamic environments
FASTLIO2 (2022) [17] | LiDAR, IMU | Predefined Weighting | No | Moderate | High accuracy in LiDAR-rich areas | Degrades in dynamic or sparse environments
ORB-SLAM3 (2020) [18] | Stereo, IMU | Visual SLAM with Fixed Weights | No | Low to Moderate | Loop-closure-enabled global optimization | Fails in low-texture or poor-light conditions
MSCKF (2017) [19] | Stereo, IMU | Multi-State Constraint Kalman Filter | No | Low to Moderate | Fast and computationally efficient | Limited accuracy; lacks adaptability
OKVIS (2015) [20] | Stereo, IMU | Probabilistic Fusion | No | Low to Moderate | Robust visual-inertial tracking | Poor performance in dynamic environments
LOAM (2014) [21] | LiDAR, IMU | LiDAR Odometry + Mapping | No | Moderate | Accurate LiDAR-based mapping | Computationally intensive
DLIO (2021) [22] | LiDAR, IMU | Deep Learning for LiDAR Odometry | Partial (Learned Weights) | Moderate | Data-driven LiDAR odometry | Requires extensive training data
R-VIO (2021) [23] | Stereo, IMU | Fixed Weights | No | Moderate | Accurate in small-scale scenarios | High drift in large-scale environments
GPS Only | GPS | Single-Sensor | No | Low | Accurate global positioning in open areas | Fails in GPS-denied or multipath environments
LVI-SAM (2021) [24] | LiDAR, IMU, Stereo | Joint Optimization | No | Moderate to High | Combines LiDAR and visual optimization | Requires high computational power
DeepVIO (2020) [25] | Stereo, IMU | Deep Learning-based Visual-Inertial | Yes (Learned Weights) | Moderate | Learns fusion weights; robust in some cases | Needs large datasets; computationally intensive
SC-LIO-SAM (2022) [26] | LiDAR, IMU | Joint LiDAR SLAM | No | High | Combines semantic segmentation with LiDAR SLAM | High dependency on LiDAR quality

This paper builds on this body of work by proposing a novel framework that combines
the strengths of optimization-based and deep learning-based approaches. Using long
short-term memory (LSTM) networks, our method dynamically computes sensor fusion weights
in real time, adapting to environmental conditions and sensor reliability. This framework
integrates stereo cameras, LiDAR, IMU, and GPS-RTK data into a unified system, achieving
superior performance in both GPS-enabled and GPS-denied scenarios.

3. Methodology
This research aims to achieve robust and accurate UAV state estimation by integrating
measurements from multiple sensors, including GPS, stereo cameras, LiDARs, and IMUs,
into a unified framework. The proposed system combines a multi-state constraint Kalman
filter (MSCKF) [27] with a long short-term memory (LSTM)-based self-adaptive sensor
fusion mechanism. This hybrid framework dynamically adjusts sensor fusion weights
based on real-time environmental conditions and sensor reliability, ensuring consistent
performance in challenging scenarios, such as GPS-degraded environments, rapid motion,
and feature-deprived areas.

3.1. Coordinate Systems and Sensor Calibration


To ensure consistency across multi-sensor measurements, the system defines two
primary coordinate systems—the world frame (W) and the UAV body frame (B)—as can
be seen in Figure 2, which depicts the proposed LSAF framework. The
body frame is aligned with the IMU frame for simplicity, as the IMU serves as the central
reference for state propagation. Local sensors, such as stereo cameras, LiDARs, and IMUs,
measure relative motion and require initialization of their reference frames. Initialization is
typically performed by setting the UAV’s first pose as the origin. Global sensors, such as
GPS, operate in an Earth-centered global coordinate frame and provide absolute positioning
measurements. GPS data, expressed as latitude, longitude, and altitude, are converted into
Cartesian coordinates (x, y, z) for consistency with local sensor measurements.

Figure 2. An illustration of the proposed LSAF framework. The global estimator combines local
estimations from various global sensors to achieve precise local accuracy and globally drift-free pose
estimation, which builds upon our previous work [28].

Offline calibration of all sensors is performed to reduce measurement biases, align


coordinate frames, and ensure accurate fusion of data. This calibration accounts for sensor-
specific offsets, such as biases in IMU accelerometers and gyroscopes, misalignment of
LiDAR and camera frames, and GPS inaccuracies due to multipath effects or environmental
interference. The calibration process ensures that measurements from all sensors are
consistent and directly comparable within the fusion framework.
3.2. State Representation and Propagation


The UAV’s motion is modeled using a six-degree-of-freedom (6-DOF) representation,
including position, velocity, orientation, and sensor biases. The state vector x is defined as follows:

x = [p; v; q; b_a; b_g],  (1)

where p ∈ R³ is the position of the UAV, v ∈ R³ is the velocity, q ∈ R⁴ is the orientation represented as a quaternion, b_a ∈ R³ is the accelerometer bias, and b_g ∈ R³ is the gyroscope bias. The state is propagated forward in time using IMU measurements of linear acceleration (a_m) and angular velocity (ω_m) as follows:

ṗ = v,  (2)
v̇ = R(q)(a_m − b_a − n_a) + g,  (3)
q̇ = (1/2) q ⊗ (ω_m − b_g − n_g),  (4)
ḃ_a = n_{b_a},  (5)
ḃ_g = n_{b_g}.  (6)

Here, R(q) represents the rotation matrix derived from the quaternion q, and g is the gravity vector. The terms n_a, n_g, n_{b_a}, and n_{b_g} denote process noise, modeled as zero-mean Gaussian distributions.
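For illustration, the propagation of Equations (2)-(6) can be implemented as a single Euler-integration step; the sketch below is an assumption about the discretization and quaternion convention (Hamilton, [w, x, y, z]) rather than the paper's actual implementation.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # gravity vector g in the world frame

def quat_to_rot(q):
    """Rotation matrix R(q) for a unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def propagate_imu(p, v, q, b_a, b_g, a_m, w_m, dt):
    """One Euler step of Eqs. (2)-(6): bias-corrected IMU measurements drive position,
    velocity and orientation; the bias random-walk noise terms are omitted here."""
    R = quat_to_rot(q)
    a_world = R @ (a_m - b_a) + GRAVITY          # Eq. (3), noise term dropped
    p_new = p + v * dt                           # Eq. (2)
    v_new = v + a_world * dt
    w = w_m - b_g                                # bias-corrected angular rate, Eq. (4)
    omega = np.array([                           # quaternion kinematics: q_dot = 0.5 * Omega(w) q
        [0.0,  -w[0], -w[1], -w[2]],
        [w[0],  0.0,   w[2], -w[1]],
        [w[1], -w[2],  0.0,   w[0]],
        [w[2],  w[1], -w[0],  0.0],
    ])
    q_new = q + 0.5 * omega @ q * dt
    q_new /= np.linalg.norm(q_new)               # re-normalize to keep q a unit quaternion
    return p_new, v_new, q_new, b_a, b_g         # biases held constant over the step (Eqs. 5-6)
```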

3.3. Measurement Models for Multi-Sensor Integration


Each sensor provides measurements that are incorporated into the fusion framework
through dedicated measurement models. These models relate sensor observations to
the UAV’s state, ensuring accurate integration. The key measurement models are as
described below.
1. IMU measurements provide linear acceleration and angular velocity. These are modeled as follows:

a_m = R(q)^T (v̇ − g) + b_a + n_a,  (7)
ω_m = ω + b_g + n_g.  (8)

2. GPS provides absolute position measurements in the global frame:

z_GPS = p + n_GPS,  (9)

where n_GPS denotes measurement noise.


3. LiDAR generates 3D point clouds, providing precise spatial measurements:

p_L = R_IL (p − p_I) + n_LiDAR,  (10)

where R_IL is the rotation matrix from the IMU to the LiDAR frame, and p_I is the IMU's position.
4. Stereo cameras provide 2D projections of 3D feature points:

u = f_x (x/z) + c_x,  v = f_y (y/z) + c_y,  (11)

where (x, y, z) are the 3D coordinates of a feature in the camera frame, and (u, v) are the corresponding pixel coordinates.
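The measurement models above can be expressed compactly in code; the following sketch shows the pinhole projection of Equation (11) and the GPS position residual of Equation (9). The intrinsic values in the usage line are placeholders, not the calibrated parameters of the actual camera.

```python
import numpy as np

def project_pinhole(p_cam, fx, fy, cx, cy):
    """Eq. (11): project a 3D point (x, y, z) in the camera frame to pixel coordinates (u, v)."""
    x, y, z = p_cam
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.array([u, v])

def gps_residual(z_gps, p_est):
    """Eq. (9): innovation between the GPS position fix and the estimated UAV position."""
    return np.asarray(z_gps) - np.asarray(p_est)

# Example usage with placeholder intrinsics for a 640x480 camera (assumed values).
pixel = project_pinhole(np.array([0.5, -0.2, 4.0]), fx=380.0, fy=380.0, cx=320.0, cy=240.0)
```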

3.4. Self-Adaptive Fusion with LSTM


Accurate state estimation for autonomous UAVs in dynamic and uncertain environ-
ments remains a critical challenge. Traditional sensor fusion methods such as the multi-state
constraint Kalman filter (MSCKF) assume fixed measurement noise covariance (R), which
limits their ability to adapt to varying sensor reliability. To address this limitation, this
work introduces a long short-term memory (LSTM)-based self-adaptive fusion framework,
which dynamically adjusts the measurement noise covariance for each sensor based on
real-time reliability assessments. By leveraging temporal dependencies in sensor data, the
proposed approach improves robustness to environmental variations, sensor degradation,
and measurement inconsistencies.
The LSTM model takes as input key features indicative of sensor reliability: GPS signal
strength, visual feature density, LiDAR point cloud density,
and IMU noise levels. These features are processed over time to generate adaptive fusion
weights, which are used to modify the sensor measurement models dynamically. The
LSTM network is trained offline on a dataset comprising diverse environmental conditions,
including urban landscapes, forested areas, and GPS-denied spaces, with ground truth
obtained from GPS-RTK and high-accuracy SLAM systems. The ability of the LSTM
to learn and generalize from these varied conditions enables it to adjust sensor fusion
parameters optimally in real time, improving the overall accuracy and robustness of UAV
state estimation.
Table 2 summarizes the LSTM-based self-adaptive multi-sensor fusion (LSAF) frame-
work, which enhances UAV state estimation by dynamically weighting multi-sensor inputs.
The framework integrates data from GPS, IMU, stereo cameras, and LiDAR, leveraging
an LSTM model to extract temporal dependencies and compute adaptive sensor reliabil-
ity scores. These weights dynamically adjust sensor contributions to SLAM-based pose
estimation and MSCKF-based state correction, improving accuracy and robustness. The
model architecture comprises two LSTM layers followed by a time-distributed dense layer,
trained using the mean squared error (MSE) loss function and optimized via the Adam
optimizer over 1000 epochs. Unlike traditional fusion techniques, the LSTM updates sensor
weights at each time step, allowing for real-time adaptation to environmental variations.
By assigning higher weightage to more reliable sensors, the system ensures precise state
estimation, particularly in GPS-denied environments, high-speed maneuvers, and feature-
less conditions, ultimately enhancing UAV navigation and autonomous flight performance.
Algorithm 1 presents the training-phase steps of the proposed LSAF process.

Table 2. Summary of the LSTM-based self-adaptive multi-sensor fusion (LSAF) framework.

Component | Description
Input Sensors | GPS-RTK (latitude, longitude, altitude), IMU (linear acceleration, angular velocity), stereo cameras (left and right image illumination), LiDAR (point cloud density)
Input Shape | (N, T, 12) (number of samples, time steps, sensor features)
Output Shape | (N, T, 4) (sensor reliability weights for fusion)
LSTM Architecture | Two LSTM layers (128, 64 units) followed by a time-distributed dense layer
Loss Function | Mean squared error (MSE)
Optimizer | Adam
Training Epochs | 1000
Batch Size | 32
Weight Updates | At each time step (real-time adjustment)
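A minimal Keras sketch of the architecture summarized in Table 2 is given below: two stacked LSTM layers of 128 and 64 units followed by a time-distributed dense layer producing the four per-sensor weights. The layer names and the softmax output activation (used here only to keep the four weights normalized) are illustrative assumptions, not details from the released implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

N_FEATURES = 12   # per Table 2: concatenated GPS, IMU, stereo and LiDAR features
N_SENSORS = 4     # per Table 2: one reliability weight per sensor

def build_lsaf_model(time_steps=None):
    """Sketch of the LSAF network in Table 2: two stacked LSTM layers followed by a
    time-distributed dense layer that outputs per-time-step sensor weights."""
    return models.Sequential([
        layers.Input(shape=(time_steps, N_FEATURES)),
        layers.LSTM(128, return_sequences=True),   # captures long-term dependencies
        layers.LSTM(64, return_sequences=True),    # refines temporal patterns
        # Softmax keeps the four weights positive and summing to one (an assumption here).
        layers.TimeDistributed(layers.Dense(N_SENSORS, activation="softmax")),
    ])

model = build_lsaf_model()
model.summary()
```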
Algorithm 1 Proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) training phase

1: Input:
   • S_t = {s_{t,1}, s_{t,2}, ..., s_{t,SN}}: Multi-sensor measurements
   • G: Ground truth values
   • φ_LSTM: LSTM model parameters
   • x̂_{t−1|t−1}, P_{t−1|t−1}: Initial state estimate and covariance
   • R, Q: Measurement and process noise covariance
   • η: Convergence threshold
2: Output:
   • Final estimated state: x̂_{t|t}
   • Final covariance: P_{t|t}
3: Step 1: Initialization
   • Initialize LSTM model parameters: φ_LSTM
   • Set noise covariances: R, Q
   • Define training parameters: number of epochs N, learning rate α
4: Step 2: Training Phase (for each epoch)
5: for epoch = 1 to N do
   • Encode sensor data: compute hidden states using the LSTM:
       h_t = LSTM(S_t; φ_LSTM)
   • Compute adaptive weights using the attention mechanism:
       w_{t,i} = Attention(h_t, s_{t,i})
   • Update the noise covariance to adjust sensor uncertainty:
       R_t = ∑_{i=1}^{N} w_{t,i} · R_i
   • Predict the next state using the motion model:
       x_{t|t−1} = f(x_{t−1}),   P_{t|t−1} = F_{t−1} P_{t−1} F_{t−1}^T + Q
   • Compute the Kalman gain to optimize the state estimate:
       K_t = P_{t|t−1} H_t^T (H_t P_{t|t−1} H_t^T + R_t)^{−1}
   • Update the state and covariance using the Kalman filter:
       x_{t|t} = x_{t|t−1} + K_t (z_t − h(x_{t|t−1}))
       P_{t|t} = (I − K_t H_t) P_{t|t−1}
   • Compute the loss using the mean squared error (MSE):
       L(x̂_t, g_t) = (1/N) ∑_{t=1}^{N} (x̂_t − g_t)²
   • Update the model parameters: adjust the LSTM weights using the Adam optimizer:
       φ_LSTM ← φ_LSTM − α · ∇L
6: end for
7: Step 3: Output the Final State Estimation
   • Return the final estimated state x̂_{t|t} and covariance P_{t|t}
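To make the weight-computation steps of Algorithm 1 concrete, the sketch below shows a scaled dot-product attention over the LSTM hidden state and the resulting adaptive measurement covariance R_t = ∑_i w_{t,i} · R_i. The specific attention form and the projection matrices W_q, W_k are assumptions, since the paper names an attention mechanism without giving its equations.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_weights(h_t, sensor_feats, W_q, W_k):
    """Dot-product attention between the LSTM hidden state h_t and the per-sensor
    feature vectors s_{t,i}; returns one normalized reliability weight per sensor."""
    query = W_q @ h_t                              # project hidden state to query space
    keys = np.stack([W_k @ s for s in sensor_feats])
    scores = keys @ query / np.sqrt(len(query))    # scaled dot-product scores
    return softmax(scores)                         # w_{t,i}, summing to one

def adaptive_measurement_cov(weights, base_covs):
    """R_t = sum_i w_{t,i} * R_i as written in Algorithm 1; this sketch assumes the
    per-sensor covariance blocks R_i are expressed with compatible dimensions."""
    return sum(w * R for w, R in zip(weights, base_covs))
```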
3.5. Proposed LSTM-Based Multi-Sensor Fusion Architecture


The proposed LSTM-based multi-sensor fusion framework is designed to effectively
integrate long-term temporal dependencies into sensor data, enabling robust and adaptive
fusion. The architecture, illustrated in Figure 3, consists of two sequential LSTM layers
followed by a time-distributed dense layer, ensuring optimal processing of time-series
sensor inputs.

[Figure 3 block diagram: Multi-Sensor Input X_t (batch, time steps, features) → LSTM layer (128 units, captures long-term dependencies) → LSTM layer (64 units, refines temporal patterns) → time-distributed dense layer (fully connected feature mapping) → predicted UAV state Y_t.]

Figure 3. Proposed LSTM-based multi-sensor fusion architecture for UAV state estimation.

The proposed architecture is designed to efficiently process sequential multi-sensor


data for adaptive state estimation in UAV applications. At the core of this framework is the
multi-sensor input layer, which aggregates data from various sources, including inertial
measurement units (IMU), LiDAR, GPS, and stereo cameras. This structured representation
ensures that the model can effectively capture variations in sensor reliability over time,
providing a robust foundation for subsequent processing. By concatenating information
from different sensor modalities, the input layer creates a time-series feature space that
allows the network to analyze both spatial and temporal correlations in sensor data.
The first LSTM layer, comprising 128 units, plays a crucial role in capturing long-term
dependencies in sensor reliability. Since real-world sensor data exhibit complex temporal
dynamics, this layer enables the model to recognize patterns related to sensor degradation,
noise fluctuations, and environmental interference. By leveraging its ability to retain past
information through memory cells, the LSTM network ensures that historical context is
incorporated into the state estimation process, allowing for more informed predictions. This
is particularly valuable in scenarios where certain sensors intermittently provide unreliable
measurements due to external disturbances or occlusions.
Following this, the second LSTM layer, consisting of 64 units, is responsible for
refining the temporal features extracted by the first layer. This secondary processing stage
reduces the dimensionality of the extracted feature set while preserving the most relevant
sequential information. By compressing high-dimensional sensor data into a more compact
representation, the network becomes more efficient in distinguishing meaningful trends
from noise. The stacking of LSTM layers further enhances the model’s ability to discern
complex dependencies between different sensor modalities, leading to improved estimation
accuracy. To maintain temporal consistency in the output, the architecture incorporates a
time-distributed dense layer. Unlike conventional fully connected layers, which process
entire input sequences at once, this layer applies dense transformations independently
to each time step. This ensures that the predicted UAV states remain aligned with the
corresponding sensor measurements, preserving the sequential integrity of the data. The
time-distributed nature of this layer allows the model to generate real-time predictions
without disrupting the temporal structure of the input.
The final output layer provides the estimated UAV state by incorporating adaptive
fusion weights derived from past sensor behavior. These weights are dynamically adjusted
based on the learned temporal dependencies, allowing the system to prioritize the most
reliable sensors under varying operational conditions. The model continuously refines its
predictions by leveraging historical patterns of sensor accuracy, leading to more robust
and adaptive state estimation. This approach is particularly beneficial in GPS-denied
environments, highly dynamic conditions, and scenarios where individual sensors expe-
rience intermittent failures. Through this structured design, the architecture effectively
integrates sequential information to enhance UAV navigation and state estimation accuracy
in challenging environments. The proposed model is optimized using the mean squared
error (MSE) loss function, which is well suited for regression tasks as it minimizes the
squared differences between predicted and actual values. This approach ensures that larger
errors are penalized more heavily, leading to more precise predictions. For optimization,
the Adam optimizer is utilized due to its adaptive learning rate and ability to efficiently
handle complex datasets. Adam’s combination of momentum and adaptive gradient-based
optimization contributes to faster and more stable convergence. To evaluate the model’s
predictive performance, the mean absolute error (MAE) metric is employed, as it provides
a straightforward measure of the average prediction error magnitude. The training process
spans 1000 epochs with a batch size of 32, ensuring effective learning without overfit-
ting. Additionally, techniques such as early stopping or validation loss monitoring can be
incorporated to enhance model robustness and prevent unnecessary overtraining.
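Continuing the architecture sketch given after Table 2, the training configuration described here (MSE loss, Adam optimizer, MAE metric, 1000 epochs, batch size 32, optional early stopping) might look as follows in Keras. The dummy arrays, validation split, and patience value are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# Dummy stand-ins with the shapes given in Table 2; the real multi-sensor dataset
# described in Section 3.4 would be used in practice.
X_train = np.random.rand(256, 50, 12).astype("float32")
y_train = np.random.rand(256, 50, 4).astype("float32")

model.compile(optimizer="adam", loss="mse", metrics=["mae"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # stop when the validation loss stops improving
    patience=50,                 # illustrative patience, not specified in the paper
    restore_best_weights=True,
)

history = model.fit(
    X_train, y_train,            # shapes (N, T, 12) and (N, T, 4) per Table 2
    epochs=1000,
    batch_size=32,
    validation_split=0.2,        # assumed split for the curves in Figures 5 and 6
    callbacks=[early_stop],
)
```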

3.6. LSTM Cell Mechanism for Self-Adaptive Fusion


To achieve real-time adaptation of sensor fusion weights, the LSTM cell oper-
ates at each time step to adjust the measurement noise covariance matrix dynamically.
Figure 4 illustrates the internal mechanism of the LSTM cell, detailing its role in self-
adaptive sensor fusion.

[Figure 4 diagram: the previous hidden state h_{t−1} and the sensor inputs x_t feed the forget gate f_t, input gate i_t, candidate state C̃_t, and output gate o_t; together with the previous cell state C_{t−1} these produce the updated cell state C_t and the new hidden state h_t.]

Figure 4. LSTM cell architecture for adaptive multi-sensor fusion.
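The gate structure sketched in Figure 4 follows the standard LSTM cell equations; a minimal NumPy version is given below purely for reference. The weight shapes and the concatenation order of [h_{t−1}, x_t] are conventions assumed here, not implementation details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step matching Figure 4: forget gate f_t, input gate i_t, candidate C~_t,
    output gate o_t, updated cell state C_t, and new hidden state h_t.
    W maps the concatenated [h_{t-1}, x_t] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f_t = sigmoid(z[0*H:1*H])           # forget gate: how much of C_{t-1} to keep
    i_t = sigmoid(z[1*H:2*H])           # input gate: how much of the candidate to add
    c_tilde = np.tanh(z[2*H:3*H])       # candidate cell state
    o_t = sigmoid(z[3*H:4*H])           # output gate
    c_t = f_t * c_prev + i_t * c_tilde  # updated cell state C_t
    h_t = o_t * np.tanh(c_t)            # new hidden state h_t
    return h_t, c_t
```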

3.6.1. LSTM Training and Validation Loss


The training and validation loss curves, as shown in Figure 5, display a steady and
consistent decrease over the course of 1000 epochs. This behavior signifies the model’s
effective learning of temporal patterns from the multi-sensor dataset. The training loss
starts with a high initial value, reflecting the model’s early attempts to understand the
complexities of the dataset. Over successive epochs, the loss steadily declines as the LSTM
architecture refines its understanding of the data.
The minimal gap between the training and validation loss curves demonstrates effec-
tive generalization, indicating that the model avoids overfitting to the training data. This
alignment underscores the robustness of the chosen hyperparameters, including the learn-
ing rate, batch size, and architecture depth, in achieving optimal learning performance. The
observed convergence validates the model’s suitability for capturing sequential patterns in
multi-sensor data, making it highly reliable for downstream applications.
Figure 5. Training and validation loss of the proposed LSTM-based self-adaptive multi-sensor fusion
(LSAF) framework over 1000 epochs.

3.6.2. Mean Absolute Error (MAE) Analysis


The MAE curves for training and validation, depicted in Figure 6, reveal a consistent
decline over 1000 epochs, highlighting the model’s ability to minimize prediction errors.
The MAE metric evaluates the absolute difference between the predicted and actual values,
making it an effective measure for assessing prediction accuracy.
The training and validation MAE curves are closely aligned, indicating that the model
generalizes well to unseen data without significant overfitting. The steady convergence
of these curves suggests that the proposed LSTM-based framework is highly effective in
learning the temporal dependencies in the multi-sensor dataset. This highlights the model’s
ability to accurately predict sequential data, even in the presence of noise and variability in
the sensor measurements.

Figure 6. Training and validation MAE of the proposed LSTM-based self-adaptive multi-sensor
fusion (LSAF) framework over 1000 epochs.
3.6.3. Validation of the Proposed Framework


The results validate the efficacy of the proposed LSTM-based self-adaptive multi-
sensor fusion (LSAF) framework. The combination of temporal pattern learning via the
LSTM and its ability to minimize both loss and MAE ensures a comprehensive solution for
dynamic system modeling. The ability of the LSAF framework to generalize to unseen data
while maintaining precise predictions makes it highly reliable for complex applications
such as autonomous navigation and simultaneous localization and mapping (SLAM).
The convergence of training and validation metrics highlights the robustness and
adaptability of the system. These attributes make the proposed pipeline a reliable approach
for handling real-world multi-sensor fusion challenges in dynamic environments.

3.7. Fusion Framework Using MSCKF


The proposed LSTM-based self-adaptive multi-sensor fusion (LSAF) framework is
designed for real-time UAV state estimation by dynamically integrating data from GPS,
IMU, stereo cameras, and LiDAR. As illustrated in Figure 7, the system employs multiple
onboard sensors, including two Livox MID-360 LiDARs, a DJI front stereo camera, a DJI
IMU, and a GPS-RTK system, ensuring a comprehensive perception of the environment.
The IMU provides high-frequency motion tracking, while GPS-RTK offers precise global
positioning. The LiDARs generate dense 3D environmental maps, and the stereo camera
enhances spatial perception, particularly in visually rich environments. To efficiently pro-
cess these multimodal sensor data, an LSTM network extracts temporal dependencies and
evaluates sensor reliability. The attention-based mechanism within the LSTM model com-
putes adaptive fusion weights, dynamically adjusting the measurement noise covariance to
prioritize the most reliable sensors in real time. The weighted multi-sensor measurements
are then passed to a SLAM-based pose estimation module, which fuses all available sensors’
stereo cameras, LiDAR, IMU, and GPS-RTK, ensuring robust localization. The proposed
algorithm is based on the enhancement of VINS SLAM [4]. The LSTM-derived reliability
scores influence SLAM by assigning higher weightage to more reliable sensors, thereby
enhancing pose estimation accuracy. When all sensors are available, SLAM produces an
optimal UAV state estimate. Following SLAM-based pose estimation, the UAV state is
further refined using the multi-state constraint Kalman filter (MSCKF), which ensures
consistency in state propagation and correction. The Kalman gain is computed dynamically,
leveraging the LSTM-adapted fusion weights to optimally integrate new observations. This
adaptive approach mitigates the effects of sensor degradation, noise, and environmental
uncertainties, improving the UAV’s robustness in GPS-denied areas and high-speed motion
conditions. Following SLAM-based pose estimation, the multi-state constraint Kalman filter
(MSCKF) is employed to propagate the UAV state and refine it based on the LSTM-adapted
fusion weights. The state propagation step follows the motion model:

x_{k|k−1} = f(x_{k−1}),  (12)
P_{k|k−1} = F_{k−1} P_{k−1} F_{k−1}^T + Q.  (13)

Once new sensor measurements are available, the Kalman gain is computed to optimally integrate observations:

K_k = P_{k|k−1} H_k^T (H_k P_{k|k−1} H_k^T + R)^{−1},  (14)
x_k = x_{k|k−1} + K_k (z_k − h(x_{k|k−1})),  (15)
P_k = (I − K_k H_k) P_{k|k−1}.  (16)
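Equations (12)-(16) amount to a standard Kalman predict/update pair with the LSTM-adapted covariance R_t; a compact sketch is shown below. It is a generic linearized EKF-style update, not the full MSCKF with its sliding window of camera poses.

```python
import numpy as np

def predict(x, P, f, F, Q):
    """Eqs. (12)-(13): propagate the state through the motion model f and the
    covariance through its Jacobian F."""
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def update(x_pred, P_pred, z, h, H, R_t):
    """Eqs. (14)-(16): Kalman gain, state correction, and covariance update, where
    R_t is the LSTM-adapted measurement noise covariance."""
    S = H @ P_pred @ H.T + R_t                           # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)                  # Eq. (14)
    x_new = x_pred + K @ (z - h(x_pred))                 # Eq. (15)
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred   # Eq. (16)
    return x_new, P_new
```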
Figure 7. Proposed block diagram for LSTM-based self-adaptive multi-sensor fusion (LSAF).

By incorporating LSTM-based sensor reliability assessments into the MSCKF, the


framework dynamically adapts to changing sensor conditions, enhancing robustness in
GPS-denied environments and complex dynamic flight scenarios. The complete algorithmic
workflow is detailed in Algorithm 2, outlining sensor preprocessing, adaptive sensor fusion,
SLAM-based pose estimation, and MSCKF-based state correction.

Algorithm 2 LSTM-based self-adaptive multi-sensor fusion (LSAF) algorithm

Input:
   • S_t = {s_{t,1}, s_{t,2}, ..., s_{t,N}}: Multi-sensor inputs (GPS-RTK, IMU, Stereo Camera, LiDAR)
   • φ_LSTM: Pre-trained LSTM model for adaptive fusion
   • x_{t−1}: Previous UAV state estimate (position, velocity, orientation)
   • R, Q: Measurement and process noise covariance
Output: Updated UAV state x_t = (p_t, v_t, q_t).

Step 1: Sensor Data Preprocessing
   • Synchronize, filter, and normalize multi-sensor inputs.
   • Extract temporal dependencies and sensor reliability via the LSTM:
       h_t = LSTM(S_t; φ_LSTM)

Step 2: Adaptive Sensor Fusion
   • Compute dynamic sensor reliability weights:
       w_{t,i} = Attention(h_t, s_{t,i})
   • Adjust the measurement noise covariance dynamically:
       R_t = ∑_{i=1}^{N} w_{t,i} · R_i
   • Compute the fused sensor measurement:
       z_t = ∑_{i=1}^{N} w_{t,i} · s_{t,i}

Step 3: UAV State Prediction
   • Predict the initial UAV state using IMU measurements:
       x_{t|t−1} = f(x_{t−1}, IMU_t)
   • Propagate the state covariance:
       P_{t|t−1} = F_{t−1} P_{t−1} F_{t−1}^T + Q

Step 4: LSTM-Guided Multi-Sensor SLAM-Based Pose Estimation
   • Fuse all available sensors (stereo camera, LiDAR, IMU, and GPS-RTK) for SLAM-based pose estimation.
   • Assign higher weightage to more reliable sensors based on the LSTM reliability scores:
       w_{SLAM,i} = w_{t,i} / ∑_{j=1}^{N} w_{t,j},   ∀ i ∈ {GPS-RTK, IMU, Stereo, LiDAR}
   • Compute the weighted SLAM pose estimate:
       (p̂_t, q̂_t) = ∑_{i=1}^{N} w_{SLAM,i} · SLAM(s_{t,i})
   • Ensure global consistency using GPS-RTK when available.
   • If all sensors degrade, rely on IMU-based odometry:
       p_t = p_{t−1} + v_{t−1} · Δt

Step 5: State Correction Using MSCKF
   • Compute the Kalman gain:
       K_t = P_{t|t−1} H_t^T (H_t P_{t|t−1} H_t^T + R_t)^{−1}
   • Update the UAV state:
       x_t = x_{t|t−1} + K_t (z_t − h(x_{t|t−1}))
   • Update the state covariance:
       P_t = (I − K_t H_t) P_{t|t−1}

Algorithm 2 details the complete computational pipeline, outlining key steps such
as sensor preprocessing, LSTM-based adaptive fusion, SLAM-based pose estimation, and
MSCKF-based state correction. The proposed framework significantly improves UAV
navigation accuracy by enabling real-time adaptation to sensor reliability, making it well
suited for challenging flight conditions, including environments with limited GPS visibility,
rapid motion dynamics, and feature-deprived landscapes.
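As a rough illustration of Step 4 of Algorithm 2, the snippet below normalizes the LSTM reliability scores into SLAM weights, forms a weighted pose estimate, and falls back to IMU-only dead reckoning when every sensor is degraded. Treating the per-sensor SLAM outputs as independent pose estimates, and averaging quaternions linearly before re-normalizing, is a simplification of the actual pipeline.

```python
import numpy as np

def fuse_slam_poses(weights, positions, quaternions, degraded, p_prev, v_prev, dt):
    """Weighted fusion of per-sensor SLAM pose estimates (Algorithm 2, Step 4).
    weights: LSTM reliability scores w_{t,i}; positions/quaternions: per-sensor poses."""
    if all(degraded):
        # All sensors degraded: IMU-based dead reckoning, p_t = p_{t-1} + v_{t-1} * dt.
        return p_prev + v_prev * dt, None
    w = np.asarray(weights, dtype=float)
    w_slam = w / w.sum()                       # w_SLAM,i = w_{t,i} / sum_j w_{t,j}
    p_hat = sum(wi * np.asarray(pi) for wi, pi in zip(w_slam, positions))
    q_hat = sum(wi * np.asarray(qi) for wi, qi in zip(w_slam, quaternions))
    q_hat = q_hat / np.linalg.norm(q_hat)      # crude quaternion averaging, then re-normalize
    return p_hat, q_hat
```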

3.8. Advantages of the Proposed Framework


The proposed framework combines the strengths of traditional filtering methods with
modern deep learning techniques, enabling robust UAV state estimation in real time. The
LSTM-based self-adaptive fusion mechanism allows the system to dynamically prioritize
sensor contributions based on their reliability, improving robustness in challenging envi-
ronments. The integration of the MSCKF ensures computational efficiency and consistency,
making the system suitable for real-time UAV operations in diverse scenarios.

3.9. Experimental Setup and Dataset


The experiments were carried out in an open-field outdoor environment, as shown
in Figure 8. The dataset was collected in a wide open lawn area with minimal features,
such as sparse distant trees and limited structural elements. The environment presented
significant challenges for single-sensor SLAM approaches due to the lack of features and
bright, sunny conditions that degraded stereo- and LiDAR-based odometry. The UAV
platform was handheld during data collection to simulate various motion patterns, and
the dataset included asynchronous measurements from all sensors. Sensor data were
fused using event-based updates, where the state was propagated to the timestamp of
each measurement. Calibration parameters, such as camera-LiDAR-IMU extrinsics, were
estimated offline and incorporated into the extended state vector for accurate fusion.

Figure 8. The experimental environment in different scenarios during data collection. Panels (a,b) show the UAV hardware along with the sensor integration, and panels (c,d) show the open-field dataset environment as viewed from the stereo and LiDAR sensors, respectively, building upon our previous work [28].

The offline calibration of the proposed system consists of three key components:
estimation of the stereo camera’s intrinsic and extrinsic parameters, determination of
the IMU-camera extrinsic offset, and calibration of the LiDAR-IMU transformation. To
estimate both intrinsic and extrinsic parameters of the stereo camera, we employ the well-
established Kalibr calibration toolbox [29], ensuring precise alignment between the camera
and IMU. For 3D LiDAR-IMU calibration, we utilize the state-of-the-art LI-Init toolbox [30],
which provides a robust real-time initialization framework for LiDAR-inertial systems,
compensating for temporal offsets and extrinsic misalignments. To evaluate the robustness
of the proposed approach under diverse conditions, we collected multiple datasets across
three scenarios, covering both handheld and UAV-mounted configurations. The datasets, referred
to as UL Outdoor Car Parking Dataset, UL Outdoor Handheld Dataset, and UL Car Bridge Dataset,
were recorded at the University of Limerick Campus within the CRIS Lab research group.
Figure 8 illustrates the experimental environments, while Table 3 presents the detailed UAV
hardware sensor specifications. To address the challenge of asynchronous sensor
data, we employ first-order linear interpolation to estimate the IMU pose at each sensor’s
measurement time, mitigating time bias without significantly increasing computational
overhead. Instead of direct event-based updates, this method ensures that sensor data are
aligned with a consistent reference frame, preventing oversampling of high-frequency IMU
data or undersampling of low-frequency GPS and LiDAR measurements. Additionally,
ROS-based timestamp synchronization of DJI-OSDK and Livox LiDAR nodes further
minimizes timing inconsistencies, enhancing fusion accuracy and reducing drift in state
estimation.
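The first-order interpolation used to align asynchronous measurements can be sketched as follows: given two buffered IMU poses that bracket a sensor timestamp, the position is interpolated linearly and the orientation by a normalized quaternion blend (a stand-in for proper SLERP, which the paper does not specify).

```python
import numpy as np

def interpolate_pose(t_query, t0, p0, q0, t1, p1, q1):
    """First-order interpolation of the IMU pose at a sensor's measurement time t_query,
    with t0 <= t_query <= t1 and quaternions given as [w, x, y, z]."""
    alpha = (t_query - t0) / (t1 - t0)          # interpolation factor in [0, 1]
    p = (1.0 - alpha) * np.asarray(p0) + alpha * np.asarray(p1)
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    if np.dot(q0, q1) < 0.0:                    # keep both quaternions on the same hemisphere
        q1 = -q1
    q = (1.0 - alpha) * q0 + alpha * q1         # normalized linear blend (approximate SLERP)
    return p, q / np.linalg.norm(q)
```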
The proposed method was evaluated without loop closure mode to assess its con-
sistency and robustness. Performance metrics, including the absolute pose error (APE)
and the root mean square error (RMSE), were calculated to quantify the accuracy of the
estimated trajectory [31]. The comparison focused on the ability to mitigate cumulative
errors and maintain robust state estimation across large-scale environments. The details of
the hardware used during the experiments are listed in Table 3.
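For reference, the APE statistics reported later in Tables 4 and 5 (max, mean, median, min, and RMSE of the translational error against the RTK ground truth) can be computed along the lines of the sketch below, which assumes the estimated and ground-truth trajectories have already been time-associated and aligned.

```python
import numpy as np

def ape_statistics(est_xyz, gt_xyz):
    """Absolute pose error statistics over associated trajectory positions (shape (N, 3))."""
    err = np.linalg.norm(np.asarray(est_xyz) - np.asarray(gt_xyz), axis=1)
    return {
        "max": float(err.max()),
        "mean": float(err.mean()),
        "median": float(np.median(err)),
        "min": float(err.min()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }
```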

Table 3. Hardware configuration.

Sensor | H/W Type | Specifications | Frequency
GPS/RTK | DJI M300 | GPS+GLONASS+BeiDou+Galileo | 50 Hz
3D-LiDAR | Livox Mid360 | Laser: 905 nm, FOV: 360° (H) / −7° to 52° (V), point rate: 200k pts/s, range: 40 m (10%), 70 m (80%) | 20 Hz
IMU | DJI M300 | 6-axis mechanism | 400 Hz
Stereo-Camera | DJI M300 | Grayscale, 640 × 480 resolution, 10 fps | 15 Hz

4. Results and Comparison


The evaluation of the proposed LSTM-based self-adaptive multi-sensor fusion system
was conducted on a collected dataset using a UAV equipped with state-of-the-art sensors,
including two Livox Mid360 LiDARs (facing forward and downward), front-facing stereo
cameras, an IMU, and GPS-RTK. The hardware configuration is summarized in Table 3.
The UAV configuration and experimental setup are shown in Figure 8. These sensors
provide complementary modalities that are dynamically fused using the proposed system,
leveraging the LSTM-based approach to adaptively weigh sensor contributions based on
their reliability and environmental conditions.

4.1. UL Outdoor Car Parking Dataset


The car parking dataset provides a complex testing environment with open air spaces,
tree shadows, and dynamic illumination changes, as depicted in Figure 8. This experiment
assessed the LSAF approach in a large-scale outdoor setting without loop closure to verify
the robustness of the proposed methodology. The UAV operated in a vast, open lawn
with minimal tree coverage and bright sunlight conditions that significantly challenge
stereo-based odometry systems. These scenarios lead to frequent failures in vision-only or
LiDAR-only methods.
During the UAV’s navigation over the parking area, most of the LiDAR-detected
features were confined to the ground, resulting in degraded motion estimation. The
trajectory plots, when compared to ground truth RTK data, showed that FASTLIO2 [17]
suffered from substantial errors due to LiDAR degradation. Additionally, VINS-Fusion
(stereo-inertial) [4] performed poorly, exhibiting the highest position drifts, while VINS-
Fusion (stereo-IMU-GPS) [4] showed noticeable drifts under these conditions. Sparse
LiDAR features in the dataset further impacted LiDAR-based methods like FASTLIO2.
However, the proposed LSAF system, leveraging stereo, IMU, LiDAR, and GPS with a
pre-trained LSTM-based deep learning model, provided enhanced UAV state estimation
and consistently smoother trajectories in this challenging environment. Figure 9 displays
the trajectories obtained using different methods, while Figure 10 highlights the box plots
showing the overall APE of each strategy. Table 4 provides the RMSE values for each
method and Figures 11 and 12 represent absolute position errors (x, y, z) and orientation
errors (roll, pitch, and yaw) for the UL outdoor car parking dataset.
Figure 9. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and
VINS-Fusion.

Figure 10. Box plots showing the overall APE of each strategy.
Figure 11. Absolute estimated position of x, y, and z axes showing plots of various methods on the
UAV car parking dataset.

Figure 12. Absolute position error of roll, yaw, and pitch showing the plots of various methods on
the UAV car parking dataset.

Table 4. Summary of accuracy evaluation of the UL outdoor car parking dataset (in metres).

Method | Max | Mean | Median | Min | RMSE
Proposed (LSAF) | 0.889442 | 0.296818 | 0.280041 | 0.04252 | 0.328436
Proposed (without LSAF) | 1.97031 | 0.345241 | 0.30052 | 0.043532 | 0.385019
FASTLIO2 (L+I) | 3.241041 | 1.173322 | 1.097815 | 0.030088 | 1.321733
VINS-Fusion (S+I+G) | 14.048131 | 8.664866 | 9.059936 | 1.096348 | 9.438278
VINS-Fusion (S+I) | 14.043965 | 8.682303 | 9.079959 | 1.081238 | 9.45291

4.2. UL Outdoor Handheld Dataset


In this experiment, we employed a custom-designed UAV sensor suite to evaluate
the capabilities of our proposed framework. The RTK position was used as ground truth,
leveraging the high-quality GPS signal recorded throughout the experiment. Data collection
was performed using a handheld UAV method while navigating an outdoor environment.
This setup presented challenges such as image degradation, structureless surroundings,
dynamic targets, and unstable feature conditions that are particularly difficult for vision-
based and LiDAR-based methods.
To validate the consistency of the proposed LSAF framework, the experiment was
conducted without loop closure. The handheld mode eliminated the noise typically intro-
duced during flight missions, providing a clean dataset to assess the proposed LSAF under
challenging conditions. State estimation was performed using LSAF across various sensor
combinations and compared with state-of-the-art (SOTA) methods such as VINS-Fusion [4]
and FASTLIO2 [17]. Figure 13 displays the trajectories obtained using different methods,
while Figure 14 highlights the box plots showing the overall APE of each of the strategies.
Table 5 provides the RMSE values for each method, and Figures 15 and 16 present the abso-
lute position errors (x, y, z) and orientation errors (roll, pitch, and yaw) for the handheld
UAV dataset.
The results demonstrate that significant position drifts occurred in the stereo IMU-
only scenario. However, accuracy improved considerably when LiDAR, GPS, or their
combination was integrated. VINS-Fusion exhibited growing errors due to accumulated
drift, whereas LSAF maintained a smooth trajectory consistent with the ground truth.
Unlike VINS-Fusion and FASTLIO2, which failed to align precisely with the reference
data, LSAF achieved superior performance by leveraging the LSTM-based self-adaptive
multi-sensor fusion (LSAF) framework and MSCKF fusion.

Figure 13. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and
VINS-Fusion on the UL outdoor handheld dataset.

The system was compared with state-of-the-art algorithms, including VINS-Fusion [4]
and FASTLIO2 [17]. VINS-Fusion integrates visual inertial odometry with/without GPS
data, while FASTLIO2 employs LiDAR inertial odometry combined with global optimization,
including loop closure. In comparison, the proposed method utilizes an LSTM-based adaptive
weighting mechanism to enhance robustness against sensor degradation and environmental
variability, ensuring accurate and reliable state estimation in dynamic conditions.
Figure 14. Box plots showing the overall APE of each strategy.

Figure 15. Absolute estimated position of x, y, and z axes showing the plots of various methods on
the UL outdoor handheld dataset.

Figure 16. Absolute position error of roll, yaw, and pitch showing the plots of various methods on
the UL outdoor handheld dataset.
Table 5. Summary of accuracy evaluation of the UL outdoor handheld dataset (in metres).

Method | Max | Mean | Median | Min | RMSE
Proposed (LSAF) | 2.982927 | 0.525667 | 0.450802 | 0.051566 | 0.598172
Proposed (without LSAF) | 4.955625 | 0.232469 | 0.559698 | 0.064084 | 1.341954
FASTLIO2 (L+I) | 11.488647 | 6.466611 | 5.564047 | 2.445243 | 6.830505
VINS-Fusion (S+I+G) | 10.320782 | 6.479688 | 6.737983 | 1.811507 | 6.846302
VINS-Fusion (S+I) | 14.391085 | 6.772864 | 6.961714 | 2.853421 | 7.18024

4.2.1. Quantitative Analysis


Table 4 summarizes the results of the accuracy evaluation for the proposed method
and the benchmark algorithms on the UL outdoor car parking dataset. The proposed system,
which incorporates LSTM-based adaptive fusion, outperformed both VINS-Fusion and
FASTLIO2 in terms of the maximum, mean, and RMSE metrics. Specifically, the proposed
method achieved RMSE values of 0.328436 with LSAF and 0.385019 without LSAF, significantly
outperforming FASTLIO2 (L+I) (1.321733), VINS-Fusion (S+I) (9.45291), and VINS-Fusion
(S+I+G) (9.438278). The maximum error for the proposed system was 0.889442, notably
lower than that of the other methods, indicating better robustness to outliers
and sensor degradation.
Table 5 presents the accuracy evaluation results for the proposed method and bench-
mark algorithms on the UL outdoor handheld dataset, highlighting the superior perfor-
mance of the proposed system incorporating LSTM-based self-adaptive fusion (LSAF).
The proposed method achieved the lowest RMSE of 0.598172, significantly outperforming
benchmark methods such as FASTLIO2 (6.830505), VINS-Fusion (S+I+G) (6.846302), and
VINS-Fusion (S+I) (7.18024). Additionally, the proposed system demonstrated a maximum
error of 2.982927, substantially lower than the other methods, reflecting its
robustness to outliers and sensor degradation.
The mean and median errors for the proposed method were also the lowest, at
0.525667 and 0.450802, respectively, showcasing its consistent accuracy. In contrast,
methods like VINS-Fusion and FASTLIO2 exhibited significantly higher errors due to
their limitations in handling dynamic environments and sensor noise. The results further
emphasize the advantages of incorporating LSTM-based adaptive fusion for enhanced
performance in challenging real-world scenarios. The performance gap between the pro-
posed system with and without LSAF also highlights the critical role of adaptive fusion in
reducing positional errors and ensuring reliable state estimation.
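
For reference, the APE statistics reported in Tables 4 and 5 (max, mean, median, min, RMSE) can be reproduced with the evo toolkit [31]. The snippet below is a minimal sketch assuming the RTK ground truth and the estimated trajectory have been exported to TUM-format files; the file names are placeholders, and the exact evo API may differ slightly between versions.

```python
from evo.core import metrics, sync
from evo.tools import file_interface

# Placeholder file names: RTK ground truth and estimated trajectory in TUM format
traj_ref = file_interface.read_tum_trajectory_file("rtk_ground_truth.tum")
traj_est = file_interface.read_tum_trajectory_file("lsaf_estimate.tum")

# Associate poses by timestamp and align the estimate to the reference (SE(3), no scale)
traj_ref, traj_est = sync.associate_trajectories(traj_ref, traj_est, max_diff=0.01)
traj_est.align(traj_ref, correct_scale=False)

# Translational APE, matching the Max/Mean/Median/Min/RMSE columns of the tables
ape = metrics.APE(metrics.PoseRelation.translation_part)
ape.process_data((traj_ref, traj_est))
print(ape.get_all_statistics())   # dict including rmse, mean, median, min, max
```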

4.3. UL Car Bridge Dataset


The experimental evaluation was conducted at the University of Limerick’s Car Bridge,
where the UAV was flown both above and beneath the bridge to assess the performance
of the proposed LSAF framework under varying environmental conditions, as shown in
Figure 17. Ground truth reference data for LSAF was obtained using an RTK system, while
the UAV was manually operated by a trained pilot. To validate the global consistency of
LSAF, the experiment was performed without loop closure. The test environment presented
significant challenges, including rapid illumination changes, agile UAV maneuvers, and
GPS-degraded conditions, as illustrated in the trajectory (Figure 17). These conditions
are particularly demanding for visual-inertial odometry (VIO) and LiDAR odometry (LO)
methods, where the proposed LSAF fusion approach demonstrated superior performance
compared to state-of-the-art techniques. Throughout the experiment, the UAV maintained
a strong RTK signal lock, ensuring fixed position accuracy for most of the flight. However,
while navigating under the bridge, the number of visible GPS satellites temporarily dropped
to 11, causing intermittent signal degradation in this challenging environment.

This experiment evaluates the global consistency of the proposed LSAF framework
in a challenging environment with unstable and noisy GPS signals, particularly under a
bridge, where localization accuracy and trajectory smoothness are significantly affected.
The results demonstrate that LSAF effectively mitigates single sensor drift, maintaining
global consistency and ensuring smooth local trajectory estimation despite degraded GPS
conditions. Figures 18 and 19 illustrate the absolute position errors in the x, y, and z
coordinates and the roll, pitch, and yaw angles, comparing multiple methods on the UAV
car bridge dataset, while Table 6 presents the corresponding RMSE values for each approach.
Figure 20 shows box plots of the overall absolute pose error (APE) for the five strategies,
demonstrating that LSAF outperforms the other state-of-the-art (SOTA) methods.

Figure 17. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and
VINS-Fusion on the UL car bridge dataset.

Table 6. Summary of accuracy evaluation of the UL car bridge dataset (in metres).

Method                      Max        Mean       Median     Min        RMSE
Proposed (LSAF)             1.103132   0.283116   0.193321   0.002481   0.318363
Proposed (without LSAF)     1.296882   0.384646   0.291391   0.012473   0.473501
FASTLIO2 (L+I)              2.485825   0.630901   0.438535   0.032907   0.805311
VINS-Fusion (S+I+G)         2.284776   0.421503   0.253154   0.006963   0.683928
VINS-Fusion (S+I)           22.006355  7.392873   5.851291   1.614134   8.892644

Figure 18. Absolute estimated position of the x, y, and z axes showing the plots of various methods
on the UAV car bridge dataset.

Figure 19. Absolute orientation error of roll, yaw, and pitch showing plots of various methods on the
UAV car bridge dataset.

Figure 20. Box plots showing the overall APE of each strategy.

4.4. Qualitative Analysis and Trajectory Comparison


The trajectory plots illustrated in Figures 9, 13, and 17 compare the estimated tra-
jectories of the proposed method, FASTLIO2, and VINS-Fusion. The proposed method
demonstrates superior consistency and alignment with the ground truth provided by RTK,
particularly in regions with sparse LiDAR and stereo features. In contrast, FASTLIO2 ex-
hibits significant drift in regions with degraded LiDAR feature density, while VINS-Fusion
suffers from cumulative errors due to visual degradation under high illumination.
The absolute pose error (APE) plots for each axis in Figures 11, 15, and 18 further
highlight the advantages of the proposed system. The box plots in Figures 10, 14, and 20
compare the overall APE distributions for all methods, showing that the proposed method
achieves the smallest error spread and the highest accuracy across the datasets. The LSTM-based
adaptive fusion mechanism effectively mitigates sensor-specific errors by dynamically
adjusting sensor contributions in real time. For instance, in regions where LiDAR features
are sparse, the LSTM assigns higher weights to IMU, GPS, or stereo camera data, thereby
maintaining accurate state estimation.

4.5. Analysis of LSTM-Based Adaptive Fusion


The inclusion of the LSTM-based adaptive fusion mechanism introduces several
advantages over traditional fixed-weight fusion approaches. The dynamic weighting
process enables the system to adapt to environmental changes and sensor degradation. For
example, in bright outdoor conditions, the LSTM down-weights stereo camera data when
visual feature density is low, prioritizing LiDAR and IMU measurements instead. Similarly,
in areas with sparse LiDAR features, GPS data are weighted more heavily to mitigate drift.
Figures 9, 13, and 17 demonstrate the proposed method’s ability to maintain trajectory
accuracy despite varying sensor reliability. This is further supported by the quantitative
metrics in Tables 4–6, which show that the proposed system consistently outperforms the
benchmark methods in all scenarios. The adaptive nature of the LSTM allows the system
to handle asynchronous and noisy sensor measurements more effectively than traditional
Kalman filter-based fusion approaches.
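
As a concrete illustration of this weighting mechanism, the sketch below shows one plausible PyTorch realization: a small LSTM consumes a sliding window of per-sensor health features (e.g., visual feature count, LiDAR point density, GPS satellite count or covariance, IMU excitation) and outputs softmax-normalized fusion weights at every time step. The feature set, dimensions, and layer sizes are illustrative assumptions rather than the exact configuration trained for this work.

```python
import torch
import torch.nn as nn

class AdaptiveFusionLSTM(nn.Module):
    """Illustrative sketch: map a window of per-sensor health features to fusion weights."""
    def __init__(self, n_features=8, n_sensors=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sensors)

    def forward(self, x):
        # x: (batch, time, n_features) -> weights: (batch, time, n_sensors), rows sum to 1
        h, _ = self.lstm(x)
        return torch.softmax(self.head(h), dim=-1)

# Toy usage: a 2 s window at 10 Hz with 8 health features,
# producing weights for [stereo, LiDAR, GPS, IMU] at every step.
model = AdaptiveFusionLSTM()
features = torch.randn(1, 20, 8)          # placeholder sensor-health inputs
weights = model(features)
print(weights[0, -1])                      # latest fusion weights, summing to 1
```

Because the weights sum to one at each step, they can be applied directly to scale per-sensor measurement covariances or residual contributions in the downstream filter.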

4.6. System Robustness and Computational Efficiency


The proposed system is designed to be robust to individual sensor failures, ensuring
continuous operation in challenging scenarios. For instance, temporary GPS signal loss or
degraded LiDAR performance does not significantly impact the overall state estimation
due to the self-adaptive fusion mechanism. This resilience is critical for real-world UAV
applications, where sensor reliability can vary due to environmental factors.
The computational efficiency of the proposed system was validated on an Ubuntu
Linux laptop equipped with an Intel Core(TM) i7-10750H CPU (3.70 GHz) and 32 GB of
memory. The implementation was written in C++ on ROS, ensuring real-time performance
with minimal latency. Although the LSTM adds computational complexity, its inference
was optimized with hardware acceleration so that the system operates within
real-time constraints.
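
The fallback behaviour described above can be approximated by a simple gating step ahead of the fusion: if a sensor's most recent message is older than a staleness threshold, its weight is zeroed and the remaining weights are renormalized so the filter continues on the healthy sensors. The thresholds and sensor names in the sketch below are illustrative assumptions, not the exact values used in the implemented system.

```python
import time

# Illustrative per-sensor staleness thresholds in seconds
TIMEOUTS = {"stereo": 0.2, "lidar": 0.3, "gps": 1.0, "imu": 0.05}

def gate_weights(weights, last_msg_time, now=None):
    """Zero the weight of any stale sensor and renormalize the rest."""
    now = time.time() if now is None else now
    gated = {s: (w if now - last_msg_time.get(s, 0.0) <= TIMEOUTS[s] else 0.0)
             for s, w in weights.items()}
    total = sum(gated.values())
    if total == 0.0:                      # all sensors stale: fall back to IMU dead reckoning
        return {s: (1.0 if s == "imu" else 0.0) for s in gated}
    return {s: w / total for s, w in gated.items()}

# Toy usage: GPS has been silent for 3 s, so its weight is redistributed
now = 100.0
weights = {"stereo": 0.3, "lidar": 0.4, "gps": 0.2, "imu": 0.1}
last_msg = {"stereo": 99.95, "lidar": 99.9, "gps": 97.0, "imu": 99.99}
print(gate_weights(weights, last_msg, now=now))
```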

5. Discussion
This paper builds upon the work presented at the 12th International Conference on
Control, Mechatronics, and Automation (ICCMA 2024) [28] by introducing a significant
enhancement that includes an LSTM-based self-adaptive fusion technique. This addition
allows the system to dynamically adjust sensor contributions in real time, making it more
robust to challenging environmental conditions and improving sensor reliability compared
to the earlier fixed-weight fusion approach. The LSTM mechanism ensures that the most
reliable sensors are prioritized during operation. For example, in GPS-degraded areas,
the system gives more weight to the LiDAR, IMU, and stereo camera measurements. In contrast, in
environments with sparse LiDAR features, GPS data become more influential. This
flexibility ensures accurate UAV state estimation, even in challenging scenarios like bright
outdoor conditions or feature-poor environments. Extensive testing on real-world datasets
confirmed the effectiveness of the proposed system. The results showed that the LSTM-
based fusion method outperformed state-of-the-art algorithms such as VINS-Fusion and
FASTLIO2, as well as fusion approaches without LSAF, in terms of accuracy and resilience
to sensor degradation. The system achieved lower trajectory errors and demonstrated its
ability to handle complex environments with minimal cumulative errors. Overall, this work
successfully combines traditional Kalman filtering with modern deep learning to improve
UAV state estimation. The LSTM-based adaptive fusion framework sets a strong foundation
for future research and practical UAV applications in complex and diverse environments.

6. Conclusions and Future Work


This study introduces a novel LSTM-based self-adaptive multi-sensor fusion frame-
work aimed at improving UAV state estimation accuracy and robustness. The proposed
approach dynamically adjusts sensor fusion weights in real time, leveraging an LSTM
network to account for varying environmental conditions and sensor reliability. By inte-
grating measurements from GPS, LiDAR, stereo cameras, and IMU, the system effectively
addresses challenges posed by GPS-degraded environments, sparse feature areas, and high
motion dynamics. The framework was validated on real-world datasets collected using a
UAV platform in challenging outdoor environments. Experimental results demonstrate
that the proposed fusion framework outperforms state-of-the-art methods such as VINS-
Fusion (S+I), VINS-Fusion (S+I+G) and FASTLIO2 (L+I), as well as approaches without
LSAF fusion, achieving superior trajectory accuracy and consistency. The incorporation
of the LSTM-based adaptive weighting mechanism significantly enhances the system’s
ability to handle sensor degradation and environmental variability. In scenarios where
traditional fixed-weight fusion methods struggle, such as in bright, sunny conditions with
degraded stereo or sparse LiDAR features, the LSTM dynamically prioritizes the most
reliable sensors, ensuring robust and accurate state estimation. This adaptability is a key
innovation that bridges the gap between traditional filtering techniques and modern deep
learning approaches for UAV navigation.
Despite the system’s demonstrated success, there remain opportunities for further
enhancement. Future work could focus on extending the proposed framework to broader
and more diverse datasets, particularly in GPS-denied environments such as dense urban
canyons, forested areas, or indoor spaces. Incorporating additional sensor modalities,
such as radar or thermal cameras, could further enhance robustness in low-visibility
conditions or adverse weather. Moreover, improving the LSTM model by incorporating
uncertainty estimation techniques, such as Bayesian neural networks, could provide better
confidence measures for the adaptive weighting process. Another promising direction lies
in the optimization of computational efficiency to enable deployment on smaller, resource-
constrained UAV platforms. Techniques such as model pruning, quantization, or the use of
edge AI hardware could be explored to reduce the computational overhead of the LSTM
while maintaining real-time performance. Additionally, investigating the integration of
reinforcement learning into the fusion framework could enable the system to autonomously
adapt to new environments during operation without the need for extensive retraining.
In conclusion, the proposed LSTM-based self-adaptive fusion framework represents a
significant advancement in UAV state estimation, combining the strengths of traditional
Kalman filtering with the flexibility of modern deep learning. The demonstrated robustness
and adaptability of the system position it as a valuable contribution to the field of
autonomous UAV navigation, with the potential for further enhancements and applications
in diverse operational scenarios.

Author Contributions: Conceptualization, methodology, validation, format analysis, writing—original


draft preparation, visualization: M.I. and S.D.; writing—review and editing, supervision: P.T., J.R.
and G.D. All authors have read and agreed to the published version of the manuscript.

Funding: This work was funded by the European Commission's Horizon 2020 Project RAPID under
Grant 861211 (EU RAPID N°861211), and in part by Enterprise Ireland's Disruptive Technologies
Innovation Fund (DTIF) Project GUARD under Grant DT2020 0286B.

Data Availability Statement: Data are contained within the article.

Conflicts of Interest: The authors declare no conflicts of interest. The funders had no role in the design
of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or
in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

UAV     Unmanned aerial vehicle
LiDAR   Light detection and ranging
CNN     Convolutional neural network
LSTM    Long short-term memory
GNSS    Global navigation satellite system
AVs     Autonomous vehicles
IMU     Inertial measurement unit
RTK     Real-time kinematic
GPS     Global positioning system

References
1. Ye, X.; Song, F.; Zhang, Z.; Zeng, Q. A Review of Small UAV Navigation System Based on Multisource Sensor Fusion. IEEE Sens. J.
2023, 23, 18926–18948. [CrossRef]
2. Irfan, M.; Kishore, K.; Chhabra, V.A. Smart Vehicle Management System Using Internet of Vehicles (IoV). In Proceedings of the
International Conference on Advanced Computing Applications, Advances in Intelligent Systems and Computing, Virtually, 27–28 March
2021; Mandal, J.K., Buyya, R., De, D., Eds.; Springer: Singapore, 2022; Volume 1406.
3. Wang, Z.; Wu, Y.; Niu, Q. Multi-Sensor Fusion in Automated Driving: A Survey. IEEE Access 2020, 8, 2847–2868. [CrossRef]
4. Qin, T.; Cao, S.; Pan, J.; Shen, S. A general optimization-based framework for global pose estimation with multiple sensors. arXiv
2019, arXiv:1901.03642.
5. Lee, W.; Geneva, P.; Chen, C.; Huang, G. Mins: Efficient and robust multisensor-aided inertial navigation system. arXiv 2023,
arXiv:2309.15390.
6. Irfan, M.; Dalai, S.; Trslic, P.; Santos, M.C.; Riordan, J.; Dooly, G. LGVINS: LiDAR-GPS-Visual and Inertial System Based
Multi-Sensor Fusion for Smooth and Reliable UAV State Estimation. IEEE Trans. Intell. Veh. 2024. [CrossRef]
7. Zhu, J.; Li, H.; Zhang, T. Camera, LiDAR, and IMU Based Multi-Sensor Fusion SLAM: A Survey. Tsinghua Sci. Technol. 2024, 29,
415–429. [CrossRef]
8. Irfan, M.; Dalai, S.; Kishore, K.; Singh, S.; Akbar, S.A. Vision-based Guidance and Navigation for Autonomous MAV in Indoor
Environment. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking
Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–5. [CrossRef]
9. Harun, M.H.; Abdullah, S.S.; Aras, M.S.M.; Bahar, M.B. Sensor Fusion Technology for Unmanned Autonomous Vehicles (UAV):
A Review of Methods and Applications. In Proceedings of the 2022 IEEE 9th International Conference on Underwater System
Technology: Theory and Applications (USYS), Kuala Lumpur, Malaysia, 5–6 December 2022; pp. 1–8. [CrossRef]
10. Geneva, P.; Eckenhoff, K.; Lee, W.; Yang, Y.; Huang, G. Openvins: A research platform for visual-inertial estimation. In Proceedings
of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020.
11. Fei, S.; Hassan, M.A.; Xiao, Y.; Su, X.; Chen, Z.; Cheng, Q.; Duan, F.; Chen, R.; Ma, Y. UAV-based multi-sensor data fusion and
machine learning algorithm for yield prediction in wheat. Precis. Agric. 2023, 24, 187–212. [CrossRef] [PubMed]
12. Wu, Y.; Li, Y.; Li, W.; Li, H.; Lu, R. Robust LiDAR-based localization scheme for unmanned ground vehicle via multisensor fusion.
IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 5633–5643. [CrossRef]
13. Singh, S.; Kishore, K.; Dalai, S.; Irfan, M.; Singh, S.; Akbar, S.A.; Sachdeva, G.; Yechangunja, R. CACLA-Based Local Path Planner
for Drones Navigating Unknown Indoor Corridors. IEEE Intell. Syst. 2022, 37, 32–41. [CrossRef]
14. Dalai, S.; O’Connell, E.; Newe, T.; Trslic, P.; Manduhu, M.; Irfan, M.; Riordan, J.; Dooly, G. CDDQN based efficient path planning
for Aerial surveillance in high wind scenarios. In Proceedings of the OCEANS 2023—Limerick, Limerick, Ireland, 5–8 June 2023;
pp. 1–7. [CrossRef]
15. Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734.
[CrossRef]
16. O’Riordan, A.; Newe, T.; Dooly, G.; Toal, D. Stereo vision sensing: Review of existing systems. In Proceedings of the 2018 12th
International Conference on Sensing Technology (ICST), Limerick, Ireland, 4–6 December 2018.
17. Xu, W.; Cai, Y.; He, D.; Lin, J.; Zhang, F. Fast-lio2: Fast direct lidar-inertial odometry. IEEE Trans. Robot. 2022, 38, 2053–2073.
[CrossRef]
18. Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. Orb-slam3: An accurate open-source library for visual,
visual–inertial, and multimap slam. IEEE Trans. Robot. 2021, 37, 1874–1890. [CrossRef]
19. Sun, K.; Mohta, K.; Pfrommer, B.; Watterson, M.; Liu, S.; Mulgaonkar, Y.; Taylor, C.J.; Kumar, V. Robust stereo visual inertial
odometry for fast autonomous flight. IEEE Robot. Autom. Lett. 2018, 3, 965–972. [CrossRef]
20. Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based visual–inertial odometry using nonlinear optimiza-
tion. Int. J. Robot. Res. 2015, 34, 314–334. [CrossRef]
21. Zhang, J.; Singh, S. LOAM: Lidar odometry and mapping in real-time. In Proceedings of the Robotics: Science and Systems, Berkeley,
CA, USA, 12–16 July 2014; Volume 2.
22. Devarajan, H.; Zheng, H.; Kougkas, A.; Sun, X.H.; Vishwanath, V. Dlio: A data-centric benchmark for scientific deep learning
applications. In Proceedings of the 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing
(CCGrid), Melbourne, Australia, 10–13 May 2021.
23. Huai, Z.; Huang, G. Robocentric visual–inertial odometry. Int. J. Robot. Res. 2022, 41, 667–689. [CrossRef]
24. Shan, T.; Englot, B.; Ratti, C.; Rus, D. Lvi-sam: Tightly-coupled lidar-visual-inertial odometry via smoothing and mapping. In
Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021.
25. Han, L.; Lin, Y.; Du, G.; Lian, S. DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D
Geometric Constraints. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
Macau, China, 3–8 November 2019; pp. 6906–6913. [CrossRef]
26. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. Lio-sam: Tightly-coupled lidar inertial odometry via smoothing and
mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV,
USA, 24 October 2020–24 January 2021.
27. Mourikis, A.I.; Roumeliotis, S.I. A multi-state constraint Kalman filter for vision-aided inertial navigation. In Proceedings of the
2007 IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007.
28. Irfan, M.; Dalai, S.; Vishwakarma, K.; Trslic, P.; Riordan, J.; Dooly, G. Multi-Sensor Fusion for Efficient and Robust UAV State
Estimation. In Proceedings of the 2024 12th International Conference on Control, Mechatronics and Automation (ICCMA), London,
UK, 11–13 November 2024.
29. Rehder, J.; Nikolic, J.; Schneider, T.; Hinzmann, T.; Siegwart, R. Extending kalibr: Calibrating the extrinsics of multiple IMUs and
of individual axes. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm,
Sweden, 16–21 May 2016.
30. Zhu, F.; Ren, Y.; Zhang, F. Robust real-time lidar-inertial initialization. In Proceedings of the 2022 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022.
31. Grupp, M. evo: Python Package for the Evaluation of Odometry and Slam. 2017. Available online: https://github.com/
MichaelGrupp/evo (accessed on 6 February 2025).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
