Article
LSAF-LSTM-Based Self-Adaptive Multi-Sensor Fusion for Robust UAV State Estimation in Challenging Environments
1 CRIS Research Group, Department of Electronic and Computer Engineering, University of Limerick,
V94 T9PX Limerick, Ireland; [email protected]
2 School of Engineering, University of Limerick, The Lonsdale Building, V94 T9PX Limerick, Ireland;
[email protected] (P.T.); [email protected] (G.D.)
3 Drone Systems Lab, School of Computing Engineering and Physical Science, University of the West of
Scotland, Glasgow G72 0LH, UK; [email protected]
* Correspondence: [email protected]; Tel.: +353-899812440
† This paper is an extended version of our paper published in Irfan, M.; Dalai, S.; Vishwakarma, K.; Trslic, P.;
Riordan, J.; Dooly, G. Multi-Sensor Fusion for Efficient and Robust UAV State Estimation. In Proceedings of
the 2024 12th International Conference on Control, Mechatronics and Automation (ICCMA), London, UK,
11–13 November 2024.
Abstract: Unmanned aerial vehicle (UAV) state estimation is fundamental across applications like robot navigation, autonomous driving, virtual reality (VR), and augmented reality (AR). This research highlights the critical role of robust state estimation in ensuring safe and efficient autonomous UAV navigation, particularly in challenging environments. We propose a deep learning-based adaptive sensor fusion framework for UAV state estimation, integrating multi-sensor data from stereo cameras, an IMU, two 3D LiDARs, and GPS. The framework dynamically adjusts fusion weights in real time using a long short-term memory (LSTM) model, enhancing robustness under diverse conditions such as illumination changes, structureless environments, degraded GPS signals, or complete signal loss, where traditional single-sensor SLAM methods often fail. Validated on an in-house integrated UAV platform and evaluated against high-precision RTK ground truth, the algorithm incorporates deep learning-predicted fusion weights into an optimization-based odometry pipeline. The system delivers robust, consistent, and accurate state estimation, outperforming state-of-the-art techniques. Experimental results demonstrate its adaptability and effectiveness across challenging scenarios, showcasing significant advancements in UAV autonomy and reliability through the synergistic integration of deep learning and sensor fusion.
Keywords: ROS; LSTM; adaptive fusion; multi-sensor fusion; state estimation; UAV; LiDAR-visual-inertial odometry; MSCKF
1. Introduction
Recent advances in computing power, sensor technologies, and machine learning have significantly fueled interest in autonomous unmanned aerial vehicles (UAVs), also known as drones. These systems have become indispensable across a wide range of applications, including robot navigation, autonomous driving, virtual reality (VR), augmented reality (AR), environmental monitoring, delivery services, and disaster response. In such contexts, navigation and positioning are essential to ensuring the UAV's operational accuracy, safety, and efficiency. Modern UAVs heavily rely on sensor fusion techniques to provide robust state estimation that enables them to operate autonomously, even in complex or dynamic environments. Beyond UAVs, sensor fusion plays a vital role in the Internet of Vehicles (IoV), autonomous robots, and other emerging technologies [1,2].
The field of state estimation in navigation and control systems for autonomous robots
has evolved significantly over the years, driven by technological advancements in sensor
hardware and computational algorithms. State estimation involves deriving accurate
information about a system’s position, velocity, and orientation based on sensor data. While
single-sensor solutions have been extensively studied, their limitations have increasingly
motivated research into multi-sensor fusion approaches. These approaches leverage the
complementary characteristics of diverse sensors to overcome the constraints of individual
sensors and enhance the accuracy, robustness, and resilience of state estimation systems [3].
Despite the progress made, achieving robust, accurate, and seamless navigation and
positioning solutions remains a major challenge when relying solely on single-sensor
systems. For example, the inertial navigation system (INS), which relies on accelerometers
and gyroscopes to compute relative positions, is highly accurate only for short durations.
Over time, the accumulation of sensor noise and integration errors causes significant drift.
Similarly, GPS, while offering absolute positioning data, is effective primarily in open
sky environments but is prone to signal blockage, multipath interference, and degraded
performance in urban canyons, dense forests, or indoor environments. These limitations
demand the integration of additional sensor types, such as cameras, LiDAR, and IMU, to
ensure robust state estimation with enhanced spatial and temporal coverage.
Visual-inertial navigation systems (VINS) [4] have emerged as a cost-effective and practical solution for state estimation in UAVs, combining visual and inertial data to achieve higher accuracy. However, VINS performance in complex environments is often hindered by susceptibility to changing illumination, low-texture regions, and dynamic
obstacles. LiDAR, on the other hand, provides accurate distance measurements and
operates independently of lighting conditions. Its growing affordability and precision
have made it a popular choice for UAVs. Nonetheless, LiDAR systems face challenges
related to sparse data and difficulty in extracting semantic information. Similarly, vision-
based approaches using monocular or stereo cameras struggle with initialization, sensitivity
to illumination changes, and distance variability. These challenges highlight the need for
multi-sensor fusion, where the strengths of different sensors are combined to overcome
individual shortcomings.
In recent years, multi-sensor fusion approaches have advanced significantly, enabling
UAVs to achieve real-time, high-precision positioning and mapping. For example, integrat-
ing GPS with IMU data mitigates inertial navigation drift and improves noise filtering in
complex environments. Incorporating LiDAR and visual data further enhances accuracy
by providing rich spatial and semantic information. However, traditional sensor fusion
methods often rely on static weighting of sensor inputs, which can lead to suboptimal
performance in dynamic or degraded scenarios. These limitations have driven research
toward adaptive sensor fusion techniques that dynamically adjust sensor contributions
based on real-time environmental conditions and sensor reliability [5,6].
Recent advancements in deep learning have introduced a powerful paradigm for
adaptive sensor fusion. Deep learning models, such as long short-term memory (LSTM)
networks, can effectively learn temporal dependencies in sensor data and adaptively com-
pute fusion weights based on real-time input. This capability allows UAVs to dynamically
prioritize reliable sensors and minimize the impact of degraded or faulty sensor data. Such
adaptability is particularly valuable in scenarios involving sudden illumination changes,
feature-deprived environments, degraded GPS signals, or complete signal loss, where
traditional single-sensor systems and static-weight fusion approaches often fail.
This paper presents a novel, deep learning-based adaptive multi-sensor fusion frame-
work for UAV state estimation. The proposed framework integrates stereo cameras, IMU,
LiDAR sensors, and GPS-RTK data into a unified system, which is depicted in Figure 1.
A long short-term memory (LSTM) model is used to dynamically compute sensor fusion
weights in real time, ensuring robust, accurate, and consistent state estimation under diverse
conditions. Unlike conventional methods that rely on fixed sensor weights, our approach
leverages the real-time adaptability of deep learning to optimize sensor contributions based
on environmental and operational factors.
Our approach is validated on an in-house UAV platform equipped with an internally
integrated and calibrated sensor suite. The system is evaluated against high-precision
RTK ground truth, demonstrating its ability to maintain robust state estimation in both
GPS-enabled and GPS-denied scenarios. The algorithm autonomously determines the relevant sensor data, leveraging stereo-inertial or LiDAR-inertial odometry outputs to ensure global positioning in the absence of GPS.
2. Related Work
In recent decades, many innovative approaches for UAV state estimation have been
proposed, leveraging different types of sensors. Among these, vision-based and LiDAR-
based methods have gained substantial attention due to their ability to provide rich en-
vironmental data for accurate localization and mapping. Researchers have extensively
explored the fusion of visual and inertial sensors, given their complementary properties in
addressing UAV navigation challenges [7].
For state estimation, sensors such as IMUs are frequently used in fusion designs
that can be broadly categorized into loosely coupled and tightly coupled approaches. In
loosely coupled systems, sensor outputs are independently processed and subsequently
fused, offering simplicity and flexibility when integrating diverse sensors. However, tightly
coupled systems have gained increasing preference due to their ability to process raw
sensor data directly, such as utilizing raw IMU measurements in pose estimation. This
allows for more accurate state estimation, especially in scenarios with high dynamic motion
or challenging environmental conditions. Papers [8,9] propose tightly coupled methods
that integrate visual and inertial data for efficient and robust state estimation. By exploiting
the raw data from IMU and cameras, these methods address issues like drift and improve
system robustness compared to loosely coupled alternatives.
Table 1. Summary of the comparison between proposed LSAF vs. state-of-the-art methods.

Method | Sensors | Fusion Strategy | Dynamic Weighting | Adaptability | Key Strengths | Key Limitations
LSAF (Proposed) | Stereo, IMU, GPS, LiDAR | LSTM-based Adaptive Fusion + MSCKF | Yes | High | Robust to sensor degradation and dynamic targets | Requires model training and tuning
VINS-Fusion (2018) [4] | Stereo, IMU, GPS | Predefined Weighting | No | Moderate | Efficient visual-inertial fusion | High drift in GPS-denied or dynamic environments
FASTLIO2 (2022) [17] | LiDAR, IMU | Predefined Weighting | No | Moderate | High accuracy in LiDAR-rich areas | Degrades in dynamic or sparse environments
ORB-SLAM3 (2020) [18] | Stereo, IMU | Visual SLAM with Fixed Weights | No | Low to Moderate | Loop-closure-enabled global optimization | Fails in low-texture or poor-light conditions
MSCKF (2017) [19] | Stereo, IMU | Multi-State Constraint Kalman Filter | No | Low to Moderate | Fast and computationally efficient | Limited accuracy; lacks adaptability
OKVIS (2015) [20] | Stereo, IMU | Probabilistic Fusion | No | Low to Moderate | Robust visual-inertial tracking | Poor performance in dynamic environments
LOAM (2014) [21] | LiDAR, IMU | LiDAR Odometry + Mapping | No | Moderate | Accurate LiDAR-based mapping | Computationally intensive
DLIO (2021) [22] | LiDAR, IMU | Deep Learning for LiDAR Odometry | Partial (Learned Weights) | Moderate | Data-driven LiDAR odometry | Requires extensive training data
R-VIO (2021) [23] | Stereo, IMU | Fixed Weights | No | Moderate | Accurate in small-scale scenarios | High drift in large-scale environments
GPS Only | GPS | Single-Sensor | No | Low | Accurate global positioning in open areas | Fails in GPS-denied or multipath environments
LVI-SAM (2021) [24] | LiDAR, IMU, Stereo | Joint Optimization | No | Moderate to High | Combines LiDAR and visual optimization | Requires high computational power
DeepVIO (2020) [25] | Stereo, IMU | Deep Learning-based Visual-Inertial | Yes (Learned Weights) | Moderate | Learns fusion weights; robust in some cases | Needs large datasets; computationally intensive
SC-LIO-SAM (2022) [26] | LiDAR, IMU | Joint LiDAR SLAM | No | High | Combines semantic segmentation with LiDAR SLAM | High dependency on LiDAR quality
This paper builds on this body of work by proposing a novel framework that combines
the strengths of optimization-based and deep learning-based approaches. Using long short-term memory (LSTM) networks, our method dynamically computes sensor fusion weights
in real time, adapting to environmental conditions and sensor reliability. This framework
integrates stereo cameras, LiDAR, IMU, and GPS-RTK data into a unified system, achieving
superior performance in both GPS-enabled and GPS-denied scenarios.
3. Methodology
This research aims to achieve robust and accurate UAV state estimation by integrating
measurements from multiple sensors, including GPS, stereo cameras, LiDARs, and IMUs,
into a unified framework. The proposed system combines a multi-state constraint Kalman
filter (MSCKF) [27] with a long short-term memory (LSTM)-based self-adaptive sensor
fusion mechanism. This hybrid framework dynamically adjusts sensor fusion weights
based on real-time environmental conditions and sensor reliability, ensuring consistent
performance in challenging scenarios, such as GPS-degraded environments, rapid motion,
and feature-deprived areas.
Figure 2. An illustration of the proposed LSAF framework. The global estimator combines local estimations from various global sensors to achieve precise local accuracy and globally drift-free pose estimation; this builds upon our previous work [28].
\dot{p} = v,  (2)
\dot{v} = R(q)(a_m - b_a - n_a) + g,  (3)
\dot{q} = \frac{1}{2}\, q \otimes (\omega_m - b_g - n_g),  (4)
\dot{b}_a = n_{b_a},  (5)
\dot{b}_g = n_{b_g}.  (6)
Here, R(q) represents the rotation matrix derived from the quaternion q, and g is the gravity vector. The terms n_a, n_g, n_{b_a}, and n_{b_g} denote process noise, modeled as zero-mean Gaussian distributions.
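To make the propagation concrete, the following is a minimal numpy sketch of a discrete-time Euler integration of Eqs. (2)-(6), dropping the noise terms for the mean propagation; the function and helper names are illustrative and are not taken from the released implementation.

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([
        w0*w1 - x0*x1 - y0*y1 - z0*z1,
        w0*x1 + x0*w1 + y0*z1 - z0*y1,
        w0*y1 - x0*z1 + y0*w1 + z0*x1,
        w0*z1 + x0*y1 - y0*x1 + z0*w1,
    ])

def quat_to_rot(q):
    """Rotation matrix R(q) for a unit quaternion [w, x, y, z]."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def propagate_imu(p, v, q, b_a, b_g, a_m, w_m, dt,
                  g=np.array([0.0, 0.0, -9.81])):
    """Euler integration of Eqs. (2)-(6); noise terms are dropped for the mean."""
    a = quat_to_rot(q) @ (a_m - b_a) + g                 # Eq. (3) without noise
    p_new = p + v * dt                                   # Eq. (2)
    v_new = v + a * dt
    dq = 0.5 * quat_mult(q, np.concatenate(([0.0], w_m - b_g)))  # Eq. (4)
    q_new = q + dq * dt
    q_new /= np.linalg.norm(q_new)                       # re-normalize quaternion
    return p_new, v_new, q_new, b_a, b_g                 # biases are random walks (Eqs. (5)-(6))
```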
p_L = R_I^L (p - p_I) + n_{LiDAR},  (10)
where R_I^L is the rotation matrix from the IMU frame to the LiDAR frame, and p_I is the IMU's position.
4. Stereo cameras provide 2D projections of 3D feature points:
u = f_x \frac{x}{z} + c_x, \quad v = f_y \frac{y}{z} + c_y,  (11)
where (x, y, z) are the 3D coordinates of a feature in the camera frame, and (u, v) are the corresponding pixel coordinates.
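The two measurement models above can be illustrated with a short sketch; the intrinsic values and function names below are hypothetical placeholders rather than the calibrated parameters of the actual rig.

```python
import numpy as np

def lidar_measurement(p, p_I, R_IL):
    """Eq. (10): express the estimated position in the LiDAR frame (noise-free mean)."""
    return R_IL @ (p - p_I)

def project_pinhole(point_cam, fx, fy, cx, cy):
    """Eq. (11): project a 3D point in the camera frame to pixel coordinates."""
    x, y, z = point_cam
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.array([u, v])

# Example: a point 5 m in front of the camera with hypothetical intrinsics
uv = project_pinhole(np.array([0.2, -0.1, 5.0]), fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```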
Component | Description
Input Sensors | GPS-RTK (latitude, longitude, altitude), IMU (linear acceleration, angular velocity), Stereo Cameras (left and right image illumination), LiDAR (point cloud density)
Input Shape | (N, T, 12) (number of samples, time steps, sensor features)
Output Shape | (N, T, 4) (sensor reliability weights for fusion)
LSTM Architecture | Two LSTM layers (128, 64 units) followed by a Time-Distributed Dense layer
Loss Function | Mean Squared Error (MSE)
Optimizer | Adam
Training Epochs | 1000
Batch Size | 32
Weight Updates | At each time step (real-time adjustment)
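For illustration, here is a minimal Keras sketch matching the configuration in the table (two stacked LSTM layers of 128 and 64 units and a time-distributed dense head mapping 12 sensor features to 4 reliability weights per time step); the function name and the output activation are assumptions rather than details taken from the authors' code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lsaf_model(time_steps=None, n_features=12, n_weights=4):
    """Two stacked LSTM layers (128, 64 units) with a time-distributed dense head
    that outputs one reliability weight per sensor at every time step."""
    model = models.Sequential([
        layers.Input(shape=(time_steps, n_features)),   # (T, 12) sensor features
        layers.LSTM(128, return_sequences=True),         # captures long-term dependencies
        layers.LSTM(64, return_sequences=True),          # refines temporal patterns
        # Bounded reliability scores in [0, 1]; the activation choice is an assumption.
        layers.TimeDistributed(layers.Dense(n_weights, activation="sigmoid")),
    ])
    return model
```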
h_t = \mathrm{LSTM}(S_t; \phi_{\mathrm{LSTM}})
R_t = \sum_{i=1}^{N} w_{t,i} \cdot R_i
P_{t|t} = (I - K_t H_t) P_{t|t-1}
• Compute Loss: Evaluate using the Mean Squared Error (MSE):
L(\hat{x}_t, g_t) = \frac{1}{N} \sum_{t=1}^{N} (\hat{x}_t - g_t)^2
\phi_{\mathrm{LSTM}} \leftarrow \phi_{\mathrm{LSTM}} - \alpha \cdot \nabla L
6: end for
7: Step 3: Output the Final State Estimation
• Return the final estimated state \hat{x}_{t|t} and covariance P_{t|t}
[Block diagram: multi-sensor input X_t (batch, time steps, features) → LSTM layer (128 units, captures long-term dependencies) → LSTM layer (64 units, refines temporal patterns) → time-distributed dense layer (fully connected feature mapping) → predicted UAV state Y_t.]
Figure 3. Proposed LSTM-based multi-sensor fusion architecture for UAV state estimation.
The model is trained using the mean squared error (MSE) loss function, which is well suited for regression tasks as it minimizes the
squared differences between predicted and actual values. This approach ensures that larger
errors are penalized more heavily, leading to more precise predictions. For optimization,
the Adam optimizer is utilized due to its adaptive learning rate and ability to efficiently
handle complex datasets. Adam’s combination of momentum and adaptive gradient-based
optimization contributes to faster and more stable convergence. To evaluate the model’s
predictive performance, the mean absolute error (MAE) metric is employed, as it provides
a straightforward measure of the average prediction error magnitude. The training process
spans 1000 epochs with a batch size of 32, ensuring effective learning without overfit-
ting. Additionally, techniques such as early stopping or validation loss monitoring can be
incorporated to enhance model robustness and prevent unnecessary overtraining.
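A hedged sketch of this training setup follows, reusing the hypothetical build_lsaf_model helper from the architecture sketch above; the synthetic data, validation split, and early-stopping patience are placeholders rather than the authors' actual dataset or settings.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data: N sequences of T steps with 12 sensor features and 4 target weights.
N, T = 256, 50
X = np.random.rand(N, T, 12).astype("float32")
y = np.random.rand(N, T, 4).astype("float32")

model = build_lsaf_model(time_steps=T)  # hypothetical builder from the sketch above
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Early stopping on validation loss guards against the overtraining mentioned in the text.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=50,
                                              restore_best_weights=True)
history = model.fit(X, y, validation_split=0.2, epochs=1000, batch_size=32,
                    callbacks=[early_stop], verbose=0)
```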
[LSTM cell diagram: the forget gate (σ), input gate (σ), candidate state (tanh), and output gate (σ) combine the previous hidden state h_{t−1}, the previous cell state C_{t−1}, and the sensor inputs x_t to produce the updated cell state C_t and the new hidden state h_t.]
Figure 5. Training and validation loss of the proposed LSTM-based self-adaptive multi-sensor fusion
(LSAF) framework over 1000 epochs.
Figure 6. Training and validation MAE of the proposed LSTM-based self-adaptive multi-sensor
fusion (LSAF) framework over 1000 epochs.
x_{k|k-1} = f(x_{k-1}),  (12)
P_{k|k-1} = F_{k-1} P_{k-1} F_{k-1}^T + Q.  (13)
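A minimal numpy sketch of this prediction step (Eqs. (12) and (13)); the function name and the externally supplied Jacobian F are illustrative assumptions.

```python
import numpy as np

def ekf_predict(x, P, f, F, Q):
    """Propagate the state and covariance one step ahead (Eqs. (12)-(13))."""
    x_pred = f(x)                 # nonlinear state propagation
    P_pred = F @ P @ F.T + Q      # covariance propagation with process noise Q
    return x_pred, P_pred
```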
Once new sensor measurements are available, the Kalman gain is computed to optimally integrate the observations:
K_t = P_{t|t-1} H_t^T (H_t P_{t|t-1} H_t^T + R_t)^{-1}.
Figure 7. Proposed block diagram for LSTM-based self-adaptive multi-sensor fusion (LSAF).
h_t = \mathrm{LSTM}(S_t; \phi_{\mathrm{LSTM}})
R_t = \sum_{i=1}^{N} w_{t,i} \cdot R_i
z_t = \sum_{i=1}^{N} w_{t,i} \cdot s_{t,i}
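The following simplified sketch shows how the LSTM reliability weights could form the fused measurement z_t and covariance R_t and drive a standard Kalman update; the linear measurement model H and all variable names are assumptions made for illustration, not the exact MSCKF machinery.

```python
import numpy as np

def lsaf_update(x_pred, P_pred, sensor_meas, sensor_covs, weights, H):
    """Fuse N sensor measurements with LSTM-predicted weights, then apply a Kalman update.

    sensor_meas: list of N measurement vectors s_{t,i}
    sensor_covs: list of N measurement covariances R_i
    weights:     LSTM reliability weights w_{t,i}
    """
    w = np.asarray(weights, dtype=float) / np.sum(weights)   # normalize, as in Algorithm 2
    z_t = sum(wi * si for wi, si in zip(w, sensor_meas))      # fused measurement
    R_t = sum(wi * Ri for wi, Ri in zip(w, sensor_covs))      # fused measurement covariance

    S = H @ P_pred @ H.T + R_t                                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)                       # Kalman gain
    x_new = x_pred + K @ (z_t - H @ x_pred)                   # state correction
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred        # covariance correction
    return x_new, P_new
```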
Algorithm 2 Cont.
Step 4: LSTM-Guided Multi-Sensor SLAM-Based Pose Estimation
• Fuse all available sensors (Stereo Camera, LiDAR, IMU, and GPS-RTK) for SLAM-based pose estimation.
• Assign higher weightage to more reliable sensors based on LSTM reliability scores:
w_{\mathrm{SLAM},i} = \frac{w_{t,i}}{\sum_{j=1}^{N} w_{t,j}}, \quad \forall i \in \{\text{GPS-RTK, IMU, Stereo, LiDAR}\}
(\hat{p}_t, \hat{q}_t) = \sum_{i=1}^{N} w_{\mathrm{SLAM},i} \cdot \mathrm{SLAM}(s_{t,i})
p_t = p_{t-1} + v_{t-1} \cdot \Delta t
P_t = (I - K_t H_t) P_{t|t-1}
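A small sketch of the weighted pose fusion in Step 4, assuming each per-sensor odometry front end returns a position and quaternion estimate; the normalized weighted quaternion sum is only a first-order approximation of proper rotation averaging, and all names are hypothetical.

```python
import numpy as np

def fuse_slam_poses(positions, quaternions, weights):
    """Weighted fusion of per-sensor pose estimates using normalized LSTM reliability scores."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                     # w_SLAM,i = w_t,i / sum_j w_t,j

    p_hat = sum(wi * p for wi, p in zip(w, positions))  # weighted position estimate

    # Approximate orientation fusion: weighted quaternion sum, re-normalized.
    # (A full implementation would use eigenvalue-based quaternion averaging.)
    q_hat = sum(wi * q for wi, q in zip(w, quaternions))
    q_hat = q_hat / np.linalg.norm(q_hat)
    return p_hat, q_hat
```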
Algorithm 2 details the complete computational pipeline, outlining key steps such
as sensor preprocessing, LSTM-based adaptive fusion, SLAM-based pose estimation, and
MSCKF-based state correction. The proposed framework significantly improves UAV
navigation accuracy by enabling real-time adaptation to sensor reliability, making it well
suited for challenging flight conditions, including environments with limited GPS visibility,
rapid motion dynamics, and feature-deprived landscapes.
Sensor measurements were fused using event-based updates, where the state was propagated to the timestamp of
each measurement. Calibration parameters, such as camera-LiDAR-IMU extrinsics, were
estimated offline and incorporated into the extended state vector for accurate fusion.
Figure 8. The experimental environment in different scenarios during the data collection. Panels (a,b) show the UAV hardware along with the sensor integration, and panels (c,d) show the open-field dataset environment viewed from the stereo and LiDAR sensors, respectively; this builds upon our previous work [28].
The offline calibration of the proposed system consists of three key components:
estimation of the stereo camera’s intrinsic and extrinsic parameters, determination of
the IMU-camera extrinsic offset, and calibration of the LiDAR-IMU transformation. To
estimate both intrinsic and extrinsic parameters of the stereo camera, we employ the well-
established Kalibr calibration toolbox [29], ensuring precise alignment between the camera
and IMU. For 3D LiDAR-IMU calibration, we utilize the state-of-the-art LI-Init toolbox [30],
which provides a robust real-time initialization framework for LiDAR-inertial systems,
compensating for temporal offsets and extrinsic misalignments. To evaluate the robustness
of the proposed approach under diverse conditions, we collected multiple datasets across
three scenarios, such as handheld and UAV mounted configurations. The datasets, referred
to as UL Outdoor Car Parking Dataset, UL Outdoor Handheld Dataset, and UL Car Bridge Dataset,
were recorded at the University of Limerick Campus within the CRIS Lab research group.
Figure 8 illustrates the experimental environments, while Table 3 presents the detailed UAV hardware and sensor specifications. To address the challenge of asynchronous sensor
data, we employ first-order linear interpolation to estimate the IMU pose at each sensor’s
measurement time, mitigating time bias without significantly increasing computational
overhead. Instead of direct event-based updates, this method ensures that sensor data are
aligned with a consistent reference frame, preventing oversampling of high-frequency IMU
data or undersampling of low-frequency GPS and LiDAR measurements. Additionally,
ROS-based timestamp synchronization of DJI-OSDK and Livox LiDAR nodes further
minimizes timing inconsistencies, enhancing fusion accuracy and reducing drift in state
estimation.
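As an illustration of the first-order time alignment described above, a short sketch that interpolates an IMU pose to an arbitrary sensor timestamp; the normalized linear quaternion interpolation (nlerp) is a simplification of proper slerp, and all names are hypothetical.

```python
import numpy as np

def interpolate_imu_pose(t_query, t0, t1, p0, p1, q0, q1):
    """First-order (linear) interpolation of an IMU pose to a sensor timestamp t_query,
    with t0 <= t_query <= t1. Orientation uses normalized linear interpolation (nlerp),
    a common first-order approximation for small time offsets."""
    alpha = (t_query - t0) / (t1 - t0)
    p = (1.0 - alpha) * p0 + alpha * p1          # linear position interpolation

    if np.dot(q0, q1) < 0.0:                     # keep quaternions in the same hemisphere
        q1 = -q1
    q = (1.0 - alpha) * q0 + alpha * q1
    q = q / np.linalg.norm(q)
    return p, q

# Example: align a LiDAR scan stamped between two IMU poses (timestamps in seconds).
p, q = interpolate_imu_pose(0.105, 0.10, 0.11,
                            np.array([1.0, 0.0, 0.0]), np.array([1.02, 0.0, 0.0]),
                            np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.999, 0.0, 0.0, 0.045]))
```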
The proposed method was evaluated without loop closure mode to assess its con-
sistency and robustness. Performance metrics, including the absolute pose error (APE)
and the root mean square error (RMSE), were calculated to quantify the accuracy of the
estimated trajectory [31]. The comparison focused on the ability to mitigate cumulative
errors and maintain robust state estimation across large-scale environments. The details of
the hardware used during the experiments are listed in Table 3.
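For reference, a minimal numpy computation of the translational APE RMSE between time-associated estimated and ground-truth trajectories; the actual evaluation used the evo toolkit [31], so this is only an illustrative stand-in with dummy data.

```python
import numpy as np

def ape_rmse(est_xyz, gt_xyz):
    """Root mean square of the translational absolute pose error, assuming both
    trajectories are already time-associated and expressed in the same frame."""
    errors = np.linalg.norm(est_xyz - gt_xyz, axis=1)   # per-pose Euclidean error
    return float(np.sqrt(np.mean(errors ** 2)))

# Example with dummy trajectories of 1000 poses (x, y, z)
est = np.cumsum(np.random.randn(1000, 3) * 0.01, axis=0)
gt = est + np.random.randn(1000, 3) * 0.02              # simulated estimation error
print(f"APE RMSE: {ape_rmse(est, gt):.3f} m")
```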
Figure 9. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and
VINS-Fusion.
Figure 10. Box plots showing the overall APE of each strategy.
Figure 11. Absolute estimated position of x, y, and z axes showing plots of various methods on the
UAV car parking dataset.
Figure 12. Absolute position error of roll, yaw, and pitch showing the plots of various methods on
the UAV car parking dataset.
Table 4. Summary of accuracy evaluation of the UL outdoor car parking dataset (in metres).
Figure 13 compares the estimated trajectories on the UL outdoor handheld dataset, while Figure 14 highlights the box plots showing the overall APE of each of the strategies.
Table 5 provides the RMSE values for each method, and Figures 15 and 16 present the abso-
lute position errors (x, y, z) and orientation errors (roll, pitch, and yaw) for the handheld
UAV dataset.
The results demonstrate that significant position drifts occurred in the stereo IMU-
only scenario. However, accuracy improved considerably when LiDAR, GPS, or their
combination was integrated. VINS-Fusion exhibited growing errors due to accumulated
drift, whereas LSAF maintained a smooth trajectory consistent with the ground truth.
Unlike VINS-Fusion and FASTLIO2, which failed to align precisely with the reference
data, LSAF achieved superior performance by leveraging the LSTM-based self-adaptive
multi-sensor fusion (LSAF) framework and MSCKF fusion.
Figure 13. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and
VINS-Fusion on the UL outdoor handheld dataset.
The system was compared with state-of-the-art algorithms, including VINS-Fusion [4]
and FASTLIO2 [17]. VINS-Fusion integrates visual-inertial odometry with or without GPS data, while FASTLIO2 employs LiDAR-inertial odometry combined with global optimization,
including loop closure. In comparison, the proposed method utilizes an LSTM-based adaptive
weighting mechanism to enhance robustness against sensor degradation and environmental
variability, ensuring accurate and reliable state estimation in dynamic conditions.
Figure 14. Box plots showing the overall APE of each strategy.
Figure 15. Absolute estimated position of x, y, and z axes showing the plots of various methods on
the UL outdoor handheld dataset.
Figure 16. Absolute position error of roll, yaw, and pitch showing the plots of various methods on
the UL outdoor handheld dataset.
Table 5. Summary of accuracy evaluation of the UL outdoor handheld dataset (in metres).
This experiment evaluates the global consistency of the proposed LSAF framework
in a challenging environment with unstable and noisy GPS signals, particularly under a
bridge, where localization accuracy and trajectory smoothness are significantly affected.
The results demonstrate that LSAF effectively mitigates single sensor drift, maintaining
global consistency and ensuring smooth local trajectory estimation despite degraded GPS
conditions. Figures 18 and 19 illustrate the absolute position errors in the x, y, and z
coordinates and the roll, pitch, and yaw angles, comparing multiple methods on the UAV
car bridge dataset, while Table 6 presents the corresponding RMSE values for each approach.
Figure 20 shows a box plot of the overall relative pose error (RPE) for five different strategies,
demonstrating that LSAF outperforms other state-of-the-art (SOTA) methods.
Figure 17. Trajectory plots of the proposed LSAF method and comparison with FASTLIO2 and
VINS-Fusion.
Table 6. Summary of accuracy evaluation of the UL car bridge dataset (in metres).
Figure 18. Absolute estimated position of the x, y, and z axes showing the plots of various methods
on the UAV car bridge dataset.
Figure 19. Absolute position error of roll, yaw, and pitch showing plots of various methods on the
UAV car bridge dataset.
Figure 20. Box plots showing the overall APE of each strategy.
5. Discussion
This paper builds upon the work presented at the 12th International Conference on
Control, Mechatronics, and Automation (ICCMA 2024) [28] by introducing a significant
enhancement that includes an LSTM-based self-adaptive fusion technique. This addition
allows the system to dynamically adjust sensor contributions in real time, making it more
robust to challenging environmental conditions and improving sensor reliability compared
to the earlier fixed-weight fusion approach. The LSTM mechanism ensures that the most
reliable sensors are prioritized during operation. For example, in GPS-degraded areas, the system gives more weight to the LiDAR, IMU, and stereo cameras. In contrast, in
environments with sparse LiDAR features, GPS data becomes more influential. This
flexibility ensures accurate UAV state estimation, even in challenging scenarios like bright
outdoor conditions or feature-poor environments. Extensive testing on real-world datasets
confirmed the effectiveness of the proposed system. The results showed that the LSTM-
based fusion method outperformed state-of-the-art algorithms such as VINS-Fusion and
FASTLIO2, as well as fusion approaches without LSAF, in terms of accuracy and resilience
to sensor degradation. The system achieved lower trajectory errors and demonstrated its
ability to handle complex environments with minimal cumulative errors. Overall, this work
successfully combines traditional Kalman filtering with modern deep learning to improve
UAV state estimation. The LSTM-based adaptive fusion framework sets a strong foundation
for future research and practical UAV applications in complex and diverse environments.
The robustness and adaptability of the system position it as a valuable contribution to the field of
autonomous UAV navigation, with the potential for further enhancements and applications
in diverse operational scenarios.
Funding: This work was funded by the European Commission’s Horizon 2020 Project RAPID under
Grant 861211 (EU RAPD N°861211), and in part by the Enterprise Ireland’s Disruptive Technologies
Innovation Fund (DTIF) Project GUARD under Grant DT2020 0286B.
Conflicts of Interest: The authors declare no conflicts of interest. The funders had no role in the design
of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or
in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Ye, X.; Song, F.; Zhang, Z.; Zeng, Q. A Review of Small UAV Navigation System Based on Multisource Sensor Fusion. IEEE Sens. J.
2023, 23, 18926–18948. [CrossRef]
2. Irfan, M.; Kishore, K.; Chhabra, V.A. Smart Vehicle Management System Using Internet of Vehicles (IoV). In Proceedings of the
International Conference on Advanced Computing Applications, Advances in Intelligent Systems and Computing, Virtually, 27–28 March
2021; Mandal, J.K., Buyya, R., De, D., Eds.; Springer: Singapore, 2022; Volume 1406.
3. Wang, Z.; Wu, Y.; Niu, Q. Multi-Sensor Fusion in Automated Driving: A Survey. IEEE Access 2020, 8, 2847–2868. [CrossRef]
4. Qin, T.; Cao, S.; Pan, J.; Shen, S. A general optimization-based framework for global pose estimation with multiple sensors. arXiv
2019, arXiv:1901.03642.
5. Lee, W.; Geneva, P.; Chen, C.; Huang, G. Mins: Efficient and robust multisensor-aided inertial navigation system. arXiv 2023, arXiv:2309.15390.
6. Irfan, M.; Dalai, S.; Trslic, P.; Santos, M.C.; Riordan, J.; Dooly, G. LGVINS: LiDAR-GPS-Visual and Inertial System Based
Multi-Sensor Fusion for Smooth and Reliable UAV State Estimation. IEEE Trans. Intell. Veh. 2024. [CrossRef]
7. Zhu, J.; Li, H.; Zhang, T. Camera, LiDAR, and IMU Based Multi-Sensor Fusion SLAM: A Survey. Tsinghua Sci. Technol. 2024, 29,
415–429. [CrossRef]
8. Irfan, M.; Dalai, S.; Kishore, K.; Singh, S.; Akbar, S.A. Vision-based Guidance and Navigation for Autonomous MAV in Indoor
Environment. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking
Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–5. [CrossRef]
9. Harun, M.H.; Abdullah, S.S.; Aras, M.S.M.; Bahar, M.B. Sensor Fusion Technology for Unmanned Autonomous Vehicles (UAV):
A Review of Methods and Applications. In Proceedings of the 2022 IEEE 9th International Conference on Underwater System
Technology: Theory and Applications (USYS), Kuala Lumpur, Malaysia, 5–6 December 2022; pp. 1–8. [CrossRef]
10. Geneva, P.; Eckenhoff, K.; Lee, W.; Yang, Y.; Huang, G. Openvins: A research platform for visual-inertial estimation. In Proceedings
of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020.
11. Fei, S.; Hassan, M.A.; Xiao, Y.; Su, X.; Chen, Z.; Cheng, Q.; Duan, F.; Chen, R.; Ma, Y. UAV-based multi-sensor data fusion and
machine learning algorithm for yield prediction in wheat. Precis. Agric. 2023, 24, 187–212. [CrossRef] [PubMed]
12. Wu, Y.; Li, Y.; Li, W.; Li, H.; Lu, R. Robust LiDAR-based localization scheme for unmanned ground vehicle via multisensor fusion.
IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 5633–5643. [CrossRef]
13. Singh, S.; Kishore, K.; Dalai, S.; Irfan, M.; Singh, S.; Akbar, S.A.; Sachdeva, G.; Yechangunja, R. CACLA-Based Local Path Planner
for Drones Navigating Unknown Indoor Corridors. IEEE Intell. Syst. 2022, 37, 32–41. [CrossRef]
14. Dalai, S.; O’Connell, E.; Newe, T.; Trslic, P.; Manduhu, M.; Irfan, M.; Riordan, J.; Dooly, G. CDDQN based efficient path planning
for Aerial surveillance in high wind scenarios. In Proceedings of the OCEANS 2023—Limerick, Limerick, Ireland, 5–8 June 2023;
pp. 1–7. [CrossRef]
15. Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734.
[CrossRef]
16. O’Riordan, A.; Newe, T.; Dooly, G.; Toal, D. Stereo vision sensing: Review of existing systems. In Proceedings of the 2018 12th
International Conference on Sensing Technology (ICST), Limerick, Ireland, 4–6 December 2018.
17. Xu, W.; Cai, Y.; He, D.; Lin, J.; Zhang, F. Fast-lio2: Fast direct lidar-inertial odometry. IEEE Trans. Robot. 2022, 38, 2053–2073.
[CrossRef]
18. Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. Orb-slam3: An accurate open-source library for visual,
visual–inertial, and multimap slam. IEEE Trans. Robot. 2021, 37, 1874–1890. [CrossRef]
19. Sun, K.; Mohta, K.; Pfrommer, B.; Watterson, M.; Liu, S.; Mulgaonkar, Y.; Taylor, C.J.; Kumar, V. Robust stereo visual inertial
odometry for fast autonomous flight. IEEE Robot. Autom. Lett. 2018, 3, 965–972. [CrossRef]
20. Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based visual–inertial odometry using nonlinear optimiza-
tion. Int. J. Robot. Res. 2015, 34, 314–334. [CrossRef]
21. Zhang, J.; Singh, S. LOAM: Lidar odometry and mapping in real-time. In Proceedings of the Robotics: Science and Systems, Berkeley, CA, USA, 12–16 July 2014; Volume 2.
22. Devarajan, H.; Zheng, H.; Kougkas, A.; Sun, X.H.; Vishwanath, V. Dlio: A data-centric benchmark for scientific deep learning
applications. In Proceedings of the 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing
(CCGrid), Melbourne, Australia, 10–13 May 2021.
23. Huai, Z.; Huang, G. Robocentric visual–inertial odometry. Int. J. Robot. Res. 2022, 41, 667–689. [CrossRef]
24. Shan, T.; Englot, B.; Ratti, C.; Rus, D. Lvi-sam: Tightly-coupled lidar-visual-inertial odometry via smoothing and mapping. In
Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021.
25. Han, L.; Lin, Y.; Du, G.; Lian, S. DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D
Geometric Constraints. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
Macau, China, 3–8 November 2019; pp. 6906–6913. [CrossRef]
26. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. Lio-sam: Tightly-coupled lidar inertial odometry via smoothing and
mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV,
USA, 24 October–24 January 2020.
27. Mourikis, A.I.; Roumeliotis, S.I. A multi-state constraint Kalman filter for vision-aided inertial navigation. In Proceedings of the
2007 IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007.
28. Irfan, M.; Dalai, S.; Vishwakarma, K.; Trslic, P.; Riordan, J.; Dooly, G. Multi-Sensor Fusion for Efficient and Robust UAV State
Estimation. In Proceedings of the 2024 12th International Conference on Control, Mechatronics and Automation (ICCMA), London,
UK, 11–13 November 2024.
29. Rehder, J.; Nikolic, J.; Schneider, T.; Hinzmann, T.; Siegwart, R. Extending kalibr: Calibrating the extrinsics of multiple IMUs and
of individual axes. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm,
Sweden, 16–21 May 2016.
30. Zhu, F.; Ren, Y.; Zhang, F. Robust real-time lidar-inertial initialization. In Proceedings of the 2022 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022.
31. Grupp, M. evo: Python Package for the Evaluation of Odometry and SLAM. 2017. Available online: https://github.com/MichaelGrupp/evo (accessed on 6 February 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.