
Received 26 July 2024, accepted 13 August 2024, date of publication 22 August 2024, date of current version 4 September 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3447899

Autonomous UAV Implementation for Facial Recognition and Tracking in GPS-Denied Environments
DIEGO A. HERRERA OLLACHICA, (Member, IEEE),
BISMARK K. ASIEDU ASANTE, (Member, IEEE), AND HIROKI IMAMURA, (Member, IEEE)
Department of Information System Science, SOKA University of Japan, Hachioji, Tokyo 192-8577, Japan
Corresponding author: Diego A. Herrera Ollachica ([email protected])
This research was supported by the Science and Technology Research Partnership for Sustainable Development (SATREPS; Grant
Number JPMJSA2005) funded by the Japan Science and Technology Agency (JST) and the Japan International Cooperation Agency
(JICA). The project’s website is: https://www.soka.ac.jp/en/satreps-earth/project/. As a PhD student, I worked as a research assistant
for this project, and the drones developed for this paper will be used for research purposes in the SATREPS-EARTh project.

ABSTRACT Surveillance with facial recognition holds immense potential as a technological tool for combating crime in Latin American countries. However, the limitations of fixed cameras in covering wide areas and in tracking suspects who evade recognition systems pose significant challenges. To address these limitations, we propose a facial recognition system designed to recognize the faces of suspected individuals with criminal backgrounds and of missing persons. Our solution combines facial recognition technology with a custom-built unmanned aerial vehicle (UAV) for the identification and tracking of targeted persons listed in a database for crimes. We utilize the Inception V2 model to deploy a Siamese network on the Jetson TX2 platform for facial recognition. Additionally, we introduce a novel tracking algorithm to track suspected individuals in the event of evasion. During field test experiments, our system demonstrated strong performance in facial recognition across three different environments: stationary, indoor flight, and outdoor flight. The recognition accuracy of our system, operating together with our tracking algorithms, was 94.45%, an improvement of 1.5% in recognition and a better tracking approach for surveillance. This indicates the versatility and effectiveness of our solution in various operational scenarios, enhancing its potential for crime prevention and law enforcement efforts in Latin American countries.

INDEX TERMS Unmanned autonomous vehicles (UAV), facial recognition, object tracking, deep learning,
autonomous flight, embedded systems.

I. INTRODUCTION

The last decade has been marked by the growth and spread of crime, violence, and the disappearance of people in Latin America, with an increase of up to 11 percent in these incidents between 2000 and 2010, which caused more than 1,000,000 deaths [1]. In 2020, Latin America reported more than 150,000 victims of intentional homicide [2]. Thus, Latin America is often described as the most violent region in the world [3]. This situation has become even more challenging, as the policies aimed at reducing crime in Latin America often rely on approaches that have proven to be ineffective. On the other hand, the promising solutions are linked to the use of information technologies that are yet to be fully exploited [3].

FIGURE 1. System architecture of the drone. 1) Flight Management Unit (FMU). 2) Drivers, which read all the sensors necessary for autonomous drone flight. 3) Computer Vision module for facial recognition. 4) Motion Generator module that generates new setpoints.

One of the technologies contributing to solving this problem is facial recognition: by recognizing the face of a person, we can identify suspected persons whose information is available in the police database. Currently, this technology is used with fixed-position cameras, but the system can be improved by using UAVs to cover more areas and prevent culprits from evading these fixed-position cameras. Some other drone companies have achieved facial detection and tracking of the detected faces, but they are not capable of identifying who the person in front of the drone is [3].

The facial recognition system is an advanced method designed to detect and recognize a person in a digital image or video source. Facial recognition systems continue to advance each year, to the extent that they can accurately identify individuals even after they have undergone plastic surgery [4], [5]. This achievement is due to artificial intelligence algorithms becoming more sophisticated thanks to free access to vast amounts of data for training the algorithms. These artificial intelligence algorithms are expanding their capabilities to different areas of daily human life [4]. Current state-of-the-art facial recognition technology has made substantial advancements in various fields, especially security. According to recent studies, facial recognition systems are better at identifying individuals and acquiring information such as name, age, and nationality [6], and some airports are using facial recognition instead of boarding passes [7].

Facial recognition systems have been developed with machine learning and deep learning algorithms. Several methods for developing facial recognition systems exist, for instance, support vector machines (SVM), local binary pattern histograms (LBPH), Eigenfaces, and deep neural networks [8]. One of the prevalent deep convolution models in use today is VGG16 [9], [10], which evaluates the depth of the convolutional network and its precision in extracting features for recognition tasks in large-scale images using very small convolution filters (3 × 3). Another widely adopted model is FaceNet [11], which incorporates the ‘triplet loss’ function described in Section II.

Several state-of-the-art deep learning models have produced exciting results with high accuracy in various computer vision tasks [12]. The accuracy of DNNs has been demonstrated in identifying metastatic breast cancer, where they improve detection to 98.4% [13]. In the same vein, one of these computer vision tasks is facial recognition, which uses deep convolutional neural networks and is widely used in the research community; however, one of its problems is the quantity of data needed to train the network. To overcome this challenge, we chose a Siamese network, which was proposed by Koch et al. [14]; the Siamese network is among the state-of-the-art models for recognizing faces, and it needs only a few images of each subject to recognize their faces. Due to its popularity


as an excellent feature extractor, we have chosen the deep convolutional neural network (DCNN) as the base network to build the Siamese facial recognition system.

Facial recognition systems are often integrated into fixed-camera surveillance systems; this constrains the range of view, making the system inefficient for monitoring wide areas. A fixed-camera system can only identify a person located in front of it, which leads to three main problems: 1. limited tracking ability, 2. limited coverage area, and 3. lack of adaptability. In this manner, drones can track a person from a distance using various tracking algorithms; this technology addresses the lack of tracking capabilities of the existing systems. Drones can cover vast areas, thereby resolving the second main problem of small coverage areas. Furthermore, drones can easily adapt to diverse situations and environments by adjusting their position, changing the camera’s vision angle, and altering their location. Thus, the use of drones enables us to overcome the three main problems described earlier.

In recent studies, drones integrated with deep learning systems have emerged as highly effective tools for solving problems quickly in diverse fields. Particularly in the context of rescuing injured individuals, where time sensitivity is paramount, drones play a pivotal role in the timely detection of missing persons [15]. Drones are also employed in the field of security to enable machines to interpret human behavior; for example, in surveillance, a drone can detect human poses in motion, and in sports, drones can identify human behavior to obtain information using human pose estimation [16]. Additionally, employing drone-captured images for citizen safety entails analyzing human behavior patterns, adding a layer of sophistication to security measures. Our research is also related to citizen security through the use of deep learning models to help solve the problem of identifying criminal suspects.

Using drones with integrated neural network systems can contribute to reducing human contact with disease-transmitting agents such as birds when observing the agents [17]. Targeted application of pesticides in large-scale commercial farms is another exciting use of drones, which contributes to the well-being of plants and protects already healthy plants [18]. Similarly, in the study of marine species, aquatic drones can assist in recognizing different fish species using deep learning models such as GoogLeNet [19] and AlexNet [20], thereby providing statistical data for marine life preservation efforts [21]. The interaction between humans and drones has recently received more attention in the academic field. Drones can detect a person, and the individual can send their current position and health status via a smartwatch. Subsequently, the drone can track the detected person using its own video feed and smartwatch data, allowing for the assessment of the physical condition of the person [22].

Significant progress has been made in UAV tracking and control recently. Ma et al. introduced an algorithm based on deep reinforcement learning for controlling vertical take-off and landing (VTOL) UAVs amidst wind disturbances, achieving high accuracy and robustness in tracking and flight stability [72]. Similarly, Xu et al. proposed a reinforcement learning-based control method for UAV formation in GPS-denied environments, which optimizes control policies and minimizes collision risks to enhance UAV swarm management [73]. These studies highlight the efficiency of reinforcement learning in boosting UAV tracking accuracy and operational performance, which aligns with our research aims. However, there are substantial differences between these studies and our approach. Ma et al. primarily focus on maintaining the position of the UAV and stability under environmental disturbances [72], while Xu et al. concentrate on keeping UAV formations intact in GPS-denied environments [73]. Conversely, our research centers on the detection, recognition, and tracking of specific individuals using facial recognition technology. By employing a Siamese network for facial recognition and a unique tracking algorithm that maintains control of the drone along with the Extended Kalman Filter (EKF) algorithm, our UAV system aims to follow individuals identified from a criminal database, addressing the specific challenges of crime prevention and law enforcement in Latin American countries.

In this research, we aim to provide a solution to the challenges of citizen insecurity affecting several developing nations. To achieve this, we present a novel integration of facial recognition technologies using transfer learning and autonomous UAVs. Our system stands out by incorporating deep learning algorithms and a custom-built drone platform to perform real-time detection, recognition, and continuous tracking of individuals. We use a compact Jetson TX2 computer, the Pixhawk 4 flight controller, and the T265 positioning camera. The system runs on the ROS Melodic framework, which includes all the necessary nodes to operate the system in real time.

The focus of this research is not only on face detection and recognition but also on tracking. Some studies have been oriented toward object tracking using drones, and several research efforts address this topic. As mentioned in the work of Mukashev et al. [22], a drone detection and tracking system was developed using the YoloV3 algorithm and the CSRT algorithm provided in the OpenCV library to detect and track humans. Tracking multiple objects using videos captured by a drone is valuable for surveillance and defense purposes, which is why in Kim et al. [23] an innovative algorithm for object detection and tracking was developed, enhancing the Joint Detection and Tracking (JDT) algorithm [24]. Obstacle detection and tracking using drones constitute a broad field of research [25], with new studies continually showing promising results. However, in the area of face detection, recognition, and tracking, there are still relatively few research efforts. In many countries, crime and violence are significant concerns, and the police play a crucial role in crime prevention. However, there is a lack


of technology that can assist in this regard. Therefore, this paper develops a system for face detection, recognition, and tracking using an autonomous drone. The main contributions of this research are as follows:

• We introduce a simple but novel algorithm, simple matching real-time tracking (SMRT), designed to match the ID generated by Simple Online and Realtime Tracking (SORT) with the identity generated by the Siamese network. This algorithm significantly enhances tracking accuracy and efficiency, providing a major improvement over existing tracking methods that often struggle with identity consistency across frames. Details are elaborated in Section II of this study.
• We propose the design and implementation of a novel autonomous drone capable of recognizing faces within a database by leveraging deep learning algorithms for detection and recognition, and an innovative method for continuous tracking of recognized faces. This combination provides a unique advantage over existing systems that typically do not integrate these capabilities on a single, autonomous platform.
• To integrate the Jetson TX2 and its associated components seamlessly, we meticulously designed and assembled the drone using SolidWorks CAD software. The Intel RealSense T265 camera was strategically chosen to capture precise positional and orientational data for the drone with the Extended Kalman Filter algorithm, enhancing its ability to operate in GPS-denied environments. This customized approach was imperative to fulfill the unique project requirements and is a significant improvement over conventional systems that rely heavily on GPS.

This research presents a unique integrated solution that combines the detection, recognition, and continuous tracking of individuals through the use of an autonomous drone system. While current facial recognition and drone tracking systems have shown advancements separately, our proposal stands out by merging these capabilities into a single autonomous platform. This integration not only enhances tracking accuracy but also significantly extends the coverage and adaptability of the system, overcoming the limitations of fixed-camera systems and current drone-based solutions. This novel approach enables more efficient and effective tracking of individuals in various situations, providing a clear advantage over existing methods in the field.

This paper is organized in the following way. In Section II, we present six pieces of research related to ours in two main aspects: one is facial recognition used in drones, and the second is drone applications. In Section III, we present the system architecture, the drone implementation, and the robot operating system (ROS) architecture, and then we present in detail the Siamese network used in this research. In addition, Section IV shows the experiments with the drone running in different environments (stationary, indoor flights, and one outdoor test) and discusses the confusion matrix. In Section V, we discuss the results and the findings we have obtained from the experiments. Finally, we conclude with a summary of the research and suggest future works. It is expected to reduce crime and violence and increase civil security by using this research as a plan to develop sophisticated drones [26].

II. RELATED WORKS
Surveillance is an important aspect in our communities to safeguard the integrity of our citizens against crime, violence, and kidnapping. The Colca Canyon is one of the deepest canyons in the world, and people usually get lost there, which causes complicated search and rescue tasks due to the geography of the place [27]; for this reason, it is necessary to deploy the facial recognition system in autonomous drones. Facial recognition is a technology that is used every day in different applications, and some research is using this technology to create more sophisticated applications for different environments, meaning real situations. However, the versatility of drones raises concerns regarding potential uses for malicious purposes; thus, research is underway to develop systems for drone detection using other drones for defense purposes [28], [29]. In this section, we share related works concerning the three key aspects we focus on in our research: facial detection, facial recognition, and tracking. Notably, we have integrated these aspects into a drone that we crafted in our laboratory.

A. DRONE FACE DETECTION
Detecting faces is the first step of the facial recognition system; however, detecting faces during a flight has its challenges, as camera vibration caused by the rotation of the motors can affect the recognition of the person in front of the drone. Besides, the distance between the person and the camera of the drone can affect the recognition as well. Hsu and Chen [30] experimented with detecting faces at different heights from the ground to the drone and different distances from the face of the person to the drone. Fig. 2 illustrates the experiment for taking pictures with a stick that represents the drone. According to Hsu and Chen [30], the performance of detecting faces using deep learning methods such as Face++ [31] and Rekognition API [32] is better than that of some other traditional techniques [33]. In the same way, the authors mention that this is an empirical study to evaluate the different factors that may affect face detection in drones [33]. Hence, the issue with using a stick is that it does not simulate real drone conditions, so face detection may not be good, and facial recognition is not performed. Though the results for face detection are quite good, the inference has not been tested on a drone or onboard computer, so it is not possible to analyze the performance of the system in a real situation.

B. DRONE FACE RECOGNITION
Facial recognition with drones for tasks such as monitoring and person identification is gaining prominence.


FIGURE 2. An illustration depicting the data collection experiment made by [29]. Images of faces were taken at different distances and altitudes and
collected to experiment with facial recognition with drones.

Jurevičius et al. [34] showed how the task of recognizing faces using drones as the video source is possible. To detect and recognize a face, they use the Dlib package [35], which uses the histogram of oriented gradients (HOG), and the resnet_v1 model [36] for recognition that is included in the Dlib package. The database consists of 13233 faces kept in an SQL database. The video frame is captured by the Raspberry Pi camera mounted on the DJI Mavic Pro drone. The problem with using a small single-board computer such as the Raspberry Pi to transmit video is that it usually introduces many delays and loses important data in the transmitted images; in addition, image processing and facial recognition are carried out on a remote server, which means that the system depends on a continuous Wi-Fi connection. Our approach is the implementation of the Siamese network model on the same drone in real time; this drastically reduces the time of sending and receiving images and also increases exponentially the responsiveness of the drone to unwanted events.

Surveillance and violence detection are among the exciting applications of facial recognition when implemented in drones. Srivastava et al. [37] proposed a new method to detect violent situations between people and identify the individuals involved in the violent scene. They use seven different ImageNet models, VGG16 [10], VGG19 [10], ResNet101V2 [38], DenseNet201 [39], InceptionV3 [40], MobileNet [41], and NASNet [42], plus three combinations of two models, to analyze the best architecture to recognize violence. Besides, they propose a new ResNet-28 architecture to do facial recognition. Hence, both violence detection and facial recognition are not trained from scratch; instead, they use transfer learning techniques to add layers and train the last layers of the architecture. For violence detection they use two databases, one called the hockey dataset [43], and the other one called the real-life violence situation (RLVS) dataset [44], but they do not mention the amount of data stored in the database for facial recognition. Nevertheless, the entire system can recognize faces with 99.20% accuracy. The main issue is that the entire system is not fully or semi-autonomous, since the drone must be controlled from a ground station; also, the drone cannot follow the person. Similarly, video and violent scene recognition are processed on a computer, so the real-time accuracy cannot be precisely determined by the drone itself.

Autonomous drones are being researched since a human pilot cannot fly the drone every time a task is required, and in search and rescue tasks it is important to have several drones working. Hence, a UAV for detecting people and objects in cluttered indoor environments was developed by Sandino et al. [45]. The drone uses an onboard UP2 computer together with a Vision Processing Unit (VPU) to boost the computations; the research uses the Google MobileNet SSD [41], which is deployed in the Caffe framework and tuned with the pre-trained weights from the PASCAL VOC2012 dataset [46]. Furthermore, they use the Partially Observable Markov Decision Process (POMDP) to model the navigation problem and solve it in real time by using the Augmented Belief Trees (ABT) algorithm [46], [47]. Since this research has a good approach to navigating indoor environments with obstacles, in future work we can either use this approach or enhance it. However, this research does not perform people tracking, which is the main focus of our investigation.

C. DRONE FACE TRACKING
Several research studies have used drones with cameras to capture video frames and then process them on a personal computer to detect and recognize faces using different


methods [48]. Tracking a face is another task that an autonomous drone must do in real time. Hence, the DJI Tello drone can be used for face detection and tracking; the DJI Tello drone has a software development kit (SDK) with which we can implement a Python script to detect faces, and by reading the values of the sensors inside the drone it is possible to follow the face in front of the drone [49]. Priambodo et al. [49] mention that the system uses a Haar cascade classifier to reduce the computational cost; hence, the DJI Tello drone cannot recognize faces, but it can detect and follow them.

A different research effort related to autonomous UAVs is a hunting drone. Wyder et al. [50] developed a novel drone to detect, track, and follow another drone by using a pre-trained Tiny YOLO model. Besides, they implement a linear regression model to predict the next position of the target drone. Hence, they used YOLO’s Darknet-53 [51] as a pre-trained model to train a Tiny YOLO model, and they collected a total of 58,647 pictures as a database. This drone is autonomous since they use the Intel RealSense T265 tracking camera to obtain the position and orientation of the drone and communicate between the flight controller and the onboard computer using the mavlink-ros bridge protocol [52]. The results of this research are promising since it can achieve its goal with good performance and 77% accuracy in a cluttered environment. While this research focuses on drone tracking, our research centers on the implementation of autonomous drones that detect and recognize faces. Based on the position of the face relative to the camera frame, new XYZ coordinate setpoints are calculated for the drone to move. The novelty of our research lies in the SMRT algorithm and facial tracking using the face’s relative position within the camera frame.

Object or person tracking can also be achieved through a human-machine system. Zhou and Liu [53] propose a comprehensive human-in-the-loop tracking framework with two main modules. The Local Tracking Module employs the SiamRPN model, enhanced with a human-attention-guided approach to improve tracking accuracy around the human visual focus. The Human Attention Analysis Module identifies Targets of Human Interest (TOHI) by analyzing eye movement patterns and accumulated attention time, enabling effective tracking correction within and outside the visual focus area. In contrast to the mentioned research, our research specifically focuses on face detection, recognition, and tracking. Our goal is to develop an autonomous drone capable of identifying individuals independently, without human intervention, representing a comprehensive approach towards autonomy in person identification. Therefore, the detection, recognition, and tracking of objects using drones is an area that has been under research due to its numerous applications.

III. METHODOLOGY
In this work, we present an autonomous drone that recognizes faces within a database of faces using a Siamese network, a deep neural network for comparing the similarity between features of two given input faces; the recognized faces are then tracked with our novel tracking algorithm. Given an image captured by the drone, the Siamese network determines whether the person belongs to a suspect listed in a database for a crime or not. The face of the recognized suspect is then tracked using our proposed SORT and SMRT algorithms, and the tracking algorithm calculates and outputs specific coordinates called setpoints (X, Y, Z) in 3D space based on the position of the face of the recognized suspect. The generated setpoints are sent to the flight controller, which receives the new setpoints and begins to track the person in front of the drone; if there is no person in front of the drone, it will start rotating over the z-axis in search of a new face to detect. An illustrated overview of the proposed system is shown in Fig. 3. In this research, the subjects have given their consent to carry out the experiments.

A. SYSTEM ARCHITECTURE
Fig. 1 shows the four modules of the system architecture: the computer vision module, the motion generation module, the flight management unit module, and the driver module. Our drone system comprises a conventional USB camera connected to the onboard single-board computer, alongside an array of sensors dedicated to distinct subtasks during autonomous flight. The computer vision module, integrated into the onboard single-board computer, employs the Haar cascade function for face detection and a Siamese network for face recognition within a pre-established database of crime suspects. Coordinating with the motion generator, this module executes algorithms to search for new faces and track recognized ones within the surroundings. Both the computer vision module and the motion generator are hosted on the Jetson TX2 embedded within the drone. The driver module is responsible for reading the drone’s IMU, magnetometer, and other sensors to obtain specific data, such as battery voltage and other controller data. Additionally, it is responsible for performing Simultaneous Localization and Mapping (SLAM) to obtain the drone’s position and orientation. On the other hand, the flight management module is responsible for collecting all data from the previous modules and, based on that data, such as the local position, it controls the motor electronic speed controllers (ESCs) to reach the required final position. For real-time monitoring and intervention, a remote desktop station observes the drone’s autonomous activities. This enables manual intervention should any anomalies arise during flight operations.

1) UAV HARDWARE DESIGN
As illustrated in Fig. 4, the hardware components of the drone are arranged on four platforms. The first platform houses the main battery, power management board, and electronic speed controllers (ESCs); the second platform accommodates the four arms with the motors; the third platform hosts the flight controller, the Jetson TX2, and the GPS; and the last platform is dedicated to the main camera, secondary battery, and telemetry radio.


FIGURE 3. Proposed deep learning-based face recognition tracking drone: 1) Images from the camera mounted on the drone are used as input for the
Siamese network to recognize faces. 2) The proposed face tracking algorithm tracks the face of the person in front of the drone and returns the
setpoints information. 3) Flight control is generated based on the coordinates of the face of the person and using ROS is sent to the flight controller.

The onboard computer responsible for running the computer vision module and sending the commands to the flight controller is the Jetson TX2. Featuring a GPU architecture with 256 NVIDIA CUDA cores, a dual-core NVIDIA Denver 2 64-bit CPU, a quad-core ARM Cortex-A57 MPCore, 8GB of 128-bit LPDDR4 memory, and 32 GB of eMMC 5.1 storage, the Jetson TX2 is mounted on the Orbitty carrier board. This board provides connectivity options such as USB 3.0, USB 2.0, HDMI, MicroSD, 3.3v UART, I2C, GPIO, and a GbE port. The selection of the Jetson TX2 was driven by its power efficiency and affordability, as not all small computers can run a Siamese network for facial recognition.

For autonomous flight capability, the Pixhawk 4 flight controller was selected for its compatibility with the onboard computer and its ability to modify position and orientation. While the Pixhawk 4 utilizes its GPS and IMU sensors for position and orientation data, occasional signal loss is inevitable. To address this issue, the Intel RealSense T265 camera was integrated to provide reliable position and orientation values. Equipped with two fisheye lens sensors, an IMU, and an Intel Movidius Myriad 2 VPU, the T265 camera enables visual SLAM processing on the VPU. In environments with heightened light intensity as well as in dark environments, the camera may not be able to capture the visuals. To mitigate issues related to intense light or darkness affecting visibility, the T265 camera is positioned downward towards the landing pad, acting as the drone’s eyes. Additionally, video frames are captured by the ELP USB camera mounted on the drone, as presented in Fig. 4. The final design configuration is shown in Fig. 5.

2) ROS ARCHITECTURE
The software implementation is designed to be as autonomous as possible based on the system architecture presented in subsection A. It was necessary to choose a framework to run the multiple algorithms that allow the drone to be as autonomous as possible. The robot operating system (ROS) framework provides the capacity to run each Python script and have them interact with each other by publishing and subscribing to topics; thus, we can run our nodes to do different tasks at the same time. MAVROS is a bridge between the MAVLink protocol and the ROS framework; MAVLink is a messaging protocol that communicates with drones and between onboard drone components, and MAVROS runs in the ROS framework and converts ROS messages into MAVLink messages to be sent to the flight controller.
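For illustration, the following is a minimal sketch of a ROS node streaming position setpoints to the flight controller through MAVROS; the topic name is the standard MAVROS setpoint interface, while the rate and pose values are illustrative and not the exact ones used on our drone.

```python
#!/usr/bin/env python
# Minimal sketch: publish a position setpoint to the flight controller via MAVROS.
# Assumes MAVROS is running and the vehicle is in OFFBOARD mode; values are illustrative.
import rospy
from geometry_msgs.msg import PoseStamped

def publish_setpoint():
    rospy.init_node('setpoint_sketch')
    pub = rospy.Publisher('/mavros/setpoint_position/local', PoseStamped, queue_size=10)
    rate = rospy.Rate(20)  # OFFBOARD mode expects a steady setpoint stream

    setpoint = PoseStamped()
    setpoint.pose.position.x = 0.0
    setpoint.pose.position.y = 0.0
    setpoint.pose.position.z = 1.2   # roughly the flight altitude used in our tests
    setpoint.pose.orientation.w = 1.0

    while not rospy.is_shutdown():
        setpoint.header.stamp = rospy.Time.now()
        pub.publish(setpoint)
        rate.sleep()

if __name__ == '__main__':
    publish_setpoint()
```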


FIGURE 4. An image of our UAV built from scratch with its labeled components. (a) Front view of the drone showing: (1) T-motor MN3510 KV630;
(2) Second battery 11.1v 5100mAh; (3) Video camera 1080p; (4) T265 Intel realsense tracking camera. (b) Side view showing: (5) Telemetry radio;
(6) Pixhawk 4 flight controller; (7) Main battery 14.8v 5600 mAh.

Fig. 6 depicts the six nodes and the flow of messages from one node to another. The facial recognition node is responsible for performing facial recognition and publishes four topics depending on whether the Siamese model is loaded, whether no face is detected and the drone needs to search, or whether a face is recognized and the drone needs to halt, publishing the Cartesian coordinates of the recognized face’s bounding box. The tracking node subscribes to the facial recognition node and handles person tracking, modifying the drone’s position and orientation and publishing the new drone position and orientation to another node. The searching node subscribes to the facial recognition node and is only activated when no face is detected or recognized, publishing the new drone orientation to rotate on its axis and continue searching for new faces. The takeoff and landing node is responsible for taking off and landing the drone if no face is detected for a few minutes or if it receives a landing instruction via command. The distributor node receives all position and orientation coordinates from the takeoff and landing, searching, and tracking nodes. After receiving this data, it publishes it as a single pose-type topic to the main node. The main node is the primary node of the quadcopter, which receives the position and orientation it needs to reach and publishes that message to the flight controller using MAVROS.

FIGURE 5. Autonomous drone implementation for facial recognition and tracking.

B. THE OPERATION OF THE DRONE AND THE SYSTEM
The autonomous flight sequence begins with the initialization of the ROS architecture. Subsequently, the drone transitions from manual to offboard mode, autonomously arming the vehicle and initiating takeoff procedures. Once airborne, the ROS nodes responsible for facial recognition, tracking, and search functionalities become active, publishing and subscribing to topics as required.

Subsequently, the USB camera captures frames, which are processed by the computer vision module utilizing the OpenCV library for face detection. This module employs a Haar cascade algorithm to detect faces, resizes the frame to match the input size of the Siamese network (96 × 96), executes the Siamese model, and calculates the Euclidean distance. Following this, the SORT algorithm is executed in conjunction with the SMRT algorithm to enhance recognition and tracking. Upon facial recognition, the tracking node publishes the coordinates of the face’s bounding box to initiate face tracking. After completing the tracking experiment, the drone autonomously initiates landing procedures.
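The sketch below illustrates the publish/subscribe wiring between the facial recognition node and the tracking node described above; the topic names (/face_recognition/bbox, /face_recognition/match, /tracking/pose) are hypothetical placeholders and not the exact topics of our implementation.

```python
#!/usr/bin/env python
# Sketch of the wiring between the facial recognition node and the tracking node.
# Topic names are hypothetical placeholders for the topics described in the text.
import rospy
from std_msgs.msg import String
from geometry_msgs.msg import Point

class TrackingNodeSketch:
    def __init__(self):
        # Bounding-box center of the detected face, published by the recognition node
        rospy.Subscriber('/face_recognition/bbox', Point, self.bbox_callback)
        # Name of the recognized person (or 'unknown')
        rospy.Subscriber('/face_recognition/match', String, self.match_callback)
        # New pose forwarded towards the distributor node
        self.pose_pub = rospy.Publisher('/tracking/pose', Point, queue_size=10)
        self.last_name = None

    def match_callback(self, msg):
        self.last_name = msg.data

    def bbox_callback(self, msg):
        # Only react when the face belongs to someone in the database
        if self.last_name and self.last_name != 'unknown':
            self.pose_pub.publish(msg)  # placeholder: the real node converts the bbox into a setpoint

if __name__ == '__main__':
    rospy.init_node('tracking_node_sketch')
    TrackingNodeSketch()
    rospy.spin()
```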


Upon detecting a face, the Siamese network identifies the individual and transmits the bounding box coordinates to the motion generator module. The motion generator then updates the local position values based on the detected face’s position relative to the camera and forwards the new setpoints to the flight controller. Both the computer vision module and the motion generator operate within the ROS framework. An external sensor is integrated into the drone system to navigate in GPS-denied environments. The Pixhawk flight controller incorporates various sensors such as GPS, magnetometer, gyroscope, and air pressure sensors to determine the drone’s position accurately.

FIGURE 6. An illustration of the proposed ROS architecture for the autonomous drone showing the various nodes and the data published to the various interfaces.

C. UAV FLIGHT TIME DURATION
One of the most challenging issues when deploying a UAV in real-world applications is the limited flight time. Since this project is in the research phase, several strategies have been implemented to address this challenge:

• Energy-efficient components: The Jetson TX2 and the Pixhawk 4 flight controller are known for their low power consumption, which helps optimize the drone’s energy consumption, leading to increased flight time.
• Battery management: The drone has been tested with two separate batteries, one for the components that make up the onboard computer, and another that powers the flight controller along with the motors. This increases the flight time.
• Battery chemical composition: Currently, some advanced batteries include graphene. The inclusion of graphene in LiPo batteries improves electrical conductivity, increases energy storage capacity, and accelerates charging and discharging, thereby improving the overall performance, duration, and efficiency of the battery.
• A battery status monitoring function is being implemented to be able to send remote orders to the drone to return to the base station for a battery swap.
• The estimated flight time calculated in this research was between 6 and 8 minutes using a 14.8V, 6500mAh battery. The estimated flight time using a 14.8V, 10000mAh battery was around 15 minutes.

D. GPS-DENIED NAVIGATION
To achieve flight in GPS-denied environments, the technique of visual-inertial odometry (VIO) is required. VIO is a computer vision technique for estimating the 3D pose and velocity of a vehicle in motion relative to its initial local position. Using VIO, we can determine the position of the drone in 3D space with an Extended Kalman Filter algorithm. Implementing VIO requires the use of RGB cameras and image processing libraries. In our research, we use the Intel RealSense T265 camera, which supports ROS1 using a wrapper. The topics published by the nodes of the ros-T265 package include odometry. The ROS topic we use for odometry is /camera/realsense2_camera/camera/odom/sample. The necessary parameters to set in order to use external position information with the Extended Kalman Filter (EKF2) are described below:

• EKF2_AID_MASK: Configure the fusion of vision position, vision velocity, vision yaw, and external vision rotation based on the preferred fusion model.
• EKF2_HGT_MODE: Set to Vision to use visual data as the main source for altitude measurement.
• EKF2_EV_DELAY: Adjust to account for the difference between the measurement timestamp and the actual capture time.
• EKF2_EV_POS_X: Specify the location of the vision sensor relative to the vehicle’s body frame on the X axis.
• EKF2_EV_POS_Y: Specify the location of the vision sensor relative to the vehicle’s body frame on the Y axis.
• EKF2_EV_POS_Z: Specify the location of the vision sensor relative to the vehicle’s body frame on the Z axis.

According to the tests conducted, the determined maximum altitude at which the Intel RealSense T265 camera operates correctly is 50 meters. Above this altitude, the camera cannot accurately estimate the altitude. Similarly, in low-light conditions, the camera fails to estimate position and orientation accurately. To address these challenges, a system that integrates both GPS and VIO cameras can be implemented to improve the UAV’s position and orientation estimation.
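As an illustration of how the T265 odometry can be fed to the flight controller as external vision data, the sketch below relays the wrapper’s odometry topic to the MAVROS external-vision pose interface; the relay node itself is a simplified assumption rather than our exact implementation.

```python
#!/usr/bin/env python
# Sketch: relay T265 visual-inertial odometry to the flight controller as external vision data.
# The odometry topic matches the one cited in the text; the relay logic is a simplified sketch.
import rospy
from nav_msgs.msg import Odometry
from geometry_msgs.msg import PoseStamped

vision_pub = None

def odom_callback(odom):
    # Repackage the T265 pose as a PoseStamped for the EKF2 external-vision input
    msg = PoseStamped()
    msg.header.stamp = rospy.Time.now()
    msg.header.frame_id = 'map'
    msg.pose = odom.pose.pose
    vision_pub.publish(msg)

if __name__ == '__main__':
    rospy.init_node('t265_vision_bridge_sketch')
    vision_pub = rospy.Publisher('/mavros/vision_pose/pose', PoseStamped, queue_size=10)
    rospy.Subscriber('/camera/realsense2_camera/camera/odom/sample', Odometry, odom_callback)
    rospy.spin()
```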


E. FACE-TRACKING METHOD
In this section, we introduce our innovative face-tracking method, which extends beyond basic facial recognition and bounding box assignment. When identified individuals attempt to evade the system, the drone becomes a crucial tool, enabling continuous tracking while ensuring a safe distance is maintained, thereby safeguarding the environment.

To achieve this, we impose constraints on the drone’s movement trajectories. These constraints are based on the observation that the size of the detected face changes with the distance between the drone and the identified person. Therefore, we use the bounding boxes around recognized faces to gauge the proximity of the person to the drone. Algorithm 1 describes the tracking node of our system, which is implemented in Python and adheres to the specifications described in this section.

Our tracking algorithm evaluates proximity using three main criteria on the detected faces:
1) If the bounding box area of the face exceeds 29,000 px (size of 227 × 128).
2) If the bounding box area of the face is less than 5,000 px (size of 86 × 60).
3) If the bounding box area of the face falls within the range defined by the two previous criteria.

FIGURE 7. Trigonometric circle aligned with the drone coordinate system.
FIGURE 8. First quadrant from 0 to π/2 with the equation to move backward.
FIGURE 9. Fourth quadrant from 3π/2 to 2π with the equation to move backward.

To control the drone in an optimal position for recognizing and tracking faces, we set a fixed altitude and allow only yaw rotation, while keeping the roll and pitch rotations fixed. The yaw rotation uses the trigonometric circle presented in Fig. 7, with the drone located at the central position. In our test flights, the movement of the drone is restricted to moving backward and forward and rotating left and right to follow the target person. The angles are in radians, but before sending the setpoints to the flight controller, the angles are converted to quaternions.

The frame of the camera is 640 × 480 pixels. To ensure optimal facial recognition, we considered the third criterion to be the best and safest condition for tracking the person. In case the bounding box area of the face covers more than 29000 px (size of 227 × 128), it means that the face is too close to the drone, and the drone must perform a backward movement so as not to injure the person in front of it. If the bounding box area is less than 5000 px (size of 86 × 60), the recognition may not be accurate; therefore, the drone needs to move closer.

1) BACKWARD MOVEMENT
Given the position of the drone as P(x, y) at a fixed altitude, a movement from P to a new position O(x, y) occurs in an instance where the drone is too close to a subject and must move away in the opposite direction. This movement is considered a backward movement. To move the drone backward, we consider the yaw angle θ of the drone in the 2D plane of (x, y) and the quadrant in which θ is located; then we can compute the new position O for that quadrant using one of (1)–(4), as shown in Fig. 8. Here d is the unit distance of 0.02 m moved by the drone in the backward direction, applied repeatedly until an optimal distance between the subject and the drone is attained. In (1), θ is considered for the first quadrant; (2), (3), and (4) are considered for the second, third, and fourth quadrants, respectively. An empirical value of 0.09 is used in the equations to initialize and set a secure distance.

$$v^{1}_{backward}=\begin{bmatrix}X^{1}_{backward}\\ Y^{1}_{backward}\end{bmatrix}=\begin{bmatrix}-(d\cos(\theta)+0.09)\\ -(d\sin(\theta)+0.09)\end{bmatrix} \tag{1}$$

$$v^{2}_{backward}=\begin{bmatrix}X^{2}_{backward}\\ Y^{2}_{backward}\end{bmatrix}=\begin{bmatrix}d\sin(\theta)+0.09\\ -(d\cos(\theta)+0.09)\end{bmatrix} \tag{2}$$

$$v^{3}_{backward}=\begin{bmatrix}X^{3}_{backward}\\ Y^{3}_{backward}\end{bmatrix}=\begin{bmatrix}d\cos(\theta)+0.09\\ d\sin(\theta)+0.09\end{bmatrix} \tag{3}$$

$$v^{4}_{backward}=\begin{bmatrix}X^{4}_{backward}\\ Y^{4}_{backward}\end{bmatrix}=\begin{bmatrix}-(d\sin(\theta)+0.09)\\ d\cos(\theta)+0.09\end{bmatrix} \tag{4}$$
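A condensed sketch of how the backward displacement in (1)–(4) can be computed from the current yaw angle is given below; the forward case described in the next subsection simply negates the result. The helper functions, example values, and the sign handling of the yaw angle are illustrative assumptions and not taken verbatim from our tracking node.

```python
import math
from tf.transformations import quaternion_from_euler  # same helper used in Algorithm 1

D_STEP = 0.02   # unit distance moved backward per iteration (meters)
OFFSET = 0.09   # empirical offset initializing a secure distance

def backward_offset(yaw, d=D_STEP):
    """Displacement (dx, dy) for a backward step, following (1)-(4) by quadrant of the yaw angle."""
    c = d * math.cos(yaw) + OFFSET
    s = d * math.sin(yaw) + OFFSET
    if 0 <= yaw < math.pi / 2:            # first quadrant, eq. (1)
        return -c, -s
    elif yaw < math.pi:                   # second quadrant, eq. (2)
        return s, -c
    elif yaw < 3 * math.pi / 2:           # third quadrant, eq. (3)
        return c, s
    else:                                 # fourth quadrant, eq. (4)
        return -s, c

def forward_offset(yaw, d=0.1):
    """Forward displacement: the backward equations multiplied by -1, as in (5)-(8)."""
    dx, dy = backward_offset(yaw, d)
    return -dx, -dy

# Example: build the new setpoint from the current pose, keeping the fixed altitude,
# and convert the yaw angle to a quaternion before sending it to the flight controller.
x, y, yaw, altitude = 1.0, 0.5, 0.3, 1.2          # illustrative current pose
dx, dy = backward_offset(yaw)
new_setpoint = (x + dx, y + dy, altitude)
qx, qy, qz, qw = quaternion_from_euler(0.0, 0.0, yaw)  # roll and pitch stay fixed
```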


2) FORWARD MOVEMENT
The forward movement occurs when the bounding box area is less than 5000 px, which means that the face is too far from the drone, and the drone must perform a forward movement to be able to recognize the person in front of it. To move the drone forward, we need to identify in which quadrant the drone is located, and it is only necessary to multiply (1) to (4) by −1; as a result, we have the equations to move forward, as shown in (5) to (8). The variable distance d has an initial value of 0.1 meters, which increases at a rate of 0.02 meters until the bounding box area is more than 5000 px.

$$v^{1}_{forward}=\begin{bmatrix}X^{1}_{forward}\\ Y^{1}_{forward}\end{bmatrix}=\begin{bmatrix}d\cos(\alpha)+0.09\\ d\sin(\alpha)+0.09\end{bmatrix} \tag{5}$$

$$v^{2}_{forward}=\begin{bmatrix}X^{2}_{forward}\\ Y^{2}_{forward}\end{bmatrix}=\begin{bmatrix}-(d\sin(\beta)+0.09)\\ d\cos(\beta)+0.09\end{bmatrix} \tag{6}$$

$$v^{3}_{forward}=\begin{bmatrix}X^{3}_{forward}\\ Y^{3}_{forward}\end{bmatrix}=\begin{bmatrix}-(d\cos(\theta)+0.09)\\ -(d\sin(\theta)+0.09)\end{bmatrix} \tag{7}$$

$$v^{4}_{forward}=\begin{bmatrix}X^{4}_{forward}\\ Y^{4}_{forward}\end{bmatrix}=\begin{bmatrix}d\sin(\gamma)+0.09\\ -(d\cos(\gamma)+0.09)\end{bmatrix} \tag{8}$$

3) HOVERING MOVEMENT
If the bounding box area is between 5001 px and 29000 px, it means there is a safe distance between the drone and the person. Then, if the person moves to the right, the drone will rotate to the right, and if the person moves to the left, the drone will rotate to the left. This action is done by modifying the yaw angle at a rate of 0.02 rad. The frame of the camera is 640 × 480 pixels, which means the horizontal axis goes from 0 px to 640 px. In case the face is located near the left edge of the camera frame, which means less than 200 px on the horizontal axis, the drone will rotate to the left, and in case the face is located near the right edge of the camera frame, which means more than 400 px on the horizontal axis, the drone will rotate to the right, as shown in Fig. 10.

FIGURE 10. Conditions to rotate to the right or left. (a) If the center of the face is near the left edge of the red rectangle the drone will rotate 0.02 rad to the left. (b) If the center of the face is near the right edge of the red rectangle the drone will rotate 0.02 rad to the right.

Algorithm 1 Real-Time Face Tracking Control with ROS
1: Import necessary libraries and packages
2: rospy, ast, std_msgs (String, Float64)
3: geometry_msgs (Point, Pose)
4: gazebo_msgs (ModelStates)
5: nav_msgs (Odometry)
6: time (sleep), re
7: tf.transformations (quaternion_from_euler)
8: numpy as np, math
9: Define methods for callbacks:
10:   object_detection - Extract bbox and set flag3
11:   coordinate_callback - Update c1 and area
12:   face_found_callback - Set flag
13:   face_match_callback - Set flag2
14:   orientation_callback - Update orientation values
15:   kill_callback - Set kill_program
16:   orientation_t265_callback - Update pose orientation and position
17: Define movement methods: right, left, hold_position, backward, forward
18: Define method new_quaternion - Update orientation values
19: Define class data_processing
20:   Define class constructor __init__
21:     Initialize constants and flags
22:     Initialize pose and orientation values
23:     Define ROS subscribers for face recognition and pose data
24:     Define ROS publishers for pose and yaw angle feedback
25:   Define main control loop
26:     Check for kill program signal
27:     If face detected:
28:       If face too close:
29:         Log message and update pose to move backward
30:         Publish updated pose
31:       If face too far:
32:         Log message and update pose to move forward
33:         Publish updated pose
34:       If face within safety area:
35:         Log message and hold position
36:         Adjust yaw angle and pose based on the bounding box center, then move to the right or left
37:         Publish updated pose and yaw angle feedback
38: Define main function
39:   Initialize ROS node
40:   Create instance of data_processing
41:   Keep node running with rospy.spin()
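To make the hold-position branch of Algorithm 1 (steps 34–37) concrete, the following fragment sketches the yaw adjustment driven by the bounding-box center; variable names and the sign convention for rotation are illustrative assumptions.

```python
YAW_STEP = 0.02      # rad, rotation applied per iteration
FRAME_WIDTH = 640    # camera frame width in pixels
LEFT_EDGE = 200      # face center below this threshold -> rotate left
RIGHT_EDGE = 400     # face center above this threshold -> rotate right

def adjust_yaw(current_yaw, bbox_center_x):
    """Sketch of the hovering behavior: keep the recognized face near the image center."""
    # Sign convention (left = +yaw) is assumed; the real node follows the drone's own convention.
    if bbox_center_x < LEFT_EDGE:
        return current_yaw + YAW_STEP   # face drifting toward the left edge: rotate left
    elif bbox_center_x > RIGHT_EDGE:
        return current_yaw - YAW_STEP   # face drifting toward the right edge: rotate right
    return current_yaw                  # face within the central band: hold position
```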
F. FACE DETECTION
To accomplish the facial recognition system, we are using and
combining four topics, the haar cascade classifier, siamese
network model, Inception V2 architecture, and FaceNet detection can be improved by modifying some thresholds so it
weights. OpenCV provides us with an easy way to detect can detect better a human face instead of some random image.
faces by using a haar cascade classifier. OpenCV provides a The whole documentation about how OpenCV works and
training method or pre-trained model that can be loaded from how to deploy it can be found on the OpenCV website [55].
the OpenCV installation folder [54]. This method of facial Fig. 11 shows face detection running on Ubuntu.


FIGURE 11. An image showing the implementation of face detection using the OpenCV Haar cascade classifier.

FIGURE 12. Siamese network representation used in this research. If f(X1) and f(X2) are the encoding vectors of the same person, then the Euclidean distance must be small. If f(X1) and f(X2) are the encoding vectors of different persons, then the Euclidean distance must be large.

G. SIAMESE NETWORK
Conventional facial recognition systems work in four main steps: detection, alignment, representation, and classification. Thousands of face images are used for training, and then the final model can classify who the person is. This method works well, but it has a problem: if we want to add a new person to the database, we must train the network again. Fig. 12 shows a Siamese model representation; it has two inputs, which are the images we want to compare. Each image is passed through a convolutional neural network to determine the 128-dimensional vector of that input. Thus, we have two 128-dimensional vectors as outputs. Hence, we compute the Euclidean distance between the two outputs. This method of recognizing faces is one of the best options since it only requires a few images as inputs.

FIGURE 13. Database from our classmates to be recognized by the drone.

First, we collect data by taking 95 photos of 5 subjects, which is 475 photos in total, as Fig. 13 shows. This database is going to be one input of the Siamese network. Second, we implement a Python script to capture video and send the video frame as the second input of the Siamese network. It is called a Siamese network because there are two inputs to the same DNN and one output, which is the Euclidean distance that determines whether the face in front of the drone matches the faces in the database.

We have collected a total of 475 photos of our classmates; these photos were taken with the camera installed on the drone before flying so we can obtain better face recognition accuracy. Fig. 14 shows the testing height and the test environment.
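A condensed sketch of this comparison step is given below, assuming embedding_model is the trained base network that maps a (96 × 96 × 3) face crop to a 128-dimensional vector; the preprocessing and the distance threshold are simplified, illustrative choices.

```python
import cv2
import numpy as np

def embed(face_bgr, embedding_model):
    """Resize a detected face crop to the network input and return its 128-D encoding."""
    face = cv2.resize(face_bgr, (96, 96)).astype('float32') / 255.0
    return embedding_model.predict(np.expand_dims(face, axis=0))[0]

def same_person(face_a, face_b, embedding_model, threshold=0.7):
    """Siamese-style verification: a small Euclidean distance means the same identity."""
    dist = np.linalg.norm(embed(face_a, embedding_model) - embed(face_b, embedding_model))
    return dist < threshold, dist
```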


• Reduction Layers: Although the original architecture uses (1 × 1) convolutions to reduce dimensionality before applying larger convolutions, our design makes limited use of this technique due to the smaller input dimensions and the specificity of the facial recognition task.

• Final Layers: Unlike InceptionV2, which uses final dense layers and softmax for classification, our version employs a dense layer followed by L2 normalization to produce embeddings. These embeddings are essential for verification and recognition tasks in a Siamese network setup.

• Parameters and Complexity: Our version has a total of 3,743,280 parameters, optimized to work efficiently with smaller facial images, maintaining an adequate balance between accuracy and computational efficiency.

These modifications allow the adapted network to retain the structural advantages of the Inception architecture while being specifically tailored to the needs of our facial recognition task in a Siamese network.

FIGURE 14. Height of 1.2 meters from the landing pad. The highest flight height will be between 1 meter and 1.5 meters, which makes the total altitude from the ground to the camera 1.80 meters.

FIGURE 15. Triplet loss representation. Maximizes the distance between the anchor and the negative and minimizes the distance between the anchor and the positive [56].

H. FACIAL RECOGNITION
As we mentioned before, we are using a Siamese network to obtain good accuracy in recognizing people. We do not train a network from scratch to recognize faces because it would take a lot of time and computation; instead, we perform transfer learning using the weights of the FaceNet model [56], which was trained with thousands of images from the Labeled Faces in the Wild database [57]. The FaceNet weights can be downloaded from GitHub since they are open source [58]. To load the FaceNet weights we need to implement a network architecture using TensorFlow-Keras; thus, we implement the network architecture following the Inception model that has been published and can be found on GitHub. This network architecture follows the Inception model, which was tested on image classification and detection. The Inception architecture we used in this research can be found in [58]. We implemented an Inception network with three inputs (anchor, negative, and positive) and a single output of a 128-dimensional vector. After 100 epochs, a loss of 0.0017 was achieved on the training data and 0.0388 on the validation data. Figures 23 and 24 display the loss results for the training and validation data, respectively. As shown, the use of weights from a pre-trained network facilitates faster convergence of the loss, which indicates that the model achieves superior accuracy in recognizing the faces within our database. The architecture is implemented using Keras and TensorFlow. In addition, this implementation uses the triplet loss as its loss function. The triplet loss is shown in (9), where A is the anchor, which corresponds to the database; P is the positive, which corresponds to random images of the same person in the database; and N is the negative, which corresponds to random images of different people not included in the database. The anchor and the positives must be encodings of images of the same person, while the negatives must be encodings of random face images, as Fig. 15 shows.

J = \sum_{i=1}^{m} \left[ \lVert f(A_i) - f(P_i) \rVert_2^2 - \lVert f(A_i) - f(N_i) \rVert_2^2 + \alpha \right]   (9)

The goal of this research was to detect, recognize, and track the target person within the database. As a video input sensor, we use an ELP USB camera connected to the Jetson TX2, an onboard computer installed in the drone. Currently, there are various deep learning algorithms to recognize faces, such as VGG19 [10]. We tested VGG19 on an Ubuntu computer and its performance was good enough, with 94% accuracy; however, since VGG19 is a heavy model, the Jetson TX2 cannot run a Siamese network with VGG19 as the main architecture. Thus, we chose the Inception v2 model plus the weights of the FaceNet unified system [56]. To summarize, our customized Inception network has been modified to include three inputs: anchor, positive, and negative. We then perform transfer learning using the weights from FaceNet and our own dataset. After training, we obtain a model in TensorFlow-Keras format. This model, which now has updated weights, is used in the Siamese network. The Siamese network is composed of the base network, which is our customized Inception network with the trained weights.
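To make the loss in (9) concrete, the following is a minimal TensorFlow-Keras sketch of the triplet loss, including the hinge at zero used by FaceNet [56]. The stacking convention assumed for y_pred is illustrative; this is a sketch rather than the exact code deployed on the drone.

import tensorflow as tf

def triplet_loss(y_true, y_pred, alpha=0.2):
    # y_pred is assumed to stack the three 128-D embeddings produced by the
    # base Inception network as (anchor, positive, negative) along axis 0.
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    # Squared Euclidean distances anchor-positive and anchor-negative
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # Positives should be closer to the anchor than negatives by a margin alpha
    basic_loss = pos_dist - neg_dist + alpha
    return tf.reduce_sum(tf.maximum(basic_loss, 0.0))

The margin alpha penalizes triplets in which the negative is not at least alpha farther from the anchor than the positive, which is what drives the embedding separation illustrated in Fig. 15.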

The operation of the Siamese network is explained in the subsection on the Siamese network.

OpenCV provides a Haar cascade classifier to detect faces. We implemented a Python script to run this classifier and modified the minimum-neighbors (minNeighbors) parameter to obtain better facial detection. The model and the weights are loaded beforehand, and we then create a database dictionary by passing the whole database through the new model to obtain the 128-dimensional vector of each picture within the database. After that, we capture a photo within the bounding box generated by the Haar cascade classifier, and each photo taken by the camera is passed through the model to obtain its 128-dimensional vector. Finally, we compute the Euclidean distance between each database encoding vector and the encoding vector of the photo taken by the camera. The Python script then selects the minimum Euclidean distance between the encoding vector from the photo and the encoding vectors from the database. If the minimum distance is less than 0.66, the subject in front of the camera matches someone in the database; if the minimum distance is greater than 0.66, the subject in front of the camera is not in the database. This threshold can be changed to increase the number of subjects recognized. Real-time facial recognition runs in the ROS framework: when the drone detects a face, it stops for a few seconds to capture the face better so that it can crop and send the face image to the Siamese network; if the face is not in the database, the drone rotates a few radians and ignores the face in front of it, otherwise it follows the face.
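A condensed sketch of this matching step is shown below. The helper names and the assumption of one reference encoding per identity are illustrative simplifications (in practice the database may hold several encodings per person), and the preprocessing shown is an assumption rather than the exact on-board code.

import cv2
import numpy as np

THRESHOLD = 0.66  # maximum Euclidean distance accepted as a match

def img_to_encoding(face_bgr, model):
    # Resize the cropped face to the network input and obtain its 128-D embedding
    face = cv2.resize(face_bgr, (96, 96))
    x = np.expand_dims(face.astype("float32") / 255.0, axis=0)
    return model.predict(x)[0]

def recognize(frame, face_cascade, database, model):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        encoding = img_to_encoding(frame[y:y + h, x:x + w], model)
        # Select the database entry with the minimum Euclidean distance
        name, min_dist = "Unknown", float("inf")
        for identity, db_encoding in database.items():
            dist = np.linalg.norm(encoding - db_encoding)
            if dist < min_dist:
                name, min_dist = identity, dist
        if min_dist > THRESHOLD:
            name = "Unknown"
        results.append((name, min_dist, (x, y, w, h)))
    return results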

I. TRAINING DATASET
All the subjects signed a consent form agreeing to the use of their facial data for this research, but not for it to be made into a public dataset.

To perform transfer learning, we utilized the pre-trained weights of the FaceNet model [56]. Our dataset is divided into three subsets: Anchor, Positive, and Negative. The Anchor subset contains 5 folders, each corresponding to one of the 5 subjects. Each folder includes 48 images, resulting in a total of 240 images for the Anchor subset. Similarly, the Positive subset comprises the same 5 subjects with 48 images per folder, totaling 240 images. For the Negative subset, we used the Labeled Faces in the Wild (LFW) dataset [57], which originally contains 1,473 images. We randomly selected 240 images from the LFW dataset to ensure a balanced training set.

The images were resized to 96 × 96 pixels to match the input size required by the Siamese network. The Anchor and Positive images were captured in controlled indoor environments to maintain consistent lighting and background conditions, enhancing the uniformity of the dataset. Each subject was photographed from various angles and with different expressions to create a comprehensive and varied dataset. This meticulous preparation ensures that the model is robust and can generalize well to new data.

J. SIMPLE MATCHING REAL-TIME TRACKING - SMRT
One of the primary goals of this research was to enable the drone to follow a specific person in front of it, provided that the individual is recognized within the database. To achieve this objective, we implemented the Simple Online and Realtime Tracking (SORT) algorithm, which is noted for its ease of use [59]. While this algorithm is capable of tracking various objects across frames, our application required tracking individuals by name rather than by ID. For example, in frame n, the SORT algorithm may track an individual as ID 2, and the facial recognition system might identify this person as Juan. However, in the subsequent frame n+1, the SORT algorithm could continue tracking the same ID 2, but the facial recognition system might label the person differently or fail to recognize the individual, even though it is still Juan. To address this issue, we developed an algorithm that matches the tracked ID with the facial recognition name, as detailed in Algorithm 2.

Algorithm 2 Simple Matching Real-Time Tracking
Require: avg - average Euclidean distance between the image path encoding and the encodings from the database.
Require: A.value, B.value - dictionaries to store the recognized values for the current and previous frame.
Require: id - tracking ID from the SORT algorithm.
Ensure: A.key, B.key - final determined identity keys.
Ensure: A.value, B.value - final determined identity values.
1: if avg > 0.53 then
2:   if A.value == id then
3:     A.key ← Identity
4:     A.value ← idA
5:   else
6:     B.key ← Unknown
7:     B.value ← idB
8:   end if
9: else
10:   if B.value == id then
11:     B.key ← Unknown
12:     B.value ← idB
13:   else
14:     A.key ← Identity
15:     A.value ← idA
16:   end if
17: end if

The functioning of the entire facial recognition system, including Algorithm 2, is described in Algorithm 3. First, the ‘‘triplet loss’’ function is defined and a custom function for stacking embeddings is registered. Subsequently, the pre-trained model ‘‘siamesemodelv2.keras’’ is loaded and compiled using the Adam optimizer and the ‘‘triplet loss’’ function. A database of facial embeddings is created from images stored in the specified directory. Next, video capture is initialized, and video output recording is configured. In a continuous loop, the system processes each video frame, detects faces using a Haar classifier, and, for each detected face, performs facial recognition by comparing it with the database of embeddings. The SMRT algorithm is utilized to enable the drone to follow a specific person, provided that the individual is recognized within the database. Finally, the recognition result is displayed in the real-time video and the processed video is stored until execution is terminated.

Algorithm 3 Facial Recognition and Tracking Using Siamese Model With SMRT Algorithm
Require: avg_val - average Euclidean distance between the image path encoding and the encodings from the database.
Require: A_dict, B_dict - dictionaries to store the recognized values for the current and previous frame.
Require: id_N - tracking ID from the SORT algorithm.
Ensure: A_dict.key, B_dict.key - final determined identity keys.
Ensure: A_dict.value, B_dict.value - final determined identity values.
1: Initialize video capture and output configuration.
2: Initialize face detector and tracker.
3: Initialize identity tracking variables.
4: while True do
5:   Read a frame from the video capture.
6:   if frame is read successfully then
7:     Detect faces in the grayscale frame using the Haar classifier.
8:     Initialize an empty list for detections.
9:     if faces are detected then
10:       for each detected face do
11:         Extract and resize the region of interest (ROI).
12:         Perform facial recognition using the model to obtain the minimum distance and identity.
13:         Save the minimum distance in a CSV file.
14:         Round the avg_val.
15:       end for
16:       Update the tracker with detections.
17:       for each box in the updated tracker do
18:         Extract coordinates and ID.
19:         Call SMRT_Algorithm1(avg_val, A_dict.value, B_dict.value) with the current parameters to update the tracking state.
20:       end for
21:     else
22:       Display ‘‘No faces detected’’ on the frame.
23:     end if
24:     if frame is read successfully then
25:       Write the frame to the video output.
26:     end if
27:     Display the frame.
28:   end if
29:   if exit condition is met (e.g., ’q’ key is pressed) then
30:     Break the loop.
31:   end if
32: end while
33: Release video capture and output resources.
34: Destroy all windows.

If the Euclidean distance between the camera image and the database is greater than 0.53, then the person is considered unknown. However, if the person was recognized in the previous frame, we need to ensure the consistency of the result. Let us assume that the person in front of the camera is not recognized, so the average distance value would be greater than 0.53. This would cause the value in dictionary A to be 0. If the person's ID is 1, since SORT assigns this ID, then the value of B in the dictionary would be ID 1, and the key in dictionary B would be 'Unknown'. In the next frame, with the same unknown person, if the Euclidean distance is less than 0.53, then the value of B in the dictionary, which is 1, is compared with the ID generated by SORT, which would also be 1 (since we are only detecting one person and ignoring the others). Thus, the result would be the same as in the previous frame: the value of B in the dictionary would be ID 1, and the key would be 'Unknown'. On the other hand, if the person is recognized and the average distance is less than 0.53, the value of B in the dictionary, which is 1, is compared with the new ID generated by SORT, which would be 2 (since it is a new person). This would result in the value of A in the dictionary being 2, and the key in dictionary A being the identity given by the facial recognition system. Similarly, in the following frame with the same person, but when the average distance is greater than 0.53, the value of A in the dictionary would be 2 and would match the ID generated by SORT, which is 2. This would result in the value of A in the dictionary being 2, and the key being the identity given by the facial recognition system. This ensures that, even though the recognized person moves and the average distance changes, the drone can follow the known person in front of it.
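A simplified Python sketch of the matching rule in Algorithm 2 is given below. The dictionary layout and the function name are illustrative assumptions and do not reproduce the exact flight code.

THRESHOLD = 0.53  # average-distance threshold used by the SMRT rule

def smrt_match(avg, track_id, identity, A, B):
    # A and B are small dicts of the form {"key": name, "value": track_id}
    # holding the recognized state of the current and previous frame.
    if avg > THRESHOLD:
        # Recognition failed in this frame: trust the previous assignment
        # if the SORT ID is the one we were already following.
        if A["value"] == track_id:
            A["key"] = identity
            A["value"] = track_id
        else:
            B["key"] = "Unknown"
            B["value"] = track_id
    else:
        # Recognition succeeded: keep "Unknown" only for IDs that were
        # previously unknown, otherwise adopt the recognized identity.
        if B["value"] == track_id:
            B["key"] = "Unknown"
            B["value"] = track_id
        else:
            A["key"] = identity
            A["value"] = track_id
    return A, B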

IV. EXPERIMENTS
In this section, we detail the experiments conducted to evaluate the recognition and tracking of faces in three different environments during real flight test mode. These environments include flying indoors, characterized as a GPS-denied environment; flying outdoors, distinguished as a GPS-enabled environment; and a no-flying mode. The following subsections describe the environment setup and the system setup.

A. ENVIRONMENT SETUP
Three environments are used in this research to analyze the drone behavior, the response time, and the accuracy of the facial recognition system. Fig. 16 shows the first environment, in which the drone is located on a desk. This environment is used only to obtain the accuracy of the facial recognition system in the ideal scenario. The ideal scenario refers to not having vibration caused by the drone or some other disturbance that can affect the facial recognition system.

FIGURE 16. The first environment: the drone is seen in a position with no disturbance that would affect the face recognition system.

The second environment is set up as shown in Fig. 17. In this environment, the GPS signal does not work because it is a closed environment, and we need to rely on visual odometry from the T265 camera. The searching-tracking mode is the most complete test we have performed: in this mode, the drone rotates searching for faces, must stop when a face is in front of the drone, and tracks the face only if the face is within the database; otherwise, it must rotate to look for other faces. The drone also moves backward and forward to maintain a safe distance from recognized faces.

FIGURE 17. The second environment: the drone is seen in a position before attaining the fixed altitude in a closed area where GPS does not work.

Fig. 18 shows the third test environment. In this experiment, we test the performance of the system in situations where natural light can affect the system. In all three environment setups, the facial recognition experiment was performed with 5 participants. During the test in the first environment, the participants stood in front of the camera of the drone. In the setups for the second and third environments, the participants positioned themselves in front of the drone while it was flying. Subsequently, the drone operator, in this case the author of this research, gave instructions to the participant to walk towards the right until reaching point B (explained in Section V-D), and then proceed to point C. The participant walked while looking directly at the drone camera. For the third environment, GPS was not used; instead, we relied solely on the T265 sensor. However, in the initial experiments, we observed that the sensor struggled to obtain its position due to the sandy terrain. To address this, we added markers of various colors and shapes so that the T265 camera could use the patterns on the markers as reference points.

FIGURE 18. The third environment: the drone is seen in a position before attaining the fixed altitude in an open area where GPS does work.

B. SYSTEM SETUP
The autonomous drone requires a specific setup before takeoff. Since we are using an onboard computer, we need to connect it to the flight controller and modify certain parameters, as shown in Table 3. Additionally, we must set the parameter MAV_1_FORWARD to 1 in order to observe the MAVLink messages in the QGroundControl software on the local PC. Table 1 displays the pin-out of the Jetson TX2 carrier board.

TABLE 3. MAVLink parameter settings.

TABLE 1. Pinout of the expansion IO connector.

Table 1 shows the pin map of the extension connector; we use the UART1 port to connect the carrier board with the Pixhawk 4. The flight controller Pixhawk is connected to the Jetson TX2 via UART, and Table 2 shows the connection between the pins of the Jetson TX2 and the Pixhawk.

TABLE 2. Pixhawk telemetry to Jetson TX2 UART0 pin mapping.

A complete guide on how to connect the Pixhawk and the Jetson TX2 devkit can be found in [60]. The last step before flying the drone is to modify a few parameters in the Pixhawk firmware. Table 3 shows the parameters to be modified and their values. After this setup, the drone is ready to fly in the onboard mode.
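As an illustration of this step, the short pymavlink sketch below shows one way to verify the serial link between the Jetson TX2 and the Pixhawk and to read back the MAV_1_FORWARD parameter. The device name and baud rate are assumptions that depend on how the carrier board UART is wired; this is not the exact script used in our setup.

from pymavlink import mavutil

# Serial device and baud rate are illustrative; they depend on which UART
# of the carrier board is wired to the Pixhawk telemetry port.
master = mavutil.mavlink_connection("/dev/ttyTHS2", baud=921600)

# Wait for the first heartbeat to confirm the link and learn the system IDs.
master.wait_heartbeat()
print("Heartbeat from system %u component %u"
      % (master.target_system, master.target_component))

# Request the current value of MAV_1_FORWARD to confirm telemetry forwarding.
master.mav.param_request_read_send(
    master.target_system, master.target_component, b"MAV_1_FORWARD", -1)
msg = master.recv_match(type="PARAM_VALUE", blocking=True, timeout=5)
if msg is not None:
    print(msg.param_id, msg.param_value)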

V. EXPERIMENTAL RESULTS
This section presents the findings of our research, organized into three subsections corresponding to the first, second, and third experimental environments. In addition, in each subsection we show the results obtained with the SMRT algorithm and without using the algorithm.

A. FIRST ENVIRONMENT
To replicate the ideal conditions for facial recognition without interference from drone vibrations, we positioned the drone on a desk and executed the facial recognition system. In this setup, we assessed the performance of the facial recognition system.

1) SIAMESE FACIAL RECOGNITION SYSTEM
The drone captured a total of 170 images of each person, totaling 850 images. The model was trained with these images using transfer learning techniques. After running the model in real time, it achieves an overall accuracy of approximately 98.21%, meaning the model can recognize the person in front of the camera of the drone with 98.21% accuracy. This accuracy was calculated as the division between the total number of correct predictions and the total number of captured images. Table 4 shows the results for each person. The Siamese network model proves to be a good tool for face recognition, offering acceptable precision, recall, and F1-Score results.

TABLE 4. Performance metrics for the Siamese model.

2) SIAMESE FACIAL RECOGNITION SYSTEM USING SMRT ALGORITHM
After running the model in real time with the Siamese network combined with the SMRT algorithm, the model can recognize the person in front of the camera attached to the drone with 99.62% accuracy. This demonstrates an improvement over the initial expectations, highlighting the effectiveness of the SMRT algorithm in enhancing the recognition capabilities of the model. Table 5 shows the results for each person. The combination of the Siamese network model and the SMRT algorithm proves to be a powerful tool for face recognition, offering robust precision, recall, and F1-Score results across different individuals. The high overall accuracy and the detailed performance metrics for each individual underscore the ability of the model to identify persons in diverse conditions reliably. From the confusion matrix in Fig. 19, we can estimate the accuracy, precision, recall, and F1-score of the Siamese facial recognition system using the SMRT algorithm and compare them against the values of the Siamese facial recognition without using the novel algorithm, as shown in Table 6.

TABLE 5. Performance metrics for the Siamese model + SMRT algorithm.

FIGURE 19. Confusion matrix. Siamese facial recognition system using the SMRT algorithm in the first environment.

TABLE 6. Performance metrics for our Siamese model against the Siamese model + SMRT algorithm in the first environment.
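For reference, the per-class metrics reported in these tables can be derived from a confusion matrix as in the short sketch below. It is a generic illustration, not the evaluation script used in our experiments, and it assumes every class receives at least one prediction.

import numpy as np

def metrics_from_confusion(cm):
    # cm[i, j] counts samples of true class i predicted as class j.
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()  # correct predictions over all captured images
    return accuracy, precision, recall, f1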

B. SECOND ENVIRONMENT
The second environment is designed to evaluate the facial recognition system while the drone is flying. In this case, the vibration of the drone and the task of tracking the face of the person can disrupt the facial recognition system. The drone is located in a classroom where the GPS signal is not available, and automatic takeoff is initiated, followed by the execution of the facial recognition system.

1) SIAMESE FACIAL RECOGNITION SYSTEM
Similarly to the previous experiment, a total of 200 images were captured for each person, resulting in 1000 images in total. The model was trained with these images using transfer learning techniques. Upon executing the model in real time, it achieved an overall accuracy of approximately 97.72%. Table 7 presents the individual results for each person. The Siamese network model proves to be a valuable tool for face recognition, delivering acceptable precision, recall, and F1-Score outcomes.

TABLE 7. Performance metrics for the Siamese model.

2) SIAMESE FACIAL RECOGNITION SYSTEM USING SMRT ALGORITHM
After running the model in real time with the Siamese network combined with the SMRT algorithm, it achieves an overall accuracy of approximately 99.32%. This demonstrates an improvement over the initial expectations, highlighting the effectiveness of the SMRT algorithm in enhancing the model's recognition capabilities. Table 8 shows the results for each person. The combination of the Siamese network model and the SMRT algorithm proves to be a powerful tool for face recognition, offering robust precision, recall, and F1-Score results across different individuals. The high overall accuracy and the detailed performance metrics for each individual underscore the ability of the model to identify persons in diverse conditions reliably. From the confusion matrix in Fig. 20, we can estimate the accuracy, precision, recall, and F1-score of the Siamese facial recognition system using the SMRT algorithm and compare them against the values of the Siamese facial recognition without using the novel algorithm, as shown in Table 9. In the Siamese facial recognition system, precision is higher than accuracy, which indicates that the model is good at identifying positive cases for one or more classes but not as good at correctly classifying certain specific classes.

TABLE 8. Performance metrics for the Siamese model + SMRT algorithm.

FIGURE 20. Confusion matrix. Siamese facial recognition system using the SMRT algorithm in the second environment.

TABLE 9. Performance metrics for our Siamese model against the Siamese model + SMRT algorithm in the second environment.

C. THIRD ENVIRONMENT
The third environment is located in an open area with natural lighting. The purpose of this experiment was to evaluate the performance of the facial recognition system under natural lighting and real weather conditions. On the day of the experiment, the sky was partly cloudy with light rain. We chose to conduct this experiment with only two subjects: the first subject being the author of this investigation, and the second subject being the laboratory research assistant. The experiment was limited to two individuals for safety reasons, as it was conducted in a real-world scenario with winds that could potentially cause the drone to move towards a person.

1) SIAMESE FACIAL RECOGNITION SYSTEM
The drone captured a total of 130 images of each person, totaling 260 images. The model had not been trained with these images, so a low accuracy was expected. After running the model in real time, it achieves an overall accuracy of approximately 90.0%. Table 10 shows the results for each person. The Siamese network model proves to be a good tool for face recognition, offering acceptable precision, recall, and F1-Score results. However, the variability in performance among different individuals highlights the need to train the model with our data.

TABLE 10. Performance metrics for the Siamese model.

2) SIAMESE FACIAL RECOGNITION SYSTEM USING SMRT ALGORITHM
After running the model in real time with the Siamese network combined with the SMRT algorithm, the model does not show the expected results, as it can recognize the person in front of the drone's camera with 81.15% accuracy. Table 11 shows the results for each person. The combination of the Siamese network model and the SMRT algorithm proves to be a powerful tool for face recognition, but this will depend on other factors such as the number of images obtained by the drone, the quality of the camera, the light intensity on a cloudy day, etc. From the confusion matrix in Fig. 21, we can estimate the accuracy, precision, recall, and F1-score of the Siamese facial recognition system using the SMRT algorithm and compare them against the values of the Siamese facial recognition without using the novel algorithm, as shown in Table 12.

TABLE 11. Performance metrics for the Siamese model + SMRT algorithm.

FIGURE 21. Confusion matrix. Siamese facial recognition system using the SMRT algorithm in the third environment.

TABLE 12. Performance metrics for our Siamese model against the Siamese model + SMRT algorithm in the third environment.

D. STATE OF THE ART IN FACIAL RECOGNITION
In the results presented, we focused more on the first two testing environments, since the number of images, light intensity, and environment are parameters we can control. Using the SMRT algorithm enhances facial recognition, especially during drone tracking. In this subsection, we compare our method with other state-of-the-art facial recognition models employing Siamese networks and present our results from the second environment. Table 13 presents several facial recognition models. Most of these models have been trained on thousands of data points. In our case, we use transfer learning, which means we can use the weights of an already trained network, such as FaceNet [56], and only train the final layers of the Inception network by updating their weights. This approach allows us to achieve our goal of recognizing only the individuals who are in our database.

TABLE 13. Performance comparison of different methods.

E. FACE TRACKING TIME
The drone tracks faces in front of the camera. To display the tracking results, we conducted an experiment as shown in Fig. 22, where the person stands in front of the drone at point A, then moves to point B, returns to point A, and finally moves to point C. The elapsed time between each point is calculated. Table 14 shows the response times from one point to another. It can be observed that for short distances the time is long, which may be due to the need for improvement in the SMRT + SORT algorithms to increase their efficiency. Additionally, the time from point B to point A is slightly different because the person's face is not detected correctly for a few seconds. The same occurs from point A to point C, where the drone fails to detect the face, and the person needs to move slightly to be detected. These errors can be addressed in future research by training the Siamese network with more data and improving the tracking algorithm. Furthermore, it is necessary to analyze whether the processing of the SMRT algorithm is performed on the CPU or the GPU of the Jetson TX2. Finally, it can be appreciated that the drone is capable of recognizing a person in a database and tracking their movement.

FIGURE 22. Face tracking experiment.

TABLE 14. Tracking time between specific points.


VI. PERFORMANCE EVALUATION
To address the computational cost, inference time, and running time of our proposed method, we conducted several evaluations.

FIGURE 23. Comparison of loss results between the pretrained model and the non-pretrained model.

FIGURE 24. Comparison of validation loss results between the pretrained model and the non-pretrained model.

A. COMPUTATIONAL COST
The computational cost is evaluated in terms of FLOPS (floating-point operations per second). Our facial recognition system comprises a Siamese network for training. For inference, we only use the InceptionV2 model with the weights from the trained Siamese model. Therefore, we measured the FLOPS for each layer of the inference InceptionV2 network, as shown in Table 15. The total computational cost of the inference model is approximately 0.48 billion FLOPS, indicating its efficiency and feasibility for real-time applications on the Jetson TX2 platform.

TABLE 15. Details of the facial recognition model layers.

B. INFERENCE TIME
Inference time refers to the time it takes for the model to process an input and produce an output. In our experiments, the inference time is measured by running the model on the Jetson TX2 and calculating the time taken to process a frame. The inference time for our model was approximately 0.12 milliseconds per frame, which is sufficient for real-time facial recognition applications.

C. RUNNING TIME
The running time encompasses the total time taken for the entire process, including facial detection, facial recognition, the use of the SMRT algorithm, and data storage. The tracking time is presented in Table 14. The average execution time, excluding tracking, was approximately 0.24 milliseconds, demonstrating the system's capability to operate effectively in real-time scenarios. The tracking time is higher because the speed of the movement of the subject in front of the camera is slow, allowing for better control of the UAV in case of an emergency.
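A simple way to reproduce this kind of per-frame timing is sketched below; it is a generic measurement loop that assumes a loaded Keras model and a 96 × 96 × 3 input, not the exact benchmarking script used here.

import time
import numpy as np

def average_inference_time(model, n_frames=200, input_shape=(96, 96, 3)):
    frame = np.random.rand(1, *input_shape).astype("float32")
    model.predict(frame)  # warm-up call so graph building is not timed
    start = time.perf_counter()
    for _ in range(n_frames):
        model.predict(frame)
    elapsed = time.perf_counter() - start
    return elapsed / n_frames  # average seconds per frame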


VII. DISCUSSION
The facial recognition system consists of three important components: facial detection, facial recognition, and face tracking. Each task is essential for the operation of the system. For facial detection, we have utilized the Haar cascade algorithm from OpenCV. Although this algorithm is not very effective compared to others using deep learning, it remains useful for conserving computational resources on the Jetson TX2.

Three experiments were conducted in which it was necessary to measure the accuracy of the facial recognition system and the time it takes to track a person. In the first environment, the drone was positioned above the desk. This is because we needed to simulate a setting with ideal flight conditions, free from vibrations or other disturbances that could affect facial recognition.

During the first experiment, the drone captured 850 images of the person in front of it. The facial recognition system was trained using the weights of FaceNet. As a result, the Siamese network achieved an accuracy of 98.21%. Subsequently, we incorporated our new algorithm, SORT+SMRT, achieving an accuracy of 99.62%.

This improvement suggests that integrating the SMRT algorithm with the Siamese network significantly enhances its facial recognition capabilities, potentially eliminating the need for extensive additional training with new data to achieve high levels of accuracy. This is because the SMRT algorithm tracks the name of the person in front of the camera and, even if the Siamese model predicts a different person in front of the camera, the SMRT algorithm will continue to assign the previous name unless the person disappears from the camera frame.

For the second experiment, the environment is a classroom where the drone can be flown safely. Two parameters are measured: first, the accuracy of facial recognition while the drone is flying and tracking a person, and second, the time elapsed between point A and point C.

A total of 1000 images were captured during the drone's flight in the second environment, achieving an accuracy of 97.72%. The accuracy was lower compared to the first environment, demonstrating that external factors such as drone vibration, rotation, light intensity, etc., significantly affect facial recognition.

Next, we applied the SORT+SMRT algorithm to analyze how much the accuracy improves. We obtained an accuracy of 99.32%, showing an improvement of almost 2%. This is interesting because we were able to increase the accuracy of the facial recognition system by roughly twice the improvement observed in the first environment and to achieve results similar to or better than those of other models. The SMRT algorithm can still be improved by incorporating image processing techniques such as noise removal and reducing the algorithm execution time, as well as optimizing how the SMRT algorithm is integrated.

For the third experiment, we conducted our research on a football field. For safety reasons, we chose to conduct this experiment with only two participants: the research author and a lab partner. The objective was to analyze the behavior of the facial recognition system in a real environment and under real conditions. The day was cloudy, causing significant variations in light intensity. Additionally, there was light rain, which could have affected facial recognition. It is important to note that part of the facial recognition system includes the drone taking precautions if a person is very close to it, as explained in Section III-C. The system also features automatic landing.

During takeoff, there were no strong winds, allowing the drone to maintain its position. Throughout the flight, the drone captured 130 images of each person, totaling 260 images. The obtained result was an accuracy level of approximately 90%. However, compared to the previous two experiments, the model exhibited decreased accuracy due to various external factors such as vibrations, changes in natural light intensity, and light winds that could have shifted the drone during flight. Simultaneously, the SORT+SMRT algorithm was executed, achieving an accuracy level of 81.15%, which did not meet our expectations.

This is attributed to the operation of the Siamese network, which has two inputs: a database and images captured by the camera. Given that the database was created in an environment with artificial light, the accuracy level is better indoors than outdoors. The lower accuracy of the SMRT algorithm may be due to using two previous images before the current one to obtain a better reference. If these two previous images were classified as belonging to a different person, the SMRT algorithm may maintain this incorrect classification, even if the Siamese network correctly identifies the person

again. Subsequently, when the Siamese network briefly stops detecting and then resumes, or when the SORT algorithm loses track of the face, the SMRT algorithm may reassign the correct person's name.

Both the Siamese network and the SORT and SMRT algorithms work together for precise tracking and identification of individuals during drone flights.

The precision level of this latest experiment could be enhanced by adding more photos of people in natural lighting environments or by applying data augmentation techniques to obtain a variety of images. Despite this, the achieved precision level is close to that obtained by other facial recognition methods, as shown in Table 12. It is important to highlight that our research presents results from experiments conducted while the drone was flying and tracking individuals, unlike other studies that only show images captured by a drone without considering its behavior during flight.

Next, we present the results of the time it takes for the drone to travel from point A to point B, from point B back to point A, and finally from point A to point C. We observed that the time required to travel from A to B was approximately 20 seconds, primarily because the person in front of the drone was moving at a similar speed to the drone. If the person moved too quickly, the drone could not keep up, as seen when the person moved from B to A in 17.4 seconds, a faster time because the person was facing the drone while moving. However, when traveling from point A to point B, the drone momentarily lost track of the person's face and could not detect it until the person slightly moved their head, after which the drone quickly recognized and resumed tracking. The same occurred when traveling from point A to point C; the drone lost track of the person's face because they were moving faster than the drone and went out of the camera's field of view. The person had to step back for the drone to detect their face again and resume tracking.

One reason the drone loses sight of faces is the use of the Haar cascade algorithm from OpenCV for facial detection, which is only effective with frontal faces and cannot detect rotated faces. Another reason for prolonged times is the execution time required for the Siamese network along with the SORT and SMRT algorithms. In summary, we have identified two key areas requiring improvement: first, changing the face detection from the Haar cascade to another deep learning-based algorithm; second, optimizing the execution time of the Siamese network + SORT + SMRT. Additionally, we plan to train the Siamese neural network with more diverse data, including different light intensities and environments, and to incorporate deep learning for facial detection, human pose recognition, tracking control accuracy measurement, and a more robust tracking algorithm.

In conclusion, our work demonstrates the potential to contribute to Latin American society, which faces high crime rates, through the use of drones capable of detecting, recognizing, and tracking wanted individuals. Our implementation of the Siamese network + SORT + SMRT contributes to achieving the system's ultimate goal. It is important to mention that this research is conducted to contribute to society and strictly prohibits its use for purposes that threaten the lives of living beings.

VIII. CONCLUSION
In this paper, we have developed an autonomous drone capable of recognizing a person's face and following them in GPS-denied environments. The facial recognition system includes our new algorithm, SMRT, which enhances facial recognition accuracy. Our proposed method achieves an accuracy of 94.45% using the SMRT algorithm, which is acceptable compared to other conventional algorithms given that the Siamese network is untrained. Field test results indicate that the proposed method performs well in indoor environments with artificial lighting, although the dataset lacks diversity. The drone has demonstrated the ability to perform autonomous flights and autonomous person tracking. The benefits obtained from this research allow us to implement a new version of our drone with gait recognition and human pose estimation for improved tracking capability. The implementation of the facial recognition system in drones offers a deeper understanding of the potential use of drones to reduce crime and violence in the world.
REFERENCES [22] A. Mukashev, L.-D. Van, S. Sharma, M. F. Tandia, and Y.-C. Tseng,
‘‘Person tracking by fusing posture data from UAV video and wearable
[1] G. Sánchez-Rentería, F. J. Bonilla-Escobar, A. Fandiño-Losada, and
sensors,’’ IEEE Sensors J., vol. 22, no. 24, pp. 24150–24160, Dec. 2022,
M. I. Gutiérrez-Martinez, ‘‘Observatorios de convivencia y seguridad
doi: 10.1109/JSEN.2022.3218373.
ciudadana: Herramientas para la toma de decisiones y gobernabilidad,’’
[23] K. Kim, J. Kim, H.-G. Lee, J. Choi, J. Fan, and J. Joung, ‘‘UAV
Revista Peruana de Medicina Experim. Salud Pública, vol. 33, no. 2,
chasing based on YOLOv3 and object tracker for counter UAV
p. 362, Jun. 2016, doi: 10.17843/rpmesp.2016.332.2203.
systems,’’ IEEE Access, vol. 11, pp. 34659–34673, 2023, doi:
[2] Homicide-Estimates | DataUNODC. Accessed: May 18, 2022. [Online].
10.1109/ACCESS.2023.3264603.
Available: https://dataunodc.un.org/content/homicide-estimates [24] T. Keawboontan and M. Thammawichai, ‘‘Toward real-time UAV multi-
[3] A. Izquierdo, C. Pessino, and G. Vuletin, Better Spending for Better target tracking using joint detection and tracking,’’ IEEE Access, vol. 11,
Lives: How Latin America and the Caribbean Can Do More with Less pp. 65238–65254, 2023, doi: 10.1109/ACCESS.2023.3283411.
| Publications. Accessed: Aug. 24, 2024. [Online]. Available: https:// [25] M. Alhafnawi, H. A. B. Salameh, A. Masadeh, H. Al-Obiedollah,
publications.iadb.org/en/publications/english/viewer/Better-Spending-for M. Ayyash, R. El-Khazali, and H. Elgala, ‘‘A survey of indoor and outdoor
-Better-Lives-How-Latin-America-and-the-Caribbean-Can-Do-More- UAV-based target tracking systems: Current status, challenges, technolo-
with-Less.pdfLives-How-Latin-America-and-the-Caribbean-Can-Do- gies, and future directions,’’ IEEE Access, vol. 11, pp. 68324–68339, 2023,
More-with-Less.pdf doi: 10.1109/ACCESS.2023.3292302.
[4] A. Devi and A. Marimuthu, ‘‘An efficient self-updating face recognition [26] D. Herrera and H. Imamura, ‘‘Design of facial recognition system
system for plastic surgery face,’’ ICTACT J. Image Video Process., vol. 7, implemented in an unmanned aerial vehicle for citizen security in Latin
no. 1, pp. 1307–1317, Aug. 2016, doi: 10.21917/ijivp.2016.0191. America,’’ ITM Web Conf., vol. 27, May 2019, Art. no. 04002, doi:
[5] Counseling With Artificial Intelligence—Counseling Today. Accessed: 10.1051/itmconf/20192704002.
Nov. 1, 2018. [Online]. Available: https://ct.counseling.org/2018/ [27] El Cañón Del Colca Registra El Mayor Número De Turistas Perdidos
01/counseling-artificial-intelligence/ en Arequipa | RPP Noticias. Accessed: Nov. 15, 2023. [Online]. Avail-
[6] Real-Time Facial Recognition Technology | Oosto. Accessed: able: https://rpp.pe/peru/actualidad/el-canon-del-colca-registra-el-mayor-
Nov. 14, 2023. [Online]. Available: https://oosto.com/ numero-de-turistas-perdidos-en-arequipa-noticia-1166600?ref=rpp
[7] Your Face is, or Will be, Your Boarding Pass—The New York Times. [28] E. Çintas, B. Özyer, and E. Simsek, ‘‘Vision-based moving UAV
Accessed: Oct. 16, 2023. [Online]. Available: https://www.nytimes. tracking by another UAV on low-cost hardware and a new ground
com/2021/12/07/travel/biometrics-airports-security.html control station,’’ IEEE Access, vol. 8, pp. 194601–194611, 2020, doi:
[8] N. Delbiaggio, ‘‘A comparison of facial recognition’s algorithms,’’ 10.1109/ACCESS.2020.3033481.
M.S. thesis, Degree Programme Bus. Inf. Technol., Haaga-Helia Univ. [29] J. Li, D. H. Ye, M. Kolsch, J. P. Wachs, and C. A. Bouman, ‘‘Fast and
Appl. Sci., Helsinki, Finland, 2017. [Online]. Available: https://www. robust UAV to UAV detection and tracking from video,’’ IEEE Trans.
theseus.fi/bitstream/handle/10024/132808/Delbiaggio_Nicolas.pdf?seque Emerg. Topics Comput., vol. 10, no. 3, pp. 1519–1531, Jul. 2022, doi:
nce=1 10.1109/TETC.2021.3104555.
[30] H.-J. Hsu and K.-T. Chen, ‘‘DroneFace: An open dataset for drone
[9] K. Simonyan and A. Zisserman. (2015). Very Deep Convolutional
research,’’ in Proc. 8th ACM Multimedia Syst. Conf., Jun. 2017,
Networks for Large-Scale Image Recognition. Accessed: Jun. 1, 2022.
pp. 187–192, doi: 10.1145/3083187.3083214.
[Online]. Available: http://www.robots.ox.ac.uk/
[31] Face++—Face++ Cognitive Services. Accessed: Nov. 15, 2023.
[10] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for [Online]. Available: https://www.faceplusplus.com/
large-scale image recognition,’’ 2014, arXiv:1409.1556. [32] Image Recognition Software, Ml Image & Video Analysis—Amazon
[11] F. Schroff, D. Kalenichenko, and J. Philbin, ‘‘FaceNet: A unified Rekognition—AWS. Accessed: Nov. 15, 2023. [Online]. Available:
embedding for face recognition and clustering,’’ 2015, arXiv:1503. https://aws.amazon.com/rekognition/
03832. [33] H.-J. Hsu and K.-T. Chen, ‘‘Face recognition on drones: Issues and
[12] Z. He, ‘‘Deep learning in image classification: A survey report,’’ in Proc. limitations,’’ in Proc. 1st Workshop Micro Aerial Vehicle Netw., Syst., Appl.
2nd Int. Conf. Inf. Technol. Comput. Appl. (ITCA), Dec. 2020, pp. 174–177, Civilian Use, May 2015, pp. 39–44, doi: 10.1145/2750675.2750679.
doi: 10.1109/ITCA52113.2020.00043. [34] R. Jurevičius, N. Goranin, J. Janulevičius, J. Nugaras, I. Suzdalev, and
[13] D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck, ‘‘Deep learn- A. Lapusinskij, ‘‘Method for real time face recognition application in
ing for identifying metastatic breast cancer,’’ 2016, arXiv:1606.05718. unmanned aerial vehicles,’’ Aviation, vol. 23, no. 2, pp. 65–70, Dec. 2019,
[14] G. Koch, R. Zemel, and R. Salakhutdinov, ‘‘Siamese neural networks for doi: 10.3846/aviation.2019.10681.
one-shot image recognition,’’ in Proc. ICML, 2015, pp. 1–8. [35] Dlib C++ Library. Accessed: Nov. 15, 2023. [Online]. Available:
[15] S. Sambolek and M. Ivasic-Kos, ‘‘Automatic person detection in search http://dlib.net
and rescue operations using deep CNN detectors,’’ IEEE Access, vol. 9, [36] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image
pp. 37905–37922, 2021, doi: 10.1109/ACCESS.2021.3063681. recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
[16] U. Azmat, S. S. Alotaibi, M. Abdelhaq, N. Alsufyani, M. Shorfuzzaman, Jun. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
A. Jalal, and J. Park, ‘‘Aerial insights: Deep learning-based human [37] A. Srivastava, T. Badal, P. Saxena, A. Vidyarthi, and R. Singh, ‘‘UAV
action recognition in drone imagery,’’ IEEE Access, vol. 11, surveillance for violence detection and individual identification,’’ Auto-
pp. 83946–83961, 2023, doi: 10.1109/ACCESS.2023.3302353. mated Softw. Eng., vol. 29, no. 1, May 2022, doi: 10.1007/s10515-022-
00323-3.
[17] F. Schiano, D. Natter, D. Zambrano, and D. Floreano, ‘‘Autonomous
[38] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image
detection and deterrence of pigeons on buildings by drones,’’ IEEE Access,
recognition,’’ 2015, arXiv:1512.03385.
vol. 10, pp. 1745–1755, 2022, doi: 10.1109/ACCESS.2021.3137031.
[39] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely
[18] C.-J. Chen, Y.-Y. Huang, Y.-S. Li, Y.-C. Chen, C.-Y. Chang, and Connected Convolutional Networks. Accessed: Jan. 4, 2024. [Online].
Y.-M. Huang, ‘‘Identification of fruit tree pests with deep learning on Available: https://github.com/liuzhuang13/DenseNet.
embedded drone to achieve accurate pesticide spraying,’’ IEEE Access, [40] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, ‘‘Rethinking
vol. 9, pp. 21986–21997, 2021, doi: 10.1109/ACCESS.2021.3056082. the inception architecture for computer vision,’’ in Proc. IEEE Conf.
[19] G. Zeng, Y. He, Z. Yu, X. Yang, R. Yang, and L. Zhang, ‘‘Preparation of Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.
novel high copper ions removal membranes by embedding organosilane- [41] A. G. Howard et al., ‘‘MobileNets: Efficient convolutional neural
functionalized multi-walled carbon nanotube: Preparation of novel high networks for mobile vision applications,’’ Apr. 2017. [Online]. Available:
copper ions removal membranes,’’ J. Chem. Technol. Biotechnol., vol. 91, http://arxiv.org/abs/1704.04861
no. 8, pp. 2322–2330, Aug. 2016, doi: 10.1002/jctb.4820. [42] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, ‘‘Learning transferable
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification architectures for scalable image recognition,’’ 2017, arXiv:1707.07012.
with deep convolutional neural networks,’’ in Proc. NIPS, 2012, pp. 1–9. [43] E. B. Nievas, O. D. Suarez, G. B. García, and R. Sukthankar, ‘‘Violence
[Online]. Available: http://code.google.com/p/cuda-convnet/ detection in video using computer vision techniques,’’ in Computer Analy-
[21] L. Meng, T. Hirayama, and S. Oyanagi, ‘‘Underwater-drone with sis of Images and Patterns (Lecture Notes in Computer Science (including
panoramic camera for automatic fish recognition based on deep subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
learning,’’ IEEE Access, vol. 6, pp. 17880–17886, 2018, doi: Bioinformatics)), vol. 6855, 2011, pp. 332–339, doi: 10.1007/978-3-642-
10.1109/ACCESS.2018.2820326. 23678-5_39.

119486 VOLUME 12, 2024


D. A. Herrera Ollachica et al.: Autonomous UAV Implementation for Facial Recognition and Tracking

[44] M. M. Soliman, M. H. Kamal, M. A. E.-M. Nashed, Y. M. Mostafa, [67] X. Wei, H. Wang, B. Scotney, and H. Wan, ‘‘Minimum margin loss for deep
B. S. Chawky, and D. Khattab, ‘‘Violence recognition from videos face recognition,’’ Pattern Recognit., vol. 97, Jan. 2020, Art. no. 107012.
using deep learning techniques,’’ in Proc. 9th Int. Conf. Intell. [68] J. Sun, W. Yang, R. Gao, J.-H. Xue, and Q. Liao, ‘‘Inter-class angular
Comput. Inf. Syst. (ICICIS), Dec. 2019, pp. 80–85, doi: 10.1109/ICI- margin loss for face recognition,’’ Signal Process., Image Commun.,
CIS46948.2019.9014714. vol. 80, Feb. 2020, Art. no. 115636.
[45] J. Sandino, F. Vanegas, F. Maire, P. Caccetta, C. Sanderson, and [69] Y. Wu, Y. Wu, R. Gong, Y. Lv, K. Chen, D. Liang, X. Hu, X. Liu, and J. Yan,
F. Gonzalez, ‘‘UAV framework for autonomous onboard navigation and ‘‘Rotation consistent margin loss for efficient low-bit face recognition,’’ in
people/object detection in cluttered indoor environments,’’ Remote Sens., Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020,
vol. 12, no. 20, p. 3386, Oct. 2020, doi: 10.3390/rs12203386. pp. 6865–6875.
[46] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, [70] H. Ling, J. Wu, J. Huang, J. Chen, and P. Li, ‘‘Attention-based
‘‘The PASCAL visual object classes (VOC) challenge,’’ Int. J. Comput. convolutional neural network for deep face recognition,’’ Multimedia Tools
Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010, doi: 10.1007/s11263-009- Appl., vol. 79, nos. 9–10, pp. 5595–5616, Mar. 2020.
0275-4. [71] B. Wu and H. Wu, ‘‘Angular discriminative deep feature learning for face
[47] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, verification,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process.
and A. Zisserman, ‘‘The PASCAL visual object classes challenge: A (ICASSP), May 2020, pp. 2133–2137.
retrospective,’’ Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136, Jan. 2015, [72] B. Ma, Z. Liu, W. Zhao, J. Yuan, H. Long, X. Wang, and Z. Yuan, ‘‘Target
doi: 10.1007/s11263-014-0733-5. tracking control of UAV through deep reinforcement learning,’’ IEEE
[48] N. Davis, F. Pittaluga, and K. Panetta, ‘‘Facial recognition using human Trans. Intell. Transp. Syst., vol. 24, no. 6, pp. 5983–6000, Jun. 2023, doi:
visual system algorithms for robotic and UAV platforms,’’ in Proc. IEEE 10.1109/TITS.2023.3249900.
Conf. Technol. Practical Robot Appl. (TePRA), Apr. 2013, pp. 1–5. [73] B. Ma, Z. Liu, F. Jiang, W. Zhao, Q. Dang, X. Wang, J. Zhang, and L. Wang,
[49] A. S. Priambodo, F. Arifin, A. Nasuha, and A. Winursito, ‘‘Face ‘‘Reinforcement learning based UAV formation control in GPS-denied
tracking for flying robot quadcopter based on Haar cascade classifier environment,’’ Chin. J. Aeronaut., vol. 36, no. 11, pp. 281–296, Nov. 2023,
and PID controller,’’ J. Phys., Conf. Ser., vol. 2111, no. 1, Nov. 2021, doi: 10.1016/j.cja.2023.07.006.
Art. no. 012046, doi: 10.1088/1742-6596/2111/1/012046.
[50] P. M. Wyder et al., ‘‘Autonomous drone hunter operating by deep
learning and all-onboard computations in GPS-denied environments,’’ DIEGO A. HERRERA OLLACHICA (Member,
PLoS One, vol. 14, no. 11, 2019, Art. no. e0225092, doi: 10.1371/jour- IEEE) received the B.S. degree in mechatronic
nal.pone.0225092. engineering from the Technological University of
[51] J. Redmon and A. Farhadi, ‘‘YOLO9000: Better, faster, stronger,’’ in
Peru, Lima, Peru, in 2017, and the M.S. degree in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 6517–6525, doi: 10.1109/CVPR.2017.690. information system science from Soka University,
[52] Mavros—ROS Wiki. Accessed: Nov. 15, 2023. [Online]. Available: Tokyo, Japan, in 2020, where he is currently
http://wiki.ros.org/mavros pursuing the Ph.D. degree in information system
[53] T. Zhou and Y. Liu, ‘‘Long-term person tracking for unmanned aerial science.
vehicle based on human–machine collaboration,’’ IEEE Access, vol. 9, From 2016 to 2018, he was a Research and
pp. 161181–161193, 2021, doi: 10.1109/ACCESS.2021.3132077. Development Engineer at LabTop Peru Inc., Lima.
[54] OpenCV. Cascade Classifier—OpenCV 3.4 Documentation. Accessed: From 2020 to 2023, he was a Research Assistant for the JICA-JST
Apr. 13, 2024. [Online]. Available: https://docs.opencv.org/3.4/db/ SATREPS-EARTH project in Tokyo. His research interests include drones,
d28/tutorial_cascade_classifier.html artificial intelligence, deep learning, and robotics applied to help society.
[55] OpenCV Contributors. OpenCV Documentation. Accessed: Apr. 13, 2024.
He was awarded as the Best Oral Presentation at the 6th International
[Online]. Available: https://docs.opencv.org/4.5.5/
[56] F. Schroff, D. Kalenichenko, and J. Philbin, ‘‘FaceNet: A unified Postgraduate Conference on Biotechnology, in 2023, at the National
embedding for face recognition and clustering,’’ in Proc. IEEE Conf. University of Singapore.
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 815–823.
[57] Univ. Massachusetts, Boston, MA, USA. LFW Face Database:
Main. Accessed: May 31, 2022. [Online]. Available: http://vis- BISMARK K. ASIEDU ASANTE (Member, IEEE)
www.cs.umass.edu/lfw/ received the B.S. degree in computer science
[58] D. Herrera. Face Recognition Inception KERAS—GitHub Repository. and physics and the M.Phil. degree in computer
Accessed: Jul. 5, 2022. [Online]. Available: https://github.com/ science from the University of Ghana, in 2012 and
DiegoHerrera1890/facial-recognition-system-implemented-in-an-unmann 2017, respectively, and the Ph.D. degree in infor-
ed-aerial-vehicle/tree/master/Face_recognition_Inception_KERAS mation system science engineering from Soka
[59] A. Bewley. Sort/Sort.py at Master—GitHub Repository. Accessed: University, in 2024.
Apr. 13, 2024. [Online]. Available: https://github.com/abewley/sort/ In 2024, he assumed the position of an Assistant
blob/master/sort.py
Professor at Soka University, specifically with
[60] D. Herrera. Pixhawk Connected to Jetson Tx2 Devkit—GitHub
Repository. Accessed: Apr. 13, 2024. [Online]. Available: https://
the Department of Information System Science
github.com/DiegoHerrera1890/Pixhawk-connected-to-Jetson-Tx2-devkit Engineering. During his Ph.D. research, he published papers on speech
[61] S.-C. Chong, A. B. J. Teoh, and T.-S. Ong, ‘‘Unconstrained face enhancement and obstacle avoidance strategies for the visually impaired.
verification with a dual-layer block-based metric learning,’’ Multimedia His research interests include artificial intelligence and deep learning, with
Tools Appl., vol. 76, no. 2, pp. 1703–1719, Jan. 2017. a focus on applying these technologies to address human and environmental
[62] C. Xiong, L. Liu, X. Zhao, S. Yan, and T.-K. Kim, ‘‘Convolutional fusion challenges.
network for face verification in the wild,’’ IEEE Trans. Circuits Syst. Video
Technol., vol. 26, no. 3, pp. 517–528, Mar. 2016.
[63] J. Zhang, X. Jin, Y. Liu, A. K. Sangaiah, and J. Wang, ‘‘Small sample face HIROKI IMAMURA (Member, IEEE) received
recognition algorithm based on novel Siamese network,’’ J. Inf. Process. the B.S. degree in engineering from Soka
Syst., vol. 14, no. 6, pp. 1464–1479, 2018, doi: 10.3745/JIPS.02.0101. University, Japan, in 1997, and the M.S. and Ph.D.
[64] M. Heidari and K. Fouladi-Ghaleh, ‘‘Using Siamese networks with degrees in information science from JAIST, Japan,
transfer learning for face recognition on small-samples datasets,’’ in in 1999 and 2023, respectively.
Proc. Int. Conf. Mach. Vis. Image Process. (MVIP), 2020, pp. 1–4, doi:
From 2003 to 2009, he was an Assistant
10.1109/MVIP49855.2020.9116915.
[65] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu, Professor with Nagasaki University, Japan.
‘‘CosFace: Large margin cosine loss for deep face recognition,’’ in Proc. From 2009 to 2020, he was an Associate Professor
IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 5265–5274. at Soka University, where he has been a Professor,
[66] H. Ben Fredj, S. Bouguezzi, and C. Souani, ‘‘Face recognition in since 2020. His research interests include image
unconstrained environment with CNN,’’ Vis. Comput., vol. 37, no. 2, processing, artificial intelligence, and XR.
pp. 217–226, 2021.
