Data Descriptor
Dataset: Traffic Images Captured from UAVs for Use in Training
Machine Vision Algorithms for Traffic Management
Sergio Bemposta Rosende 1, Sergio Ghisler 2, Javier Fernández-Andrés 3 and Javier Sánchez-Soriano 4,*

1 Department of Science, Computing and Technology, Universidad Europea de Madrid, 28670 Villaviciosa de Odón, Spain; [email protected]
2 Stirling Square Capital Partners LLP, London SW3 4LY, UK; [email protected]
3 Department of Industrial and Aerospace Engineering, Universidad Europea de Madrid, 28670 Villaviciosa de Odón, Spain; [email protected]
4 Higher Polytechnic School, Universidad Francisco de Vitoria, 28223 Pozuelo de Alarcón, Spain
* Correspondence: [email protected]

Abstract: A dataset of Spanish road traffic images taken from unmanned aerial vehicles (UAV) is
presented with the purpose of being used to train artificial vision algorithms, among which those
based on convolutional neural networks stand out. This article explains the process of creating the
complete dataset, which involves the acquisition of the data and images, the labeling of the vehicles,
anonymization, data validation by training a simple neural network model, and the description of the
structure and contents of the dataset (which amounts to 15,070 images). The images were captured
by drones (but would be similar to those that could be obtained by fixed cameras) in the field of
intelligent vehicle management. The presented dataset is available and accessible to improve the
performance of road traffic vision and management systems since there is a lack of resources in this
specific domain.
Dataset: https://zenodo.org/record/5776219.

Dataset License: Creative Commons Attribution 4.0 International.

Keywords: dataset; UAV; intelligent vehicle; machine learning; computer vision; convolutional neural network; model deployment; autonomous driving; roundabouts; deep learning; traffic management

Citation: Bemposta Rosende, S.; Ghisler, S.; Fernández-Andrés, J.; Sánchez-Soriano, J. Dataset: Traffic Images Captured from UAVs for Use in Training Machine Vision Algorithms for Traffic Management. Data 2022, 7, 53. https://doi.org/10.3390/data7050053

Academic Editors: Joaquín Torres-Sospedra and Kamran Sedig

Received: 20 December 2021; Accepted: 20 April 2022; Published: 25 April 2022

1. Introduction

The use of UAVs began in the early 21st century for military purposes [1–3], but over time, they have started to be used in other sectors [4]. One of the most challenging problems in traffic control is complex maneuvers, such as at traffic roundabouts and intersections [5,6]. UAVs can be used to cooperate with autonomous vehicles [7] or traffic infrastructures [8] in these complex maneuvers, providing them with information that would not be available without such aerial vision [9].

Other works in the field of intelligent transport are the semantic segmentation of RGB-thermal images [10] as well as the identification of salient objects [11]. In these works, traditional RGB images are combined with thermal images as well as depth images to take advantage of all the available environmental information.

Neural networks have become the state-of-the-art technology for pattern detection [12], so it is easy to find a wide variety of networks for different purposes [13]. Other applications and uses of deep convolutional neural networks are proposed by the work in [14], which deals with stereoscopic images in three dimensions, but the need for robust and complete datasets remains. Despite this, there is no model capable of detecting objects in images taken with UAVs or from fixed cameras installed at altitude, due, in part, to the difficulty of detecting small objects [15–17].


There are several datasets with aerial images captured from drones, such as VisDrone [18], one of the most widely used. This dataset is composed of videos and images of all kinds of situations, totaling about 250,000 images of 11 different classes, including some vehicles. The images range in size from 1344 × 746 pixels to 2688 × 1512 pixels. All the images were captured in urban and highway environments in China. It does not contain roundabouts or split roundabouts, which are abundant in Europe and especially on Spanish roads. In addition, all of the road signs and markings are in Chinese.
For these reasons, the dataset described in this work has been created. This dataset will
help in the training of neural networks and artificial intelligence systems in areas such as
the identification of vehicles and other relevant actors in traffic management and intelligent
vehicles. More specifically, this dataset can be used for several purposes, among which the
most important are:
1. For the training of algorithms to monitor complex infrastructures such as roundabouts
and junctions, etc. The information obtained can be used for V2V and V2I arbitration
and communication systems currently under development in the automotive industry,
which will improve safety in shared traffic.
2. For the training of algorithms that can identify different types of vehicles and can be
implemented in autonomous UAVs used for traffic control and safety.
3. For the training of algorithms developed for the management of traffic violations
monitored using UAVs, increasingly used by traffic agencies in many countries.
4. For the training of algorithms developed for emergency services to use UAVs for the
rapid response to a traffic accident and to minimize the number of victims by the
prompt assistance of emergency services.
5. For the training of algorithms developed for organizations in charge of designing and
managing new road infrastructures to enable a better design of these infrastructures
to improve safety and minimize traffic congestion.

2. Data Description
Our dataset is composed of 15,070 images in png format accompanied by as many
files with txt extension with the description of the elements identified in each of the images.
In total, there are 30,140 files comprising the images and descriptions. The images were
taken in six different locations of urban and interurban roads, those with motorways being
discarded. In these images, 155,328 vehicles have been labeled, including cars (137,602)
and motorcycles (17,726). These data can be seen in Table 1 in more detail.

Table 1. Details of the obtained labels.

Scenes                Frames    Targets    Cars       Motorbikes

Regional road          4500     24,858     14,577     10,281
Urban intersection     2462     10,759     10,759     0
Rural road             1292     746        746        0
Split roundabout       2297     3107       3107       0
Roundabout (far)       1814     71,819     64,844     6975
Roundabout (near)      3997     44,039     43,569     470
Total                 15,070    155,328    137,602    17,726
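These totals can be re-derived from the annotation files themselves. The following minimal Python sketch tallies the labels; the directory name is a hypothetical example, and the class ids 0 (car) and 1 (motorcycle) follow the format description below.

```python
from collections import Counter
from pathlib import Path

# Tally the labeled objects per class across all YOLO annotation files.
# "dataset" is a hypothetical directory holding the txt files.
counts = Counter()
for txt in Path("dataset").glob("*.txt"):
    for line in txt.read_text().splitlines():
        if line.strip():
            counts[int(line.split()[0])] += 1  # first field is the class id

print(f"cars: {counts[0]}, motorcycles: {counts[1]}")  # expected: 137,602 and 17,726
```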

The dataset was created in You Only Look Once (YOLO) format [19] due to its
widespread popularity, as well as the ease with which it can be adapted or converted
to other formats due to its characteristics. In this format, the images and their annotations
are named the same way (consecutive integer values, starting at 0) using the extension png
for the images and txt for the annotations associated with that image. In the txt files, the following notation is used to define five fields:

<object-class> <x> <y> <width> <height>

Specifically, each of the fields contains the following:
• Object class: Integer number varying between 0 and N-Classes-1. The two classes that have been incorporated in the model are as follows: 0. Cars, 1. Motorcycles.
• x, y: Decimal values relative to the center of the rectangle containing the labeled object. They vary in the range [0.0 to 1.0].
• Width, height: Decimal values relative to the width and height of the rectangle containing the labeled object. They vary in the range [0.0 to 1.0].
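To make the notation concrete, a minimal Python sketch that parses one annotation file and converts its normalized fields back to pixel coordinates is shown below; the file name and image resolution are hypothetical examples.

```python
# Sketch: parse a YOLO-format annotation file and recover pixel boxes.
IMG_W, IMG_H = 1920, 1080                         # assumed frame resolution
CLASS_NAMES = {0: "car", 1: "motorcycle"}

def parse_yolo_line(line: str):
    """Convert '<object-class> <x> <y> <width> <height>' to a pixel box."""
    cls, x, y, w, h = line.split()
    cx, cy = float(x) * IMG_W, float(y) * IMG_H   # box center in pixels
    bw, bh = float(w) * IMG_W, float(h) * IMG_H   # box size in pixels
    left, top = cx - bw / 2, cy - bh / 2
    return CLASS_NAMES[int(cls)], (round(left), round(top), round(bw), round(bh))

with open("0.txt") as f:                          # annotation for image 0.png
    for line in f:
        label, (x, y, w, h) = parse_yolo_line(line)
        print(f"{label}: x={x}, y={y}, w={w}, h={h}")
```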
Figures 1 and 2 show the labeling of vehicles. The first one is a split roundabout with a complex situation. The second one was captured at a roundabout with a multitude of vehicles parked on the margins and some circulating inside.

Figure 1. Three vehicles at a split roundabout.

Figure 2. Roundabout with cars and motorcycles circulating around it, as well as a multitude of vehicles parked on the margins.
3. Methods

For the construction of the dataset, it was necessary to record videos to obtain our own image bank. According to the research in [1,20], which studied the requirements for recording a dataset of trajectories, we can extrapolate these requirements to the concrete objective of recognizing objects in images taken by drones.

The dataset must contain many images, as well as many labeled objects within these images. On the other hand, the images for model training must have been taken in different locations, with different visibility conditions, to help the model avoid overfitting. Overfitting is one of the biggest obstacles in artificial intelligence; it occurs when the model learns in such a way that it can only be applied to the training dataset and, therefore, is not generalizable to other data [21]. Finally, all types of objects must be recognized. When labeling images, objects that are related to the objects we want to predict should not be excluded; for example, we cannot exclude trucks from the dataset if we are labeling all cars. All objects can be included in a category such as “vehicles”, or a category can be created for each of these objects.

With these requirements in mind, we proceed to describe each of the tasks performed for the construction of the dataset.

3.1. Obtaining the Dataset


The images of the dataset were taken by the authors using two different aircraft, specifically the DJI Mavic Mini 2 and the Yuneec Typhoon H (Intel RealSense).
Details of the cameras used can be found in Table 2. The current regulations governing the
civilian use of remotely piloted aircraft in Spain [22] were complied with. To capture the
images, a series of video capture missions were planned, defining the locations, recording
angles and orientations, and schedules, among other aspects. A wide range of data were
obtained in terms of quantity and diversity of locations. Table 1 shows a breakdown of the
photographs that were tagged in the dataset.

Table 2. Detail of the cameras and configuration used for image acquisition.

                            Yuneec Typhoon H (CGO3)    DJI Mavic Mini 2

Resolution                  1920 × 1080 px             1920 × 1080 px
FOV angle                   98°                        83°
Focal length (35 mm equiv.) 14 mm                      24 mm
Aperture                    f/2.8                      f/2.8
Aspect ratio                16:9                       16:9
Sensor                      1/2.3" CMOS                1/2.3" CMOS

3.1.1. Selection of Locations


Locations were defined considering their interest as well as the variety of scenarios,
with complex junctions and intersections, such as roundabouts or split roundabouts, being
of particular interest. In the same way, crossings and junctions with fast roads are also
relevant due to their high accident rate. These scenarios have in common the possibility
of recording vehicles from a wide range of angles, which allows the learning algorithms
to be fed with greater variety. In addition, some straight road sections have also been
incorporated where vehicles can be seen at a wide range of distances.

3.1.2. Angle and Orientation of the Recordings


The aircraft used allow the angle of their cameras to be adjusted, which was set within the range 45–60° with respect to the horizontal axis of flight. This range allows one to capture the sides and the upper part of the targets, as opposed to a completely zenithal capture at 90°, where only the upper part would be taken. The orientation chosen for the different captures was the one that provided the best framing of the scene compared to other criteria, such as maintaining the north orientation of the upper part of the images.

3.1.3. Height of the Recordings


The height of flight varied within the range 35–120 m, which corresponds to the minimum
height recommended for safety and the maximum by regulation. This variation in heights
allowed us to have a diversity of vehicle sizes and to provide variety to the dataset.
3.1.4. Relationship between Resolution and Recording Height

Camera resolution and flight altitude are related. The flight height was set according to the size of the objects to be detected, which is imposed by the average object size to be used for the subsequent object recognition algorithms.

The width of the scene in the images is 1920 pixels. If an element of known size is identified, the dimensions of the captured scene can be calculated. Figure 3 shows that a BMW X5 (4.86 m long and 1.78 m high) occupies 120 × 44 pixels in the image. If this image had been captured with the camera positioned at 90° to the horizontal, the flight height would be 77.76 m. However, as the scene is shot at an angle of 45° to the horizontal, the flight height is 55 m, as can be seen in Figure 4.

Figure 3. Frame resolution and vehicle size in pixels.

Figure 4. Camera angle and relation between distance and altitude.
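The arithmetic behind these figures can be reproduced in a few lines of Python. This is only a sketch of the worked example above, assuming a camera geometry in which the scene width in meters equals the camera-to-scene distance at nadir.

```python
from math import radians, sin

# Worked example from Section 3.1.4 (values taken from Figure 3).
scene_width_px = 1920                      # horizontal resolution of the frames
car_length_m, car_length_px = 4.86, 120    # BMW X5 length in meters and pixels

gsd = car_length_m / car_length_px         # ground sample distance: m per pixel
scene_width_m = scene_width_px * gsd       # real width of the captured scene
print(f"Scene width: {scene_width_m:.2f} m")        # 77.76 m

# At 90° (nadir) the camera-to-scene distance equals the flight height;
# at a 45° tilt the same distance corresponds to a lower flight height.
distance = scene_width_m                   # 77.76 m, per the paper's geometry
height_45 = distance * sin(radians(45))
print(f"Flight height at 45°: {height_45:.0f} m")   # ~55 m
```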
3.1.5. Planning and Execution of the Missions

All recordings were carried out during the day and in good visibility conditions to comply with regulations. The recordings in interurban environments were carried out during the weekend, taking advantage of the fact that this is when most motorbikes are used for recreational purposes. The urban images were taken during the working week. The flights were carried out manually, with the pilot operating the aircraft controls rather than relying on automatic flight plans.
3.2. Techniques for Anonymizing the Dataset

To ensure compliance with European data protection regulations [23], published images must not contain information that could identify individuals or lead to their identification through additional information. These images must be anonymized and, extending the term, pseudonymized. In our images, only the license plates of cars and people are sensitive information to be anonymized. In our dataset, there are no faces of people, so it was not necessary to apply tools in this sense. On the other hand, license plates are likely to be identifiable, so license plate deletion software filters have been developed.

For anonymization, we first applied algorithms from the OpenCV library [24,25], but these algorithms focus on the process of “reading” the license plate once the image has been segmented and the region of interest (ROI) has been located. In our case, this is the most complex part: since the portion of the license plate in the image is very small, the segmentation can generate too much noise likely to be considered a license plate. Figure 5 shows that, since the license plates are small in proportion to the image, even after discarding objects by size and proportion, there are many candidates to be checked by the OCR.

Figure 5. Binarized and thresholded image prepared for license plate segmentation.

After realizing that this was not the best option, we changed our approach to neural networks (deep learning networks), which are better adapted to locating number plates in an image (segmentation process). We used the public dataset “car plate detection” [26], available on Kaggle, together with a selection of our own images from the dataset described in this paper, to train a YOLO V4 neural network with a set of 500 images of European format license plates [27]. Once the image is segmented and the license plate is located, it is erased by a defocusing process. Figure 6 shows the result on an image of the dataset in which two license plates likely to be legible are located. In addition, there are three other vehicles in the image but, due to their distance from the camera, their license plates cannot be detected, nor are they legible in the image.

Figure 6. Localization and blurring of two license plates in a dataset image using neural networks.
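The localization-plus-defocusing step can be sketched as follows. The trained YOLO V4 weights are not distributed with the dataset, so this example substitutes the license plate Haar cascade bundled with OpenCV as a stand-in detector; the file names are hypothetical.

```python
import cv2

# Stand-in detector: OpenCV ships a Haar cascade for license plates. The paper
# used a YOLO V4 network trained on European plates instead; the cascade is
# only used here so the sketch runs without the trained weights.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_russian_plate_number.xml")

def anonymize(path_in: str, path_out: str) -> None:
    """Locate plate-like regions and erase them with a strong Gaussian blur."""
    img = cv2.imread(path_in)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 4):
        img[y:y + h, x:x + w] = cv2.GaussianBlur(img[y:y + h, x:x + w], (51, 51), 0)
    cv2.imwrite(path_out, img)

anonymize("0.png", "0_anonymized.png")  # hypothetical file names
```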
3.3. Dataset Processing and Labeling

Once the images have been obtained, they must be processed and labeled to build models based on neural networks. The first step is to divide the videos obtained by the UAVs into images. For this, a Python script was developed, which, using the OpenCV computer vision library, allows us to choose a video and divide it into the number of images that compose it. The number of images that make up each video depends on the FPS (Frames Per Second) at which the video was recorded. The videos for this work were captured at 30 FPS, so each second of video is composed of 30 images. We eliminated the odd frames to reduce the size of the dataset, as the changes between consecutive frames are not significant; a minimal sketch of this splitting step is shown after Figure 7. There are different ways to label and detect objects in images [28,29]. For the development of the work, two methods were tested.

The first option was to detect objects with segmentation but, after testing, there was a problem: since the objects were small and had a lot of background (everything that is not an object), the model reached a 99% detection accuracy simply by assigning the entire image to the background, as can be seen in Figure 7. We decided to use bounding boxes, rectangles that mark the boundaries of the object, since the metrics used in this type of annotation are not influenced by the background, but by the overlap of the real rectangle with the inferred rectangle [29,30].

Figure 7. Model overestimation (99% accuracy) by detecting everything as background.
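The frame-extraction step mentioned above might look as follows; the video file name is hypothetical, and only the even frames are written out, as described in the text.

```python
import cv2

def video_to_frames(video_path: str, out_prefix: str, start_index: int = 0) -> int:
    """Split a 30 FPS video into png frames, keeping every second frame."""
    cap = cv2.VideoCapture(video_path)
    index, frame_no = start_index, 0
    while True:
        ok, frame = cap.read()
        if not ok:                                  # end of video
            break
        if frame_no % 2 == 0:                       # drop the odd frames
            cv2.imwrite(f"{out_prefix}{index}.png", frame)
            index += 1
        frame_no += 1
    cap.release()
    return index                                    # next free image number

next_index = video_to_frames("mission01.mp4", "")   # hypothetical file name
```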
The CVAT tool was used to tag the images. This tool allows us to label images in a simple and agile way, owing to its “follow object” functionality, which allows us to label two images separated in time and calculates, in an approximate way, the position of the object between these two images; to label accurately, an amplitude between images greater than 20 should not be exceeded, and it must be manually checked that the tracking has been performed correctly [31].

CVAT has a web application, but for greater security and to be able to modify default options such as the download size limit of the dataset, we chose to deploy the application in containers owing to Docker (an open-source tool that allows virtualization in containers as if each were a virtual machine, but lightweight, thus enabling the automation of the deployment of these applications) [32].

Once the images are labeled and the dataset validated, they are exported to YOLO format, which is one of many export formats CVAT offers, and made available to the community to be used for model training.

3.4. Dataset Validation

The dataset was used to create a basic model to analyze its value. The selected model was yolov5m, a 365-layer PyTorch neural network. The mean average precision (mAP) with a 0.5 intersection over union (IoU) was established as the parameter to be optimized. Most of the images without objects (90%) were removed for this training to ensure the integrity of the results [33]. Tables 3 and 4 show the hardware used, the training parameters, and the result obtained, respectively. Figure 8 graphically illustrates the training results for the parameters mAP, loss (training set), and val_loss (test set).

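For reference, the IoU criterion underlying the mAP figure reported below can be written in a few lines; the two example boxes are arbitrary.

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# A detection counts as correct at IoU >= 0.5 against its ground-truth box.
print(iou((100, 100, 120, 44), (110, 105, 120, 44)))   # ~0.68
```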
Table 3. Hardware used for training.

Processor           Intel Core i5-6500TE 2.4 GHz
Operating system    Ubuntu 20.04.3 LTS (Focal Fossa)
Motherboard         Intel RUBY-D718VG2AR
RAM                 64 GB
Graphics card       Nvidia RTX 2060
Hard disk           512 GB SSD

Table 4. Training parameters and result.

Intersection over Union    0.5
Learning rate              0.01
Train/validation split     90–10%
Steps                      3659
Batch size                 4
Total epochs               55
Total time training        21.8 h
mAP                        0.97946
Class car [mAP]            0.994
Class motorcycle [mAP]     0.962
Precision                  0.98456
Recall                     0.95508
Figure 8. Results of training a basic model: loss (training set), val/loss (validation set), and [email protected] over the 55 training epochs.

4. User Notes

The approach used to build the dataset shown in this work seeks to provide resources to train neural networks for artificial intelligence systems to be used in the identification of vehicles in traffic management.

Author Contributions: Conceptualization, J.S.-S., S.G. and S.B.R.; methodology, J.S.-S. and S.G.; software, S.G.; validation, J.S.-S., S.G. and S.B.R.; formal analysis, J.S.-S. and S.G.; investigation, J.S.-S. and S.G.; resources, J.S.-S., S.G. and S.B.R.; data curation, S.G.; writing—original draft preparation, J.S.-S. and S.G.; writing—review and editing, J.S.-S., S.G., S.B.R. and J.F.-A.; visualization, S.G. and S.B.R.; supervision, J.S.-S.; project administration, J.S.-S.; funding acquisition, J.F.-A. All authors have read and agreed to the published version of the manuscript.

Funding: This publication is part of the I+D+i projects with reference PID2019-104793RB-C32, PIDC2021-121517-C33, funded by MCIN/AEI/10.13039/501100011033/, S2018/EMT-4362/”SEGVAUTO4.0-CM” funded by the Regional Government of Madrid, and “ESF and ERDF A way of making Europe”.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are openly available in https://zenodo.
org/record/5776219 (accessed on 24 March 2022) with doi: https://doi.org/10.5281/zenodo.5776218.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Milić, A.; Randjelovic, A.; Radovanović, M. Use of drones in operations in the urban environment. In Proceedings of the 5th
International Conference on Information Systems for Crisis Response and Management, Washington, DC, USA, 4–7 May 2008.
2. Merkert, R.; Bushell, J. Managing the drone revolution: A systematic literature review into the current use of airborne drones and
future strategic directions for their effective control. J. Air Transp. Manag. 2020, 89, 101929. [CrossRef] [PubMed]
3. Hodgkinson, D.; Johnston, R. Aviation Law and Drones: Unmanned Aircraft and the Future of Aviation; Routledge: London, UK, 2018.
4. Ministerio de Fomento. Plan Estratégico para el Desarrollo del Sector Civil de Los Drones en España 2018–2021|Ministerio de
Transportes, Movilidad y Agenda Urbana. Available online: https://www.mitma.gob.es/el-ministerio/planes-estrategicos/
drones-espania-2018-2021 (accessed on 1 December 2021).
5. Cuenca, L.G.; Sanchez-Soriano, J.; Puertas, E.; Andrés, J.F.; Aliane, N. Machine Learning Techniques for Undertaking Roundabouts
in Autonomous Driving. Sensors 2019, 19, 2386. [CrossRef] [PubMed]
6. Cuenca, L.G.; Puertas, E.; Andrés, J.F.; Aliane, N. Autonomous Driving in Roundabout Maneuvers Using Reinforcement Learning
with Q-Learning. Electronics 2019, 8, 1536. [CrossRef]
7. Pettersson, I.; Karlsson, M. Setting the stage for autonomous cars: A pilot study of future autonomous driving experiences. IET
Intell. Transp. Syst. 2015, 9, 694–701. [CrossRef]
8. Bemposta Rosende, S.; Sánchez-Soriano, J.; Gómez Muñoz, C.Q.; Fernández Andrés, J. Remote Management Architecture of UAV
Fleets for Maintenance, Surveillance, and Security Tasks in Solar Power Plants. Energies 2020, 13, 5712. [CrossRef]
9. Yildiz, M.; Bilgiç, B.; Kale, U.; Rohács, D. Experimental Investigation of Communication Performance of Drones Used for
Autonomous Car Track Tests. Sustainability 2021, 13, 5602. [CrossRef]
10. Zhou, W.; Liu, J.; Lei, J.; Yu, L.; Hwang, J.N. GMNet: Graded-Feature Multilabel-Learning Network for RGB-Thermal Urban
Scene Semantic Segmentation. IEEE Trans. Image Process. 2021, 30, 7790–7802. [CrossRef] [PubMed]
11. Zhou, W.; Guo, Q.; Lei, J.; Yu, L.; Hwang, J.N. IRFR-Net: Interactive Recursive Feature-Reshaping Network for Detecting Salient
Objects in RGB-D Images. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–13. [CrossRef] [PubMed]
12. Tobías, L.; Ducournau, A.; Rousseau, F.; Mercier, G.; Fablet, R. Convolutional Neural Networks for object recognition on mobile
devices: A case study. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico,
4–8 December 2016; pp. 3530–3535. [CrossRef]
13. Vivek, R.; Vighnesh, B.; Sachin, J.; Pkulzc; Khanh. TensorFlow 2 Detection Model Zoo. Tensorflow. 2021. Available on-
line: https://github.com/tensorflow/models/blob/5ad16f952885c86ca0aa31a8eb3737ab7bb23ee1/research/object_detection/
g3doc/tf2_detection_zoo.md (accessed on 19 June 2021).
14. Zhou, W.; Wu, J.; Lei, J.; Hwang, J.-N.; Yu, L. Salient Object Detection in Stereoscopic 3D Images Using a Deep Convolutional
Residual Autoencoder. IEEE Trans. Multimed. 2021, 23, 3388–3399. [CrossRef]
15. Cao, G.; Xie, X.; Yang, W.; Liao, Q.; Shi, G.; Wu, J. Feature-fused SSD: Fast detection for small objects. In Proceedings of the Ninth
International Conference on Graphic and Image Processing (ICGIP 2017), Qingdao, China, 14–16 October 2017. [CrossRef]
16. Tackling the Small Object Problem in Object Detection. Roboflow Blog. 2020. Available online: https://blog.roboflow.com/
detect-small-objects/ (accessed on 17 June 2021).
17. Unel, F.O.; Ozkalayci, B.O.; Cigla, C. The Power of Tiling for Small Object Detection. In Proceedings of the 2019 IEEE/CVF
Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp.
582–591. [CrossRef]
18. Zhu, P.; Wen, L.; Du, D.; Bian, X.; Fan, H.; Hu, Q.; Ling, H. Detection and Tracking Meet Drones Challenge. IEEE Trans. Pattern
Anal. Mach. Intell. 2021, 1. [CrossRef] [PubMed]
19. Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings
of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016;
pp. 779–788.
20. Krajewski, R.; Bock, J.; Kloeker, L.; Eckstein, L. The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German
Highways for Validation of Highly Automated Driving Systems. In Proceedings of the 2018 21st International Conference on
Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2118–2125. [CrossRef]
21. Mutasa, S.; Sun, S.; Ha, R. Understanding artificial intelligence based radiology studies: What is overfitting? Clin. Imaging 2020,
65, 96–99. [CrossRef] [PubMed]
Data 2022, 7, 53 10 of 10

22. Ministerio de la Presidencia y para las Administraciones Territoriales. Boletín Oficial del Estado 29 December 2017. Available
online: https://www.boe.es/boe/dias/2017/12/29/pdfs/BOE-A-2017-15721.pdf (accessed on 1 December 2021).
23. Reglamento (UE) 2016/679 del Parlamento Europeo y del Consejo de 27 de Abril de 2016. Available online: https://www.boe.es/
doue/2016/119/L00001-00088.pdf (accessed on 1 December 2021).
24. Automatic License Plate Recognition using Python and OpenCV. Available online: https://sajjad.in/content/ALPR_paper.pdf
(accessed on 22 November 2021).
25. Zhang, C.; Tai, Y.; Li, Q.; Jiang, T.; Mao, W.; Dong, H. License Plate Recognition System Based on OpenCV. In 3D Imaging
Technologies—Multi-Dimensional Signal Processing and Deep Learning; Smart Innovation, Systems and Technologies; Jain, L.C.,
Kountchev, R., Shi, J., Eds.; Springer: Singapore, 2021; Volume 234. [CrossRef]
26. Car License Plate Detection. Available online: https://www.kaggle.com/andrewmvd/car-plate-detection (accessed on 22
November 2021).
27. Vehicle Registration Plates of Europe. Available online: https://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Europe
(accessed on 22 November 2021).
28. Real, E.; Shlens, J.; Mazzocchi, S.; Pan, X.; Vanhoucke, V. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated
Data Set for Object Detection in Video. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7464–7473. [CrossRef]
29. Hoeser, T.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part
I: Evolution and Recent Trends. Remote Sens. 2020, 12, 1667. [CrossRef]
30. Padilla, R.; Netto, S.L.; Da Silva, E.A. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the
2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; p. 6.
31. Track Mode (Basics), CVAT. Available online: https://openvinotoolkit.github.io/docs/manual/basics/track-mode-basics/
(accessed on 29 June 2021).
32. Anderson, C. Docker [Software Engineering]. IEEE Softw. 2015, 32, 102-c3. [CrossRef]
33. Lee, Y.H.; Kim, B.; Kim, H.J. Efficient object identification and localization for image retrieval using query-by-region. Comput.
Math. Appl. 2012, 63, 511–517. [CrossRef]
