5-Jul-11093 Paper

The document discusses the use of Convolutional Neural Networks (CNNs) for classifying images captured by UAVs in urban areas, utilizing the UC MERCED dataset. It details the methodology, including the training of deep learning models using architectures like VGG16, Shuffle Net, and Squeeze Net, achieving varying accuracies, with VGG16 performing the best at 96%. The study highlights the challenges of object detection in urban environments and the potential of UAV technology in enhancing image classification accuracy.

Uploaded by

padma ratna

Journal of Xidian University https://doi.org/10.5281/Zenodo.12634856 ISSN No:1001-2400

UAV Urban Area Vehicle Classification Techniques

G. Padma Ratna (Asst. Prof.), M. Uma, K. Pavan Kumar, Sk. Babavali
Electronics and Communication Engineering - ANUCET

VOLUME 18, ISSUE 7, 2024 http://xadzkjdx.cn/

Abstract: This project uses Convolutional Neural Networks (CNNs) to classify images captured by UAVs (Unmanned Aerial Vehicles) in urban areas. We used the UC MERCED dataset, a popular dataset for aerial image classification and detection, together with the Deep Network Designer available in MATLAB. We trained deep learning architectures based on convolutional neural networks to classify different classes of urban area images (e.g. buildings, vehicles, etc.) using the VGG16-net architecture. Its creators pushed the depth to 16 to 19 weight layers, giving approximately 138 million trainable parameters. The convolutional part is built from blocks, each of which consists of a pair of CNN layers of different sizes, Max Pool layers, Batch Normalization, and Dropout layers with different dropout conditions. The network is followed by fully connected layers to make the learning process easier and faster. We make use of the "SGDM" optimizer and the ReLU activation function on the Conv2D layers, and of the Classification Layer after the last (dense) fully connected layer, for better representation of data and class prediction.

Keywords: CNN (Convolutional Neural Network), VGG-16Net, Shuffle Net, Squeeze Net, ADAM Optimizer, and MATLAB.

I. INTRODUCTION

UAV stands for Unmanned Aerial Vehicle, commonly known as a drone. It is an aircraft without a human pilot onboard, operated either autonomously by computers onboard the aircraft or remotely by a human operator. UAVs come in various shapes, sizes, and configurations, ranging from small consumer drones used for recreational purposes to large military-grade drones used for surveillance, reconnaissance, and combat operations. The use of UAVs has grown significantly in recent years due to advancements in technology, making them more accessible, affordable, and capable. However, their deployment raises various concerns related to safety, privacy, regulatory compliance, and ethical considerations. As such, the development and use of UAVs continue to be subject to stringent regulations and guidelines imposed by aviation authorities worldwide.

Object detection and classification in urban environments is a difficult task using conventional methods. One of the main reasons is the high visual variability of urban objects, which affects the accuracy and generalization of prediction models. Moreover, image acquisition factors such as noise, motion blur, occlusions, lighting variations, reflections, perspective, and geo-location errors can make the task more complex. To overcome these challenges, we can use the data sent by UAVs, which can capture high-resolution aerial images of ground objects. For instance, the data sent by the UAVs can provide more details than a traditional data set, which can enhance the object detection and classification performance. Processing this data requires advanced computer vision techniques such as deep learning algorithms, which can accurately identify each type of object in a single image. This can help in the precise object detection process in large data sets. One of the common approaches to classifying objects is deep learning, which involves training neural networks to recognize the objects in the images. Convolutional neural networks (CNNs) have achieved promising results in classifying the data acquired by UAVs, but they still face some difficulties, such as classifying different images with similar properties. In addition, these algorithms need large amounts of data for training.

Earlier attempts at solving the classification problem involved defining and extracting certain features from image data sets, which represented most of the data with high confidence. These features aimed to capture interesting information in images such as edges, circles, lines, or a combination of these, which were ideally invariant to translation, scale, and varying light intensities. Examples of the categories to be recognized include buildings, vehicles, roads, agricultural, airplane, baseball diamonds, beaches, chaparral, forests, freeways, golf courses, harbor, river, runways, overpass, tennis court, storage tanks, and pedestrians. Once these features were extracted, classifiers such as Support Vector Machine, Naive Bayes, Decision Trees, K-Nearest Neighbors, or Linear Discriminant Analysis were used to determine the membership of an unseen image. However, these methods were time-consuming, and it was hard to define features that captured a wide range of information.

II. RELATED WORK

In the field of remote sensing and agricultural monitoring, several significant contributions have been made utilizing various advanced technologies, including Convolutional Neural Networks (CNNs), Unmanned Aerial Vehicles (UAVs), and edge computing. This section reviews pertinent literature that has informed the development and application of these technologies. Yu et al. (2020) utilized CNNs for urban land cover classification using multispectral and hyperspectral satellite imagery, highlighting the potential of deep learning in remote sensing. Kalman (1960) introduced the Kalman filter, providing a recursive solution to discrete-data linear filtering problems, which has since been adapted for various remote sensing applications. Islam et al. (2020) proposed a vision-based precision agriculture framework employing UAVs for crop monitoring and disease detection, leveraging high-resolution aerial imagery for improved crop management. Capolupo et al. (2022) presented a method for detecting Swiss parcel edges and buildings using very high-resolution UAV imagery, demonstrating the potential of UAVs in generating detailed land cover maps. Chen et al. (2022) surveyed deep learning techniques for small object detection in remote sensing images, offering insights into methodologies and challenges critical for environmental monitoring and disaster management. Shi et al. (2023) discussed the challenges and opportunities associated with the integration of edge computing in smart agriculture. Li et al. (2023) provided a comprehensive survey on learning-based scene understanding from remote sensing images. Zhang et al. (2021) utilized CNNs for automated detection and recognition of irrigation pivots from satellite imagery. Finally, Zhao et al. (2021) surveyed drone-based object detection and tracking methods, discussing their advantages and drawbacks. This survey highlights the rapid advancements in UAV technology and its applications in various fields, including surveillance, environmental monitoring, and agriculture.

III. PROPOSED METHODOLOGY

1. The process involves several key steps, from data acquisition to the application of CNN (Convolutional Neural Network) techniques for object classification.
2. We trained a deep learning model on the UC MERCED LAND USE dataset using VGG-16, Shuffle Net, and Squeeze Net. The code was predominantly written in MATLAB and utilized popular MATLAB deep learning tools such as the Deep Network Designer.
3. We used the Adam optimizer and tuned several hyperparameters, including a learning rate of 6001, a regularization term of 1×10^-5, and a batch size of 32, and trained the model for 30 epochs.
4. To improve the robustness of our model, we initially employed data augmentation techniques such as random horizontal flips and random affine transformations from the Deep Network Designer library.
5. However, after testing, we found that these techniques did not significantly improve the model's accuracy. Therefore, we decided to keep the code without these techniques.
6. Division of the UC MERCED data into two subsets for model validation: 90% for training and 10% for testing.
7. Implementation of various classification algorithms, including VGG-16Net, Shuffle Net, and Squeeze Net, to accurately identify low objects based on drone images.
8. To prevent overfitting during training, we implemented the Early Stopping method, which stops the training process when the validation loss stops improving.
9. However, we also found that this technique did not provide a significant improvement in accuracy for our model, so we decided to remove it from the final version of our code. Overall, our deep learning model achieved a test accuracy of 87%, demonstrating its effectiveness in classifying the ten different categories in the UC MERCED LAND USE dataset.

IV. STEPS USED FOR TRAINING

A. Import the dataset
The UC MERCED LAND USE dataset includes 21 land-use classes of aerial scenes, such as agricultural, airplane, baseball diamond, beach, buildings, chaparral, forest, freeway, golf course, harbor, overpass, river, runway, storage tanks, and tennis court.

Fig:-1 The data set preprocessing.

B. Split the data into training and testing data
The data is split into training and testing data: 90% of the whole dataset is used for training, and the remaining 10% is used for testing.

C. Training the models
First, MATLAB, which contains the Deep Network Designer, is used for training. In the Deep Network Designer, the popular networks VGG-16, Shuffle Net, and Squeeze Net are trained on the data set. In the first training run, the training and testing accuracies were nearly 60% and 50%, respectively. In the second run they rose to nearly 70% and 65%, respectively, which is more accurate than the first run. In the third run the training accuracy was about 95% and the testing accuracy about 87%, which is much higher than the first and second results.
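The 90/10 split of step B and the Early Stopping rule from the methodology can be illustrated with a short sketch. The paper's experiments were run in MATLAB's Deep Network Designer; the Python below is only an illustration under stated assumptions, not the authors' code. The function names `stratified_split` and `early_stopping_epoch`, the patience of 3 epochs, the placeholder label list (ten classes of 100 images each), and the validation losses are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.9, seed=0):
    """Split sample indices so each class keeps train_fraction of its images for training."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return train_idx, test_idx


def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Placeholder labels: 10 classes with 100 images each (an assumption for illustration).
labels = [c for c in range(10) for _ in range(100)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 900 100

# Made-up validation losses: the best value is reached at epoch 3.
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]))  # 6
```

Splitting per class keeps the 90/10 ratio inside every category, which matters when the test set is small; a plain random split over all images would only match the ratio on average.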


V. CONVOLUTIONAL NEURAL NETWORKS

D. VGG-16 NET

A convolutional neural network, also known as a ConvNet, is a kind of artificial neural network. A convolutional neural network has an input layer, an output layer, and various hidden layers. VGG16 is a type of CNN (Convolutional Neural Network) that is widely regarded as one of the best computer vision models to date. The creators of this model evaluated the networks and increased the depth using an architecture with very small (3 × 3) convolution filters, which showed a significant improvement on the prior-art configurations. They pushed the depth to 16 to 19 weight layers, giving approximately 138 million trainable parameters. VGG16 is an object detection and classification algorithm which can classify images into 1000 different categories with 92.7% top-5 accuracy on ImageNet. It is one of the popular algorithms for image classification and is easy to use with transfer learning. The 16 in VGG16 refers to the 16 layers that have weights. In VGG16 there are thirteen convolutional layers, five Max Pooling layers, and three Dense layers, which sum up to 21 layers, but only sixteen of them are weight layers, i.e., layers with learnable parameters. VGG16 takes an input tensor of size 224 × 224 with 3 RGB channels. The most unique thing about VGG16 is that instead of having many hyper-parameters, its creators focused on convolution layers with 3x3 filters and stride 1, always used the same padding, and used Max Pool layers with 2x2 filters and stride 2.

Fig:-2 VGG-16 Architecture

E. Shuffle Net

Shuffle Net is a convolutional neural network (CNN) architecture designed to achieve high accuracy while minimizing computational cost. Shuffle Net was developed to address the increasing demand for deploying CNNs on resource-constrained devices such as mobile phones, IoT devices, and embedded systems. The primary motivation behind Shuffle Net is to reduce computational complexity, memory usage, and model size while maintaining high accuracy in various computer vision tasks. Traditional convolutions involve a significant number of parameters and computations, especially as the number of channels increases. Pointwise group convolution reduces this cost by dividing the input channels into groups and applying convolutions within each group separately.

Fig:-3 Architecture of Shuffle Net

F. Squeeze Net

Squeeze Net is a lightweight convolutional neural network (CNN) architecture designed to achieve high accuracy while minimizing model size and computational cost in tasks such as image classification. Squeeze Net employs a novel architecture that significantly reduces the number of parameters compared to traditional CNN architectures like Alex Net while maintaining competitive accuracy. It achieves this reduction by replacing traditional 3x3 filters with 1x1 filters (squeeze layers) followed by a mix of 1x1 and 3x3 filters (expand layers), reducing the number of parameters without sacrificing expressive power. Squeeze Net organizes these into fire modules, which combine squeeze and expand layers, allowing efficient use of computation and memory.

Fig:-4 Squeeze Net

G. ADAM Optimizer

Adaptive Moment Estimation (Adam) is an optimization algorithm for gradient descent. The method is really efficient when working with large problems involving a lot of data or parameters. It requires less memory and is efficient. Intuitively, it is a combination of the 'gradient descent with momentum' algorithm and the 'RMSP' algorithm. The momentum algorithm accelerates gradient descent by taking into consideration the 'exponentially weighted average' of the gradients. Using averages makes the algorithm converge towards the minima at a faster pace. The Adam optimizer inherits the strengths of the above two methods and builds upon them to give a more optimized gradient descent.

Fig:-5 ADAM Optimizer

VI. RESULTS ANALYSIS

Among VGG-16, Shuffle Net, and Squeeze Net, the algorithm with the highest accuracy is VGG-16 Net, with 96% accuracy.

H. Shuffle Net

The Shuffle Net algorithm got an accuracy of 90%. This algorithm is best used when we want to classify the images in a dataset.

Traditional Pointwise Convolution:
$\text{Cost} = H \times W \times C \times K$

Grouped Convolution:
$\text{Cost} = \frac{H \times W \times C \times K}{G}$

Here G is the number of groups into which the input channels are divided.

Fig:-5 Result of Shuffle Net

I. VGG-16 Net

The VGG-16 algorithm got an accuracy of 96%. Among all the algorithms used, VGG-16 achieved the highest accuracy on the training and testing data.

Fig:-6 Result of VGG-16Net

J. Squeeze Net

The Squeeze Net algorithm got an accuracy of 87.2% for urban area vehicle classification.

Accuracy is a primary indicator, reflecting the overall rate at which the model correctly predicts the outcome.

Fig:-7 Result of Squeeze Net
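To make the accuracy indicator concrete, the following minimal Python sketch computes overall accuracy and per-class recall from a confusion matrix. It is an illustration only: the function names `accuracy_from_confusion` and `per_class_recall` are hypothetical, and the 3-class counts are made up, not results from this study.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified samples divided by all samples.

    matrix[i][j] counts samples of true class i that were predicted as class j,
    so the correct predictions lie on the diagonal.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        recalls.append(row[i] / row_total if row_total else 0.0)
    return recalls


# Made-up counts for three classes (rows: true class, columns: predicted class).
example = [
    [9, 1, 0],
    [2, 7, 1],
    [0, 1, 9],
]
print(accuracy_from_confusion(example))  # 25 correct out of 30 samples
print(per_class_recall(example))
```

Per-class recall is shown next to accuracy because a single overall figure can hide a class that the model frequently confuses with another.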


Metrics                     VGG-16    Shuffle Net    Squeeze Net
Training Accuracy (%)       89.42     90.2           91.16
Testing Accuracy (%)        87.19     89.4           87.3
Execution Time (seconds)    0.015     0.14           0.10

K. Confusion Matrix

VII. Conclusion

In recent years, semantic segmentation of aerial data has been a prominent area of research in photogrammetry, remote sensing, and computer vision. Urban applications such as airborne mapping, object positioning, and building extraction from high-resolution aerial images demand accurate and efficient segmentation algorithms. Deep learning models have shown great potential in handling complex scenes, and this study focuses on evaluating the semantic segmentation accuracy of UAV-based images in urban areas. The proposed method employs a deep learning framework based on VGG16 Net. This architecture extracts and classifies features through layers of convolution, max pooling, activation, and concatenation in an end-to-end process.

VIII. References

[1] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," IEEE Transactions on Automatic Control, vol. AC-8, no. 2, pp. 143-148, 1960.
[2] X. Yu, Y. Li, and J. Yang, "Urban Land Cover Classification with Multispectral and Hyperspectral Satellite Imagery Using Convolutional Neural Networks," Remote Sensing, vol. 12, no. 7, article 1081, 2020.
[3] M. Z. Islam, S. K. R. Khan, and J. Wang, "A Vision-Based Precision Agriculture Framework for Crop Monitoring and Disease Detection Using Unmanned Aerial Vehicles," Applied Sciences, vol. 13, no. 20, article 11320, 2020.
[4] G. Capolupo, L. De Maio, and G. Ruotolo, "Detection of Swiss Parcel Edges and Buildings in Very High-Resolution UAV Imagery," ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, vol. X-4/W1, pp. 451-458, 2022.
[5] Z. Chen, Y. Wu, and C. Zhai, "A Survey of Deep Learning Techniques for Small Object Detection in Remote Sensing Images," Advances in Space Research, vol. 71, no. 4, pp. 1213-1229, 2022.
[6] S. Shi, L. Liu, and L. Cui, "Edge Computing for Smart Agriculture: Challenges and Opportunities," Drones, vol. 7, no. 3, article 190, 2023.
[7] W. Li, H. Zhao, and X. Yu, "Learning-Based Scene Understanding from Remote Sensing Images: A Survey," Journal of Parallel and Distributed Computing, vol. 159, pp. 123-140, 2023.
[8] T. Zhang, L. Zhang, and L. Zhang, "Automated Detection and Recognition of Irrigation Pivots from Satellite Imagery Using Convolutional Neural Networks," International Journal of Remote Sensing, vol. 43, no. 12, pp. 4943-4962, 2021.
[9] J. Zhao, Y. Zhang, and Z. Wang, "A Survey on Drone-Based Object Detection and Tracking Methods," Applied Sciences, vol. 11, no. 20, article 11320,
