5-Jul-11093 Paper
Abstract: The project uses Convolutional Neural Networks (CNNs) to classify images captured by UAVs (unmanned aerial vehicles) in urban areas. We use the UC Merced Land Use dataset, a popular benchmark for aerial image classification and detection, together with the Deep Network Designer tool in MATLAB. We trained deep learning architectures based on convolutional neural networks to classify different classes of urban-area images (e.g., buildings, vehicles) using the VGG-16 architecture. The VGG family pushes network depth to 13-19 weight layers; VGG-16 has 16 weight layers and approximately 138 million trainable parameters. The network consists of stacked pairs of convolutional layers of different sizes, max-pooling layers, batch normalization, and dropout layers with different dropout rates. It is followed by fully connected layers to make the learning process easier and faster. We use the SGDM optimizer, the ReLU activation function on the Conv2D layers, and a classification layer after the last (dense) fully connected layer for better representation of the data and class prediction.

Keywords: CNN (Convolutional Neural Network), VGG-16 Net, ShuffleNet, SqueezeNet, Adam optimizer, MATLAB.

I. INTRODUCTION

UAV stands for Unmanned Aerial Vehicle, commonly known as a drone. It is an aircraft without a human pilot on board, operated either autonomously by onboard computers or remotely by a human operator. UAVs come in various shapes, sizes, and configurations, ranging from small consumer drones used for recreation to large military-grade drones used for surveillance, reconnaissance, and combat operations. The use of UAVs has grown significantly in recent years due to advancements in technology that have made them more accessible, affordable, and capable. However, their deployment raises concerns related to safety, privacy, regulatory compliance, and ethics. As such, the development and use of UAVs remain subject to stringent regulations and guidelines imposed by aviation authorities worldwide.

Object detection and classification in urban environments is a difficult task for conventional methods. One of the main reasons is the high visual variability of urban objects, which affects the accuracy and generalization of prediction models. Moreover, image acquisition factors such as noise, motion blur, occlusions, lighting variations, reflections, perspective, and geo-location errors make the task more complex. To overcome these challenges, we can use the data sent by UAVs, which can capture high-resolution aerial images of ground objects. For instance, the data sent by UAVs can provide more detail than traditional datasets, which can enhance object detection and classification performance. Processing this data requires advanced computer vision techniques such as deep learning algorithms, which can accurately identify each type of object in a single image. This can help in the precise detection of objects in large datasets. One of the common approaches to classifying objects is deep learning, which involves training neural networks to recognize the objects in images. Convolutional neural networks (CNNs) have achieved promising results in classifying data acquired by UAVs, but they still face difficulties, such as distinguishing different images with similar properties, and they need large amounts of data for training. Earlier attempts at solving the classification problem involved defining and extracting handcrafted features from image datasets that represented most of the data with high confidence. These features aimed to capture interesting information in images, such as edges, circles, lines, or combinations of these, and were ideally invariant to translation, scale, and varying light intensities. The target categories in aerial imagery include buildings, vehicles, roads, agricultural land, airplanes, baseball diamonds, beaches, chaparral, forests, freeways, golf courses, harbors, rivers, runways, overpasses, tennis courts, storage tanks, and pedestrians. Once features were extracted, classifiers such as Support Vector Machines, Naive Bayes, decision trees, k-Nearest Neighbors, or Linear Discriminant Analysis were used to determine the class of an unseen image. However, these methods were time-consuming, and it was hard to define features that captured a wide range of information.

II. RELATED WORK

In the field of remote sensing and agricultural monitoring, several significant contributions have been made using advanced technologies, including Convolutional Neural Networks (CNNs), Unmanned Aerial Vehicles (UAVs), and edge computing. This section reviews pertinent literature that has informed the development and application of these technologies. Yu et al. (2020) utilized CNNs for urban land cover classification using multispectral and hyperspectral satellite imagery, highlighting the potential of deep learning in remote sensing. Kalman (1960) introduced the Kalman filter, providing a recursive solution to discrete-data linear filtering problems, which has since been adapted for various remote sensing applications. Islam et al. (2020) proposed a vision-based precision agriculture framework employing UAVs for crop monitoring and disease detection, leveraging high-resolution aerial imagery for improved crop management. Capolupo et al. (2022) presented a method for detecting Swiss parcel edges and buildings using very high-resolution UAV imagery, demonstrating the potential of UAVs in generating detailed land cover maps. Chen et al. (2022) surveyed deep learning techniques for small object detection in remote sensing images, offering insights into methodologies and challenges critical for environmental monitoring and disaster management. Shi et al. (2023) discussed the challenges and opportunities associated with the integration of edge computing in smart agriculture. Li et al. (2023) provided a comprehensive survey on learning-based scene understanding from remote sensing images. Zhang et al. (2021) utilized CNNs for automated detection and recognition of irrigation pivots from satellite imagery. Finally, Zhao et al. (2021) surveyed drone-based object detection and tracking methods, discussing their advantages and drawbacks. This body of work highlights the rapid advancements in UAV technology and its applications in various fields, including surveillance, environmental monitoring, and agriculture.
III. PROPOSED METHODOLOGY

1. The process involves several key steps, from data acquisition to the application of CNN (Convolutional Neural Network) techniques for object classification.

2. We trained deep learning models on the UC Merced Land Use dataset using VGG-16, ShuffleNet, and SqueezeNet. The code was written predominantly in MATLAB, using tools such as Deep Network Designer.

3. We used the Adam optimizer and tuned several hyperparameters, including a learning rate of 1e-3, an L2 regularization factor of 1e-5, a batch size of 32, and 30 training epochs.

4. To improve the robustness of our model, we initially employed data augmentation techniques such as random horizontal flips and random affine transformations from the Deep Network Designer library.

5. However, after testing, we found that these techniques did not significantly improve the model's accuracy. Therefore, we removed them from the final version of the code.

6. Division of the UC Merced data into two subsets for model validation: 90% for training and 10% for testing.

7. Implementation of various classification architectures, including VGG-16, ShuffleNet, and SqueezeNet, to accurately identify objects in drone images.

8. To prevent overfitting during training, we implemented early stopping, which halts the training process when the validation loss stops improving.

9. However, we found that this technique did not provide a significant improvement in accuracy for our model, so we removed it from the final version of the code. Overall, our deep learning model achieved a test accuracy of 87%, demonstrating its effectiveness in classifying the different land-use categories in the UC Merced Land Use dataset.

IV. STEPS USED FOR TRAINING

A. Import the dataset

The UC Merced Land Use dataset contains 21 land-use classes (e.g., agricultural, airplane, baseball diamond, beach, buildings, forest, freeway, harbor, and runway), with 100 aerial images of 256x256 pixels per class.

Fig. 1: The dataset preprocessing.
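Assuming the standard UC Merced layout of one folder per class, the dataset-import step (Section IV, A) might look like the following MATLAB sketch; the folder path is illustrative.

```matlab
% Load the UC Merced images into an imageDatastore, inferring one label
% per image from its class folder name. The path is an assumption.
imds = imageDatastore('UCMerced_LandUse/Images', ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

countEachLabel(imds)   % one row per class with its image count
```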
B. Split the data into training and testing data

The data is split into training and testing sets: 90% of the dataset is used for training, and the remaining 10% is used for testing.

C. Training the models

MATLAB, which contains Deep Network Designer, is used to train on the data. In Deep Network Designer, the popular networks VGG-16, ShuffleNet, and SqueezeNet are trained on the dataset. When we trained on the data for the first time, the training and testing accuracies were low, roughly 60% and 50% respectively. When we trained on the same data again, they improved to roughly 70% and 65%, more accurate than the first run. On the third training run, the training accuracy reached about 95% and the testing accuracy about 87%, much higher than the first and second results.
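The split and training steps above (Sections IV, B and C) can be sketched in MATLAB as follows. The hyperparameters come from Section III; the adaptation of pretrained VGG-16, the variable names, and the exact layer indices are assumptions based on common transfer-learning practice, not the authors' exact code.

```matlab
% 90/10 split per class (Section IV, B); imds is the datastore from IV, A.
[imdsTrain, imdsTest] = splitEachLabel(imds, 0.9, 'randomized');

% Adapt pretrained VGG-16: replace the final fully connected and
% classification layers to match the number of dataset classes.
net = vgg16;
numClasses = numel(categories(imds.Labels));
layers = net.Layers;
layers(end-2) = fullyConnectedLayer(numClasses);
layers(end)   = classificationLayer;

% Resize images to VGG-16's 224x224 input on the fly.
trainDS = augmentedImageDatastore([224 224], imdsTrain);
testDS  = augmentedImageDatastore([224 224], imdsTest);

% Hyperparameters from Section III: Adam, learning rate 1e-3,
% L2 regularization 1e-5, mini-batch size 32, 30 epochs.
opts = trainingOptions('adam', ...
    'InitialLearnRate', 1e-3, ...
    'L2Regularization', 1e-5, ...
    'MiniBatchSize', 32, ...
    'MaxEpochs', 30, ...
    'Shuffle', 'every-epoch', ...
    'Verbose', false);

trained = trainNetwork(trainDS, layers, opts);

% Test accuracy (the paper reports roughly 87% on the final run).
pred = classify(trained, testDS);
accuracy = mean(pred == imdsTest.Labels);
```

The same layer replacement can also be done interactively in Deep Network Designer, as described in Section IV, C.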
V. CONVOLUTIONAL NEURAL NETWORKS

D. VGG-16 Net

Semantic segmentation has been a prominent area of research in photogrammetry, remote sensing, and computer vision. Urban applications such as airborne mapping, object positioning, and building extraction from high-resolution aerial images demand accurate and efficient segmentation algorithms. Deep learning models have shown great potential in handling complex scenes, and this study focuses on evaluating the semantic segmentation accuracy of UAV-based images in urban areas. The proposed method employs a deep learning framework based on VGG-16 Net. This architecture extracts and classifies features through layers of convolution, max pooling, and activation.

H. ShuffleNet

Standard convolutions across all input channels are computationally expensive; grouped convolution reduces this cost by dividing the input channels into groups and applying convolutions within each group separately.
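As a minimal sketch of the grouped convolution described above, using MATLAB's groupedConvolution2dLayer (the filter and group counts are illustrative):

```matlab
% Grouped convolution: 3x3 filters, 8 filters per group, 4 groups.
% With 32 input channels, each group convolves only 8 of them, so the
% layer needs 3*3*8*(8*4) weights instead of 3*3*32*32 for a full
% convolution -- a 4x reduction, as in ShuffleNet-style blocks.
gconv = groupedConvolution2dLayer(3, 8, 4, 'Padding', 'same');
```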
REFERENCES

[7] W. Li, H. Zhao, and X. Yu, "Learning-based scene understanding from remote sensing images: A survey," Journal of Parallel and Distributed Computing, vol. 159, pp. 123-140, 2023.

[8] T. Zhang, L. Zhang, and L. Zhang, "Automated detection and recognition of irrigation pivots from satellite imagery using convolutional neural networks," International Journal of Remote Sensing, vol. 43, no. 12, pp. 4943-4962, 2021.

[9] J. Zhao, Y. Zhang, and Z. Wang, "A survey on drone-based object detection and tracking methods," Applied Sciences, vol. 11, no. 20, article 11320,