PakhiChini - Automatic Bird Species Identification Using Deep Learning
Abstract—Image classification has recently achieved outstanding results with Convolutional Neural Networks. Lately, leveraging pretrained Convolutional Neural Networks (CNNs) has been shown to offer a much better representation of an input image. ResNet [1] is one of the top pretrained CNNs and is widely used in deep learning as a pretrained CNN model. In this paper, we propose a deep learning model that is capable of identifying individual birds from an input image. We additionally leverage a pretrained ResNet model, combined with our base model, to encode the images. Birds are usually found in diverse scenarios and, from a human point of view, appear in different sizes, shapes, and colors. We conducted experiments on birds of different dimensions, cast, and celerity to study recognition performance. We achieved a top-5 accuracy of 97.98% on our classifications.
Index Terms—Deep Neural Network, Computer Vision, Convolutional Neural Network, Image Classification, Image Recognition, Transfer Learning, Machine Learning, Bird Species Classification, ResNet

I. INTRODUCTION
Bangladesh is home to many kinds of birds, and we do not even know the names of all the birds around us. Birds hold a very important place in our culture. They are found chirping everywhere, in cities and villages alike, and most people can recognize them by their sounds. Composers, writers, and musicians from all over the world often find a source of dedication and inspiration in birdsong.
Humans can classify bird species easily, but it is a hard job for a machine. Humans need a great deal of effort to stockpile information about birds, which is also a costly process. A system that can process large-scale bird data is therefore needed to serve and benefit governmental agencies, researchers, and others.
Ornithologists have faced various challenges in identifying bird species for decades. They study the characteristics and attributes of birds and distinguish them by their habitat, ecological influence, biology, and so on. Ornithology experts identify birds based on the Linnaean taxonomy: Kingdom, Phylum, Class, Order, Family, Species.
To recognize a bird in an image, the machine first needs to locate the bird and ignore the irrelevant background, which is easy for a human. To teach the machine, we have to collect a large amount of data and train it to classify bird species. Image classification is the process of assigning objects to categories with the help of a labeled dataset. In this work, we faced difficulties while working with a large dataset and its categorization.
In past years, researchers have shown interest in solving this problem. After studying previous works, we concluded that bird classification has not advanced as much as deep learning itself has. We therefore pursue this problem in the hope of achieving promising results. In this paper, we work on automatic bird species identification in our context.
Our contributions in this paper are as follows:
• We propose an innovative model, which is the first work in this context.
• We collected bird data according to the geographical context of Bangladesh.
• We leverage pretrained CNNs, which give us better representations of an image.
The rest of the paper is organized as follows. Sections II and III present related works and model architectures, and Sections IV and V discuss experimental details and result analysis. Finally, analysis, web deployment, future scope, and conclusion are discussed in Sections VI, VII, VIII, and IX.

II. RELATED WORKS
Image classification can be split into three categories: supervised, unsupervised, and semi-supervised classification. In the supervised approach, images and their corresponding labels are required to train a classification model. Unsupervised classification is the reverse of supervised classification: labeled training data is not required to classify an image. Semi-supervised classification takes advantage of both supervised and unsupervised techniques.
With the advancement of machine learning algorithms, image identification is now performed using machine learning. In [2], a support vector machine (SVM) is used as the recognition algorithm, and a decision tree was used to classify two different images. The classification problem can also be solved using data mining techniques; [3] recognized bird species using data mining techniques.
[4], [5] applied Deep Neural Network (DNN) techniques to identify images and showed that bird species can be
978-1-7281-6823-4/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: Universiti Teknikal Malaysia Melaka-UTEM. Downloaded on June 01,2023 at 15:24:57 UTC from IEEE Xplore. Restrictions apply.
effectively identified using DNN techniques [6]. For fine-grained visual categorization, [7] proposed an architecture that approaches expert performance in the identification of bird species. [7] carried out a detailed investigation of state-of-the-art deep convolutional feature implementations as well as fine-tuned feature learning for fine-grained classification. In [7], a pose-normalized model that combines lower-level feature layers with pose-normalized extraction routines and higher-level feature layers with unaligned image features works best. Their experiments on bird species recognition advanced state-of-the-art performance, with a large improvement in classification accuracy over previous techniques (75% vs. 55–65%).
In past years, sound-based classification has gained popularity, and various works have been based on the sounds of bird species. [4] proposed an audio classification technique for identifying bird species; it covers speech-recognition methods, such as nearest-neighbour matching or decision trees on extracted features, as well as new developments in deep learning. All these sound-based works used supervised learning. [8] explored solving the problem with unsupervised learning to classify bird species from sound samples, applying an unsupervised approach to acquire approximate note models.
[9] found that learning algorithms are remarkably robust to annotation errors: this level of training-data corruption leads to only a small increase in test error, provided that the training set is sufficiently large.
[6] proposed a wild species identification algorithm for animal classification based on a deep convolutional neural network and showed that it attains a higher level of performance. [10] explores large-vocabulary bird species identification, which is complex because it involves classifying flight calls. Specifically, they used unsupervised dictionary learning for a "shallow learning" approach and a deep convolutional neural network combined with data augmentation. Finally, they combined the models using late fusion, which further improved the outcome and achieved state-of-the-art classification accuracy.
[11] notes that remarkable improvements have been made in recent years in object detection and recognition with the rapid progress of deep learning and computer vision, especially deep convolutional neural networks (CNNs). That work studied the detection of small objects in low-resolution images and assessed the performance of deep learning methods on a new dataset for bird identification instead of a general object recognition dataset.
[12] proposed a framework for pruning convolutional kernels in neural networks, focusing on transfer learning, where pretrained networks are adapted to specialized tasks and efficient inference is achieved. [13]–[16] proposed different methods to obtain better performance in classification tasks. [17] leveraged a pretrained VGG-16 model to classify birds.

III. METHODOLOGY AND MODEL ARCHITECTURE
In this paper, we propose a novel deep learning model to classify bird species. We also propose a second deep learning model that uses a pretrained ResNet architecture. We first trained our base model on our dataset and then implemented the base model in the various layers of ResNet.
A. Base Model Architecture
As shown in Fig. 1, the base model consists of CNN layers that extract features X = [x_1, x_2, x_3, ..., x_n] from an input image. The input image size is 224×224×3 (RGB color channels). We downsample with an average pooling layer with a stride of 2. Two fully connected layers are used along with a ReLU activation function, and dropout randomly deactivates a fraction of the neurons during training. The output of the fully connected layers is fed into a softmax layer to predict the class.
Fig. 1. Base Model Architecture
B. Pretrained Model Architecture
Image classification assigns an image to its category. In deep learning, image classification can be done using transfer learning, and many state-of-the-art results are based on transfer learning [18], [19]. ResNet is one of the most popular models conventionally used as a pretrained CNN to take advantage of transfer learning. ResNet was trained on about 1.2 million images from the ImageNet dataset, which has 1000 categories. ResNet is suitable for our work because it allows us to take advantage of transfer learning.
With CNNs, the accuracy of a network naturally increases as its depth is expanded. This holds under
2 2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4)
the assumption that over-fitting is not an issue. However, as the network depth increases, the gradient signal needed to update the weights becomes very small by the time it reaches the earlier layers. This results in vanishing gradients. Another issue with training a very deep network is that the parameters become very difficult to optimize, so simply adding layers raises the training error. Residual Networks (ResNet) solve this issue by building the network from residual blocks, as shown in Fig. 2.
to one. Equation (1) gives the softmax:

$$ s_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \qquad (1) $$

The core building block of ResNet is the residual block. The main idea of ResNet is to skip one or more layers by introducing a so-called "identity shortcut connection".
Let G(y) be an underlying mapping to be fit by a few stacked layers, with y denoting the input to the first of these layers. If one hypothesizes that multiple nonlinear layers can asymptotically approximate complicated functions², then it is equivalent to hypothesize that they can asymptotically approximate the residual function, i.e., G(y) − y (assuming that the input and output have the same dimensions). So rather than expecting the stacked layers to approximate G(y), we let these layers approximate a residual function G(y) − y.
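The identity shortcut and the softmax of Equation (1) can be illustrated with a minimal pure-Python sketch on plain lists. It is not the ResNet18 code used in this work: the toy stacked layer `toy_residual_fn` and its 2×2 weights are hypothetical, chosen only to show that the block outputs F(y) + y (so a residual of zero reduces the block to the identity) and that Equation (1) yields probabilities summing to one.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(x: Vector) -> Vector:
    """Equation (1): s_i = exp(x_i) / sum_j exp(x_j).

    Subtracting max(x) leaves the result unchanged but avoids overflow.
    """
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def residual_block(y: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Identity shortcut: output F(y) + y, so the stacked layers F
    only have to learn the residual G(y) - y."""
    f = residual_fn(y)
    if len(f) != len(y):
        raise ValueError("identity shortcut requires F(y) and y to have the same dimension")
    return [fi + yi for fi, yi in zip(f, y)]


def toy_residual_fn(y: Vector) -> Vector:
    # Hypothetical stacked layer: ReLU(W y) with a fixed 2x2 weight matrix W.
    w = [[0.5, -0.25], [0.1, 0.2]]
    return relu([sum(wij * yj for wij, yj in zip(row, y)) for row in w])


if __name__ == "__main__":
    features = residual_block([1.0, 2.0], toy_residual_fn)
    # A real classifier would map features to class scores with a fully
    # connected layer first; here the two features stand in for class scores.
    print(features)
    print(softmax(features))
```

Because the shortcut adds y back unchanged, gradients can flow through the addition even when F contributes little, which is the motivation given above for avoiding vanishing gradients in deep networks.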
Impact of low resolution on the image recognition task: We began our evaluation by measuring how image quality affects classification accuracy when low resolution is not accounted for. With low-resolution images, the model struggles to predict the correct species.
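One way to run the evaluation described above is to degrade each test image to a lower resolution, restore it to the input size, and compare classification accuracy across degradation factors. The sketch below assumes a grayscale image stored as a list of rows and uses block averaging for downsampling and nearest-neighbour upsampling; the function names and this degradation scheme are illustrative assumptions, not the exact procedure used in this paper.

```python
from typing import List

Image = List[List[float]]


def downsample(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks (size must be divisible by factor)."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [img[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample_nearest(img: Image, factor: int) -> Image:
    """Repeat each pixel factor times along both axes (nearest-neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def simulate_low_resolution(img: Image, factor: int) -> Image:
    """Keep the original size but only 1/factor of the detail along each axis."""
    return upsample_nearest(downsample(img, factor), factor)
```

For a 224×224 input, factors such as 2, 4, or 8 keep the size divisible; each degraded copy of the test set would then be classified by the trained model and its top-1 and top-5 accuracy recorded per factor.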
Training is stopped if the training accuracy does not increase for six consecutive epochs.
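The stopping rule above amounts to a patience counter on training accuracy. This is a sketch: the class name `EarlyStopping`, the helper `run`, and calling `step` once per epoch with that epoch's training accuracy are assumptions; deep learning frameworks provide their own callbacks for the same purpose.

```python
from typing import List


class EarlyStopping:
    """Stop when training accuracy has not increased for `patience` consecutive epochs."""

    def __init__(self, patience: int = 6) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.epochs_without_improvement = 0

    def step(self, train_accuracy: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if train_accuracy > self.best:
            self.best = train_accuracy
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def run(accuracies: List[float], patience: int = 6) -> int:
    """Hypothetical driver: return how many epochs are trained for a given accuracy history."""
    stopper = EarlyStopping(patience=patience)
    for epoch, acc in enumerate(accuracies, start=1):
        if stopper.step(acc):
            return epoch
    return len(accuracies)
```

Only a strict increase resets the counter, matching "does not increase": a plateau at the best value counts toward the six epochs.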
V. RESULT
In this section, we present the image classification results achieved by our two proposed deep learning models. Classification performance is measured using two metrics: top-1 and top-5 accuracy.
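Top-1 and top-5 accuracy count a prediction as correct when the true label is among the k highest-scoring classes, with k = 1 and k = 5 respectively; the corresponding error is 100% minus the accuracy. A minimal sketch, assuming the model's per-class scores for each test image are available as lists (the function name is illustrative):

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("need one score vector per label and at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices of the k largest scores for this sample.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return 100.0 * hits / len(labels)
```

Top-k accuracy is non-decreasing in k, which is why the top-5 figures reported below are always at least the top-1 figures.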
Table I presents the performance of our two proposed deep learning models. We achieved a top-5 test accuracy of 63.48% with the base model and 96.71% with the pretrained ResNet model.
Fig. 4. Instance from dataset
TABLE I
PERFORMANCE OF TWO MODELS: BASE MODEL AND PRETRAINED MODEL
Model Config | Top-1 val accuracy (%) | Top-5 val accuracy (%) | Top-1 test accuracy (%) | Top-5 test accuracy (%)
Base Model | 32.78 | 65.32 | 32.01 | 63.48
ResNet18 | 82.00 | 96.62 | 82.25 | 96.71
we took advantage of transfer learning. Although other pretrained models were available, we chose the ResNet architecture because its top-5 accuracy is better than that of the other pretrained models. Table III shows the different pretrained CNN networks and their top-5 accuracy.
TABLE III
COMPARISON WITH DIFFERENT PRETRAINED CNN NETWORKS
VII. WEB DEPLOYMENT
Our system uses a high-accuracy deep learning model to automate visual bird identification and return fast results at minimal development cost. To demonstrate the robustness and compatibility of our deep learning model, we developed a web-based API service using the Flask micro-framework. This deployment shows that our deep learning model can be used in practice to classify bird species. Fig. 9 shows a diagram of our API service.
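The prediction step behind such an API can be kept independent of the web framework: a route handler receives the uploaded image, runs the model, and serializes the most likely species. The sketch below assumes the model has already produced one raw score per class; `build_prediction_response`, the JSON field names, and the example species list are hypothetical and do not describe the actual interface of our service.

```python
import json
import math
from typing import Dict, List, Sequence, Union


def build_prediction_response(scores: Sequence[float], class_names: Sequence[str], k: int = 5) -> str:
    """Turn raw class scores into a JSON string with the top-k species and probabilities."""
    if len(scores) != len(class_names):
        raise ValueError("one score per class is required")
    # Softmax converts scores to probabilities (max subtracted for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    top_k: List[Dict[str, Union[str, float]]] = [
        {"species": class_names[i], "probability": round(probs[i], 4)} for i in ranked
    ]
    return json.dumps({"prediction": top_k[0]["species"], "top_k": top_k})


if __name__ == "__main__":
    # Hypothetical label list and scores, for illustration only.
    names = ["Oriental Magpie-Robin", "Common Myna", "White-throated Kingfisher"]
    print(build_prediction_response([2.0, 0.5, 1.0], names, k=2))
```

In a Flask application, a view function registered with `@app.route(..., methods=["POST"])` could read the uploaded file from `request.files`, run the model on it, and return this string with a JSON content type.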
We designed a web interface where a user can upload a bird image. The image is passed to the model, which processes it, identifies the species, and returns the predicted result through the API service. Our deep learning model is responsible for classifying the bird species. Fig. 10 shows our model's behavior on the web-based service.
Fig. 10. Model behavior on web based service

VIII. FUTURE SCOPE
In the near future, we will collect more data in our context to increase the precision of our system. In future studies, we will also investigate more classification models with the aim of achieving greater accuracy.
Comparing different works in the related literature is not simple because each follows its own experimental protocol. Based on the outcomes reported in this research and the implementations of related works, we consider bird species identification an intriguing field of study, and we plan to work with SIFT features in the future.
It is evident that deep learning is redefining the state of the art and, in the coming years, will open the door to further development in this area.

IX. CONCLUSION
Bird classification is one of the challenging tasks in the field of deep learning. Some research works have classified bird species in Western contexts, but due to a shortage of Asian bird datasets, the work done on this topic is rather limited. With this in mind, we built a bird dataset based on our context. We also proposed two different models and showed that our proposed pretrained ResNet model achieved greater accuracy than our base model. Our best model achieved 97.98% accuracy in identifying bird species.

REFERENCES
[1] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[2] S. Fagerlund, “Bird species recognition using support vector machines,” EURASIP Journal on Applied Signal Processing, vol. 2007, no. 1, pp. 64–64, 2007.
[3] E. Vilches, I. A. Escobar, E. E. Vallejo, and C. E. Taylor, “Data mining applied to acoustic bird species recognition,” in 18th International Conference on Pattern Recognition (ICPR’06), vol. 3. IEEE, 2006, pp. 400–403.
[4] E. Sprengel, M. Jaggi, Y. Kilcher, and T. Hofmann, “Audio based bird species identification using deep learning techniques,” Tech. Rep., 2016.
[5] A. Harma, “Automatic identification of bird species based on sinusoidal modeling of syllables,” in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP’03), vol. 5. IEEE, 2003, pp. V–545.
[6] G. Chen, T. X. Han, Z. He, R. Kays, and T. Forrester, “Deep convolutional neural network based species recognition for wild animal monitoring,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 858–862.
[7] S. Branson, G. V. Horn, S. J. Belongie, and P. Perona, “Bird species categorization using pose normalized deep convolutional nets,” CoRR, vol. abs/1406.2952, 2014. [Online]. Available: http://arxiv.org/abs/1406.2952
[8] M. Graciarena, M. Delplanche, E. Shriberg, and A. Stolcke, “Bird species recognition combining acoustic and sequence modeling,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2011, pp. 341–344.
[9] G. Van Horn, S. Branson, R. Farrell, S. Haber, J. Barry, P. Ipeirotis, P. Perona, and S. Belongie, “Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[10] J. Salamon, J. P. Bello, A. Farnsworth, and S. Kelling, “Fusing shallow and deep learning for bioacoustic bird species classification,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017, pp. 141–145.
[11] Y. Liu, P. Sun, M. R. Highsmith, N. M. Wergeles, J. Sartwell, A. Raedeke, M. Mitchell, H. Hagy, A. D. Gilbert, B. Lubinski et al., “Performance comparison of deep learning techniques for recognizing birds in aerial images,” in 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC). IEEE, 2018, pp. 317–324.
[12] P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, “Pruning convolutional neural networks for resource efficient transfer learning,” arXiv preprint arXiv:1611.06440, vol. 3, 2016.
[13] M. d. S. de Arruda, G. Spadon, J. F. Rodrigues, W. N. Gonçalves, and B. B. Machado, “Recognition of endangered pantanal animal species using deep learning methods,” in 2018 International Joint Conference on Neural Networks (IJCNN), July 2018, pp. 1–8.
[14] M. Lasseck, “Image-based plant species identification with deep convolutional neural networks,” in CLEF (Working Notes), 2017.
[15] A. Fritzler, S. Koitka, and C. M. Friedrich, “Recognizing bird species in audio files using transfer learning,” in CLEF (Working Notes), 2017.
[16] F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang, “Residual attention network for image classification,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[17] S. Islam, S. I. A. Khan, M. M. Abedin, K. M. Habibullah, and A. K. Das, “Bird species classification from an image using VGG-16 network,” in Proceedings of the 2019 7th International Conference on Computer and Communications Management, 2019, pp. 38–42.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[19] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.