
Detection and Classification of Indian Classical Bharathanatyam Mudras Using Enhanced Deep Learning Technique

2022 International Conference on Innovations in Science and Technology for Sustainable Development (ICISTSD) | 978-1-6654-9936-1/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICISTSD55159.2022.10010550

1st Sneha Haridas
Dept. of Computer Science and Engineering
Vidya Academy of Science and Technology
Thrissur, India
[email protected]

2nd Dr. Ramani Bai V
HOD, Computer Science and Engineering
Vidya Academy of Science and Technology
Thrissur, India
[email protected]

Abstract—Indian classical dance forms like Bharathanatyam are composed of complex hand gestures and facial expressions as well as body movements. Because of this complexity, identifying each mudra in Bharathanatyam is extremely difficult. This paper demonstrates a large Convolutional Neural Network (CNN) that was trained on Google Colaboratory using a single-stage model, You Only Look Once version 3 (YOLOv3), to analyze images in the dataset and detect and classify the mudras. Open datasets of mudras are not presently available, so a Bharathanatyam mudra dataset of single-hand gesture images covering 28 classes was created. This proposed system is, as far as we know, the first attempt in this subject: YOLOv3 has never before been used to detect mudras. YOLOv3 divides the image into sectors, predicts bounding boxes, and calculates a probability for each. The bounding boxes are then weighted according to the predicted probability, and the model detects the object based on the final weights. After training, the neural network produced correct results on the test data, with a mean average precision (mAP) of 73%.

Index Terms—Convolutional Neural Network (CNN), Classification, Deep Learning, YOLOv3, Mudras

I. INTRODUCTION

Dance is a form of communication that is fueled by music and adheres to a set of rules that vary depending on the form. A series of distinct and elementary activity units (action elements) combine with music to embody an idea or specific emotions in dance [3]. Bharathanatyam, a classical dance form that originated in India's southern states, is on the way to becoming fully mechanised because of a severe shortage of competent and motivated teachers/gurus. Teachers are the authorities who can perceive the precise semantic meaning and emotions of these dance syllables. The inner and outer emotions conveyed by the dancers are difficult for novice learners, and also for the general audience, to understand. The Indian classical dance repository Natyashastra describes the dance syllables that may be employed in dance forms. It mentions the dance postures referred to as karanas, Nritta Hastas, Asamyukta hastas, and Samyukta hastas. Bharathanatyam has twenty-eight asamyukta hastas and twenty-three samyukta hastas. A hasta is also termed a mudra. Both hands are used to perform or represent Samyukta gestures, whereas only one hand is used to represent Asamyukta mudras. Every hasta can be used to represent a range of thoughts, ideas, and objects. The shapes of mudras play an important role in the mudra classification system, and several works have been proposed on the shape descriptors employed in varied applications. In dance forms, hand gestures and facial expressions are used to convey the story line and visual expressions.

People who are unfamiliar with the significance of dance gestures will find this work helpful. The task is a difficult one because some mudras resemble other mudras, which can lead to misclassification. The aim of this study is to create a dataset of twenty-eight hand gestures and to classify the images into the corresponding classes. The study focuses only on single-hand gestures. To classify the images, a deep learning technique called YOLOv3 is used.

The remainder of this paper is organised as follows: Section II is devoted to a review of the literature. The methods employed in this system are introduced in Section III. The implementation is described in Section IV. The findings are discussed in Section V, and the paper is concluded in Section VI.

II. LITERATURE SURVEY

Mampi Devi et al. [1] focused on classifying the asamyukta hastas of the Sattriya dance form. The work consists of a classification mechanism which runs in two phases. Based on structural similarity and the Medial Axis Transformation (MAT), the images from the dataset were classified into 29 classes. As mentioned in the previous works, the application of this
work also lies in the self-learning and e-learning domains. The work focused on improving the recognition accuracy, so a PBF kernel is incorporated into the SVM model, and the accuracy was found to be 97.24%.

Soumitra Samanta et al. [2] used YouTube videos to create their own Indian Classical Dance dataset. Each class contains 30 videos of varying resolutions (maximum resolution: 400x350). Based on the proposed sparse representation, the dictionary learning technique entails using a movement descriptor based on HOOF (Histogram of Oriented Optical Flow) to represent each frame of the video. When using an SVM to classify the video frames, 86.67% accuracy is achieved.

Kishore et al. [3] propose classifying actions of Indian classical dance with the assistance of a convolutional neural network (CNN). Action recognition is performed on offline and online videos of dance steps. The online videos were taken from live performances and YouTube, while the offline videos were created by filming ten subjects performing two hundred mudras or familiar dance steps against varied backgrounds. Eight different samples of varying size are used for training the CNN model, each sample with a different subject set. Out of the ten subject samples, the remaining two samples were fixed for testing the CNN model. The model consists of five Rectified Linear Unit (ReLU) layers, two stochastic pooling layers, and one dense and one SoftMax layer.

Anami and Bhandage [4] proposed a method for identifying images of Bharatanatyam dance mudras. Various mudras and vertical-horizontal intersections were used as features in this model. Mudras were classified as either conflicting or nonconflicting, and a rule-based classifier is deployed to classify images into 24 classes of Bharatanatyam. The average reported accuracy is 95.25%.

Lai et al. [5] proposed a model for automated hand gesture recognition that combines the power of a CNN and an RNN. The model used both depth and skeleton data; both types of data can be used to train the neural networks, which are intended to recognise the hand gestures separately. This work focused on applying the CNN to extract the prominent spatial information from the depth data. To extract the temporal and spatial information, various combinations of skeleton and depth information were evaluated. An overall accuracy of 85.46% is obtained for the 14/28 dataset, which is a dynamic dataset. In the future, the model could be extended to recognise human activity, which could be used in a variety of human assistance applications and scenarios.

Basavaraj and Venkatesh [6] approached the problem with a three-stage mechanism to classify the mudras. In the second stage, the Hu moments, eigenvalues, and intersections are extracted, and in the third stage the mudras are classified using an ANN.

S. Masood et al. [7] propose a vision-based mechanism to decode Argentinian sign language gestures. They achieved a 95.217% accuracy. This accuracy value gave the insight that spatial and temporal features can be readily extracted by incorporating a CNN with an RNN. The prediction achieved an accuracy of 80.87%, properly recognising 370 gestures out of 460 in the test set, whereas the pooling layer approach achieved a score of 95.21% by correctly recognising 438 gestures.

Nikita Jain et al. [8] classified different dance forms using a deep convolutional neural network model based on ResNet50. As per the authors, an accuracy score of 0.911 is obtained for the work. They proposed a system that categorises dance forms into eight categories. Furthermore, in terms of performance evaluation, the model outperformed some recent works. Image thresholding and sampling are performed while feeding the model with input images. The dataset that they used consists of 800 images.

Lakshmi Tulasi Bhavanam et al. [9] proposed a model which combines the power of a Support Vector Machine (SVM) and a Convolutional Neural Network (CNN) to classify images of a dance form into appropriate classes. The work focused on classifying and recognising Kathakali dance mudras into 24 classes. The dataset used for the work is a self-made one. In Kathakali, the performer uses 24 different types of hand gestures to tell the story. The dataset consisted of 654 images of the mudras of Kathakali, with 27 images for each mudra. Both the left and right hands were used to show the mudras. There are two steps in the SVM classification: preprocessing and feature extraction. Using the output of the feature extraction phase, the images were classified. The model showed an accuracy of 74%. By examining these existing works, it was discovered that no work had been done with YOLOv3.

Fig. 1. Asamyuktha hastas

III. METHODOLOGY

A. Proposed System

Fig. 2. Proposed System

Bharathanatyam single-hand mudras are classified using YOLOv3. YOLOv3 is an object detection algorithm that identifies specific objects in videos, images, etc. This work uses images as input, and a dataset was created for the purpose. The dataset was split into training and testing sets. The images were then trained for 10 to 14 hours, producing a model that was used to classify the images.
B. Dataset Preparation

Open datasets are not available, so a dataset has been created for this work. The dataset consists of 5000 images and their annotation text files. Images are annotated using labelImg, and the annotation text files are in the YOLO Darknet format. Image augmentation is a technique used to expand the training dataset by creating different versions of similar content. In this proposed system, the images are horizontally as well as vertically flipped, and 90° rotations are performed clockwise, counter-clockwise, and upside down (180°). Images are preprocessed and resized to 416x416, as sketched below.
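As a rough illustration, the flips, rotations, and resizing described above could be scripted with OpenCV as follows. This is a minimal sketch under our own assumptions: the paper only names labelImg as its tooling, and the directory layout and function names here are hypothetical. Note that the YOLO annotation files would need to be transformed alongside the images.

    # augment_mudras.py -- illustrative sketch of the augmentation step
    import cv2
    import os

    def augment(image):
        """Return the flipped/rotated variants used to expand the dataset."""
        return {
            "hflip": cv2.flip(image, 1),    # horizontal flip
            "vflip": cv2.flip(image, 0),    # vertical flip
            "rot90cw": cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE),
            "rot90ccw": cv2.rotate(image, cv2.ROTATE_90_COUNTERCLOCKWISE),
            "rot180": cv2.rotate(image, cv2.ROTATE_180),   # upside down
        }

    src_dir, dst_dir = "mudras/raw", "mudras/augmented"    # hypothetical paths
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = cv2.imread(os.path.join(src_dir, name))
        if img is None:
            continue                                       # skip non-image files
        variants = augment(img)
        variants["orig"] = img
        for tag, variant in variants.items():
            out = cv2.resize(variant, (416, 416))          # YOLOv3 input size
            cv2.imwrite(os.path.join(dst_dir, tag + "_" + name), out)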
C. Model Architecture

YOLO is one of the fastest and most accurate object detection algorithms compared to R-CNN, Fast R-CNN, and Faster R-CNN [11]. You Only Look Once version 3 (YOLOv3) uses a variant of Darknet, which originally has a 53-layer network trained on ImageNet. For detection, 53 more layers are stacked on top of it, giving YOLOv3 a 106-layer fully convolutional underlying architecture. YOLOv3 makes detections at 3 different scales, and prediction over an image is done in a single pass.

Fig. 3. Annotated images using bounding box

Fig. 4. Coordinates of bounding box within the image

1) Darknet-53: Darknet-53 is used to extract features. It is primarily made up of 3 x 3 and 1 x 1 filters with skip connections, similar to ResNet's residual network. Darknet-53 is the deeper feature extractor architecture used in YOLOv3.

2) Convolution layers in YOLOv3: It has 53 convolutional layers, each of which is followed by a batch normalisation layer and a Leaky ReLU activation. The CNN is used to predict the class as well as the bounding box simultaneously. Numerous filters are convolved over the images in the convolution layers, resulting in multiple feature maps. There is no pooling; the feature maps are instead downsampled using convolutional layers with stride 2, which helps prevent the loss of low-level features that is frequently linked to pooling.
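For illustration, the convolution, batch-normalisation, Leaky-ReLU unit and the stride-2 downsampling described above can be sketched in PyTorch as follows. Darknet itself is implemented in C, so this is only an equivalent sketch of the building blocks, not the paper's code.

    import torch.nn as nn

    def conv_unit(in_ch, out_ch, kernel, stride=1):
        """Darknet-style unit: convolution -> batch norm -> Leaky ReLU."""
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel, stride, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    class Residual(nn.Module):
        """Darknet-53 residual block: 1x1 bottleneck, then 3x3, plus a skip connection."""
        def __init__(self, ch):
            super().__init__()
            self.block = nn.Sequential(conv_unit(ch, ch // 2, 1), conv_unit(ch // 2, ch, 3))

        def forward(self, x):
            return x + self.block(x)

    # First stage of Darknet-53: a 3x3 convolution, then a stride-2
    # convolution for downsampling (no pooling), then one residual block.
    stage = nn.Sequential(conv_unit(3, 32, 3), conv_unit(32, 64, 3, stride=2), Residual(64))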
3) Bounding Box: A bounding box is an outline that highlights an object in an image. Each bounding box has 4 attributes:
• Height (bh)
• Width (bw)
• Bounding box center (bx, by)
• Class (c)

Bounding boxes are created by annotating the images. A text file is thus created containing the XMin, XMax, YMin, and YMax coordinates of the annotated bounding boxes of the mudra. The bounding boxes are weighted using the predicted probabilities, and the model utilises the final weights to perform detection.
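The corner coordinates recorded at annotation time map directly onto the normalised (bx, by, bw, bh) representation that the YOLO Darknet label format expects, one line per object. A small conversion sketch follows; the class index and image size are illustrative, and labelImg can also export this format directly:

    def to_yolo(class_id, xmin, xmax, ymin, ymax, img_w, img_h):
        """Convert corner coordinates to a YOLO-Darknet label line:
        '<class> <x_center> <y_center> <width> <height>', all normalised to [0, 1]."""
        bx = (xmin + xmax) / 2.0 / img_w   # bounding box center x
        by = (ymin + ymax) / 2.0 / img_h   # bounding box center y
        bw = (xmax - xmin) / img_w         # box width
        bh = (ymax - ymin) / img_h         # box height
        return f"{class_id} {bx:.6f} {by:.6f} {bw:.6f} {bh:.6f}"

    # e.g. a mudra occupying pixels x=100..300, y=50..350 in a 416x416 image:
    print(to_yolo(0, 100, 300, 50, 350, 416, 416))
    # -> "0 0.480769 0.480769 0.480769 0.721154"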

Fig. 5. Network structure of YOLOv3 [11]

Fig. 6. CNN architecture of Darknet-53

IV. IMPLEMENTATION

A. Parameters Initialization

YOLOv3 has a configuration file that provides information such as the number of classes, max-batches, filters, batch size, subdivision, and steps, among other things. For this proposed model, a custom cfg file was created. There are 28 classes in all; as a result, the filter value is 99, since (number of classes + 5) * 3 is the formula for determining the filter count. For training, the batch size is 24 and the subdivision is 16. For comparison, the batch size in the default yolov3.cfg file is 64 with a subdivision of 16, meaning 4 images are loaded at once and it takes 16 of these mini-batches to complete one iteration.
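The arithmetic behind these settings is easy to verify; the short check below is our own illustration, not part of the paper:

    # Verify the cfg arithmetic described above.
    num_classes = 28
    filters = (num_classes + 5) * 3   # 3 anchors x (4 box coords + 1 objectness + 28 class scores)
    assert filters == 99

    batch, subdivision = 64, 16       # default yolov3.cfg values
    print(batch // subdivision)       # 4 images loaded at once; 16 mini-batches per iteration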
An "object.names" file is created containing the names of the classes that the model is looking for. There is also an object.data file with the number of classes (28), a training data directory, test data, "object.names," and a path for the weights, which will be saved in the backup folder. Example layouts of both files are sketched below.
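A typical layout of the two files might look as follows; the class names and paths are illustrative, since the paper does not list them:

    object.names (one mudra class per line, 28 lines in all):

        pataka
        tripataka
        ardhapataka
        ...

    object.data:

        classes = 28
        train   = data/train.txt
        valid   = data/test.txt
        names   = data/object.names
        backup  = backup/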
Fig. 7. Configuration Steps

Fig. 8. YOLOv3 custom configuration file

B. Training

The YOLOv3 custom configuration file and the default YOLOv3 weight file were used for training. After each round of training, a weight file is generated, which is then used for testing.
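With those files in place, Darknet training is usually launched as below. The paper only says the "default weight file" was used; the pretrained darknet53.conv.74 convolutional weights and the file paths shown here are the conventional choice, assumed for illustration:

    ./darknet detector train data/object.data cfg/yolov3-custom.cfg darknet53.conv.74

Darknet periodically writes the intermediate and final weight files into the backup folder named in object.data.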
Fig. 9. Generated Weight File

C. YOLOv3 Algorithm Applied

An image is passed into the YOLOv3 model as input. The object detector searches through the image for the coordinates of the objects that are present. It divides the input into a grid and examines the target object's attributes in each grid cell. Features that are recognised with a high confidence rate

in nearby cells are combined in one place to produce the model output, as illustrated in the sketch after Fig. 11.

Fig. 10. Comparison with other detection methods [10]

Fig. 11. Mudra Detected
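As an illustration of this detection step, a trained YOLOv3 model can be run with OpenCV's DNN module (the experimental setup in Section V mentions OpenCV). This is a minimal sketch under our own assumptions; the file names are illustrative, and the thresholds are typical defaults rather than values reported in the paper:

    # detect_mudra.py -- illustrative inference sketch using OpenCV's DNN module
    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromDarknet("yolov3-custom.cfg", "yolov3-custom_final.weights")
    out_names = net.getUnconnectedOutLayersNames()   # the three YOLO output scales
    classes = open("object.names").read().splitlines()

    img = cv2.imread("test_mudra.jpg")
    h, w = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)

    boxes, confidences, class_ids = [], [], []
    for output in net.forward(out_names):
        for det in output:                # det = [bx, by, bw, bh, objectness, 28 class scores]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id])
            if conf > 0.5:
                bx, by, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(bx - bw / 2), int(by - bh / 2), int(bw), int(bh)])
                confidences.append(conf)
                class_ids.append(class_id)

    # Non-maximum suppression merges overlapping detections from nearby grid cells.
    for i in np.array(cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)).flatten():
        x, y, bw, bh = boxes[i]
        print(classes[class_ids[i]], round(confidences[i], 3), (x, y, bw, bh))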

V. EXPERIMENTAL RESULT

A. Environmental Setup

OpenCV and Python 3.7.13 were used to train the algorithm, with CUDA version 11.2. The algorithm was trained on a personal PC with a local 4GB GPU, and a Colab notebook was also utilised. On a GPU, training the neural network model on the dataset took about 10 to 11 hours.
TABLE I
SPECIFICATIONS OF YOLOV3

    Specifications    Parameters used
    Batch size        24
    Subdivision       16
    Filters           99
    Classes           28
    Max-batches       8000
    Loss function     Binary cross-entropy

B. Test Result

Twenty of the 28 classes have a precision ranging from 70 to 100%; the others have a precision of less than 50%. The proposed system has a mean average precision (mAP@0.50) of 73%. The final YOLOv3 weight file and the custom configuration file were used for testing.
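The per-class precision and the overall mAP@0.50 reported here are the figures that Darknet's evaluation mode computes. With the widely used AlexeyAB fork of Darknet this is typically invoked as below (paths illustrative; the paper does not state its exact evaluation command):

    ./darknet detector map data/object.data cfg/yolov3-custom.cfg backup/yolov3-custom_final.weights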
Fig. 12. Average precision chart

Fig. 13. Graph of losses vs. iterations

VI. CONCLUSION AND FUTURE WORK

Indian classical dances such as Kathakali and Bharathanatyam are made up of hand gestures, body movements, and facial expressions that are in line with the background music. Mudras are the basic elements of Bharathanatyam, and understanding them is very difficult because of the complexity of the hand gestures. Public dataset availability is comparatively low, so an expert dancer is required to create a dataset. Bharathanatyam has 28 asamyuktha mudras, which are classified with an accuracy of 73% in this work. The work is mainly focused on mudras; it could be extended to the case where mudras go along with the adavus. In future, this methodology can be extended to check whether the mudras performed along with the adavus are correct. This will be helpful for those who want to learn the dance online. Foreigners

who are interested in knowing and learning Indian dance forms can then get the meaning of, or decode, the mudras of the Bharathanatyam dance form.
REFERENCES

[1] M. Devi and S. Saharia, “A two-level classification scheme for single-hand gestures of Sattriya dance,” ICADW, Guwahati, 2016.
[2] S. Samanta, “Indian Classical Dance classification by learning dance pose bases,” IEEE Workshop on the Applications of Computer Vision.
[3] K. V. V. Kumar and P. V. V. Kishore, “Indian Classical Dance Mudra Classification Using HOG Features and SVM Classifier,” International Journal of Electrical and Computer Engineering (IJECE), 2017.
[4] B. S. Anami and V. A. Bhandage, “A vertical-horizontal-intersections feature based method for identification of bharatanatyam double hand mudra images,” Springer, 2018.
[5] K. Lai and S. N. Yanushkevich, “CNN+RNN Depth and Skeleton based Dynamic Hand Gesture Recognition,” IEEE, 2018.
[6] B. S. Anami and V. A. Bhandage, “A Comparative Study of Suitability of Certain Features in Classification of Bharatanatyam Mudra Images Using Artificial Neural Network,” Springer Nature, 2018.
[7] S. Masood, A. Srivastava, H. C. Thuwal and M. Ahmad, “Real-Time Sign Language Gesture (Word) Recognition from Video Sequences Using CNN and RNN,” Advances in Intelligent Systems and Computing, Springer.
[8] N. Jain, V. Bansal, D. Virmani, V. Gupta, L. Salas-Morera and L. Garcia-Hernandez, “An Enhanced Deep Convolutional Neural Network for Classifying Indian Classical Dance Forms,” MDPI, 2020.
[9] L. T. Bhavanam and G. N. Iyer, “On the Classification of Kathakali Hand Gestures Using Support Vector Machines and Convolutional Neural Networks,” International Conference on Artificial Intelligence and Signal Processing (AISP), 2020.
[10] N. I. Hassan, F. H. K. Zaman, N. Md. Tahir and H. Hashim, “People Detection System Using YOLOv3 Algorithm,” IEEE, 2020.
[11] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv:1804.02767, 2018.
[12] A. Mujahid, M. J. Awan, A. Yasin and M. A. Mohammed, “Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model,” Applied Sciences, 2021.

