FPGA Based Implementation of Binarized Neural Network For Sign Language Application
Abstract—In the last few years, there has been an increasing demand for efficient solutions to computer vision-related tasks on FPGA hardware, owing to its quick prototyping and computing capabilities. Therefore, this work aims to implement a low-precision Binarized Neural Network (BNN) using a Python framework on the Xilinx PYNQ-Z2 embedded platform to tackle the challenging problem of Sign Language recognition. More specifically, the FINN framework is adopted and the BNN topology is modified to accept a larger input resolution (i.e., 64x64) and to classify the proposed Indian Sign Language (ISL) gestures into the corresponding numbers. In addition, data augmentation techniques are applied to improve the overall performance of the neural network. Furthermore, hardware/software co-verification of the BNN topology is performed to validate the accuracy after implementing it on hardware. Extensive experimental results show that it achieves a classification rate of 843.8 frames per second (FPS) on the PYNQ-Z2 FPGA, which is higher than that of previous works. The post-implementation results are also analyzed in terms of resource utilization and power consumption.

Index Terms—Binarized Neural Network (BNN), Computer Vision, Sign Language, FPGA

I. INTRODUCTION

Recognition of Sign Language is an interesting and challenging task in computer vision systems [1]–[4]. This vision-based language is a primary means of communication between the hearing- and vocally-impaired community and the general public. In recent years, researchers have shown tremendous interest in solving the complexity of sign language gestures [5]–[7]. Lighting conditions, varied hand movements and positions, constrained environments, and intensive CPU computations are the most significant challenges that degrade the performance of sign language recognition in the real world.

The recent progress of Deep Learning (DL) algorithms, especially Convolutional Neural Networks (CNNs), plays a significant role in solving complicated tasks such as image recognition [8] and object detection [9]. However, they consume massive power and are computationally expensive, which makes their deployment infeasible on embedded hardware platforms such as FPGAs. This issue can be addressed by applying model compression techniques such as pruning [10], low-rank approximation of convolutional kernels [11], and binarization of neural networks [12] to reduce the overall computational load.

Several existing works apply DL algorithms to the challenges of Sign Language applications. In [6], the authors analyze various Deep Neural Networks (DNNs) on ISL gestures, including a pre-trained VGG-16 with transfer learning and fine-tuning, and hierarchical networks. A methodology based on an Artificial Neural Network (ANN) and handcrafted feature extraction is reported in [13] to recognize ISL gestures and translate them into English. An ANN-based approach is discussed in [14] for ISL number recognition using a leap motion controller. The work in [15] presents a sensor-based methodology in which static and dynamic ISL signs are collected using flex sensors and the received data is classified in real time using Long Short-Term Memory (LSTM) networks. In [16], the authors propose a one-dimensional CNN array architecture for recognizing ISL signs from signals recorded by a wearable IMU device.

Although all of the above-mentioned approaches achieve good accuracy, they are implemented either with sensors or on CPU/GPU platforms, and none of them provides an efficient solution for deploying this application on low-power embedded devices. To date, [17] is the only work that shows an FPGA-based CNN implementation, using an 8-bit dynamic fixed-point scheme to recognize gestures of Swedish sign language. The authors reused and modified the Zynqnet design to measure the performance on the FPGA platform. However, this approach relies on 8-bit quantization to optimize the parameters of the neural network.
Authorized licensed use limited to: COLLEGE OF ENGINEERING - Pune. Downloaded on August 03,2024 at 10:57:53 UTC from IEEE Xplore. Restrictions apply.
Table I
MODIFIED BINARIZED NEURAL NETWORK TOPOLOGY

Layers          Input Feature Map (IFM)   Output Feature Map (OFM)   No. of filters, Kernel Size
Binary CONV2D   (64, 64, 3)               (62, 62, 32)               32, (3, 3)
Binary CONV2D   (62, 62, 32)              (60, 60, 32)               32, (3, 3)
Max Pool 2D     (60, 60, 32)              (30, 30, 32)               (2, 2)
Binary CONV2D   (30, 30, 32)              (28, 28, 64)               64, (3, 3)
Max Pool 2D     (28, 28, 64)              (14, 14, 64)               (2, 2)
Binary CONV2D   (14, 14, 64)              (12, 12, 64)               64, (3, 3)
Binary CONV2D   (12, 12, 64)              (10, 10, 64)               64, (3, 3)
Max Pool 2D     (10, 10, 64)              (5, 5, 64)                 (2, 2)
Binary CONV2D   (5, 5, 64)                (3, 3, 128)                128, (3, 3)
Binary CONV2D   (3, 3, 128)               (1, 1, 128)                128, (3, 3)
Flatten         (1, 1, 128)               128                        -
Binary Dense    128                       512                        -
Binary Dense    512                       512                        -

operation between weights and activation with XNOR and pop count operations [19]. Instead of a conventional dot product, bitwise XNOR operations are performed, and the pop count operation counts the set bits in the result of the XNOR operations. Compared to signed accumulation, the combination of these operations significantly reduces LUT counts.

C. Design Flow

The design flow used to deploy the modified BNN topology for the Sign Language application on the PYNQ-Z2 FPGA board is shown in Figure 3. This FPGA board is selected because it has a small number of hardware resources (i.e., LUTs and DSPs) and is generally used for low-power embedded devices, as it consumes little power (approx. 2.5 W). First, the BNN model is trained using the Theano framework on the CPU platform, which creates a ‘1w-1a.npz’ file, where 1w and 1a denote 1-bit binary weights and activations. Theano² is a lightweight Python platform that allows users to evaluate and optimize mathematical operations involving multi-dimensional arrays efficiently. It offers fast computations and is used for building and training neural networks. Second, the FINN synthesizer provides a weight generation script that extracts the network parameters from the ‘1w-1a.npz’ file, converts them to weights.bin and thresh.bin, and generates a config file. These parameters and the config file describing the neural network model are then used for validation in software and for mapping to FPGAs. Third, the IP core generated from the C++ network description and parameters is used for synthesis and for generating the executable bit file that contains the neural network topology. The bit file generated by Vivado is loaded into the programmable logic (PL) as an overlay from the Jupyter notebook environment of PYNQ. Here, PYNQ provides a Python library to call an IP core generated by HLS and HDL. Further, the PYNQ-Z2 board contains pre-installed PYNQ hardware libraries that are used to measure the performance of gesture classification.

² https://pypi.org/project/Theano/

Figure 3. Design Flow

IV. EXPERIMENTAL SET-UP AND RESULTS

A. Experimental Set-Up

In this paper, the PYNQ-Z2 FPGA board is selected as the testing platform and is used to accelerate the software process with the PYNQ open-source framework, as shown in Figure 4. This board combines an FPGA and an ARM Cortex-A9 processor along with peripherals that can be used for real-time signal processing, parallel hardware execution, and video processing applications. As discussed in the previous section, the IP core is generated using Vivado HLS 2018.2 and then used in Vivado to create the bit file that configures the FPGA.

Figure 4. Experimental Set-Up
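To make the XNOR and pop count formulation described in the text above more concrete, the following is a minimal software sketch in plain Python. It is an illustration only and not the FINN hardware implementation: the function names (`binarize`, `pack_bits`, `xnor_popcount_dot`) and the example vectors are hypothetical, and the bit encoding (bit 1 for +1, bit 0 for -1) is an assumption made for this example. Under that encoding, the dot product of two n-element vectors over {-1, +1} equals 2 * popcount(XNOR) - n, which the sketch checks against an ordinary multiply-accumulate.

```python
def binarize(values):
    """Map real values to {-1, +1} by sign (zero is mapped to +1 here)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {-1, +1} vector into an integer: bit i is 1 for +1, 0 for -1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two packed n-element {-1, +1} vectors.

    XNOR marks the positions where both operands agree (product +1);
    the pop count gives the number of such positions.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    xnor = ~(a_bits ^ w_bits) & mask     # 1 where the bits are equal
    matches = bin(xnor).count("1")       # pop count of the XNOR result
    return 2 * matches - n               # matches minus mismatches


def reference_dot(a, w):
    """Ordinary multiply-accumulate, used only to check the result."""
    return sum(x * y for x, y in zip(a, w))


# Hypothetical example values (not taken from the trained network).
activations = binarize([0.7, -1.2, 0.1, -0.4, 2.0, -0.3, 0.0, -5.0])
weights = binarize([-0.5, -0.9, 0.3, 0.8, 1.1, -0.2, -0.6, 0.4])
n = len(activations)

result = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), n)
expected = reference_dot(activations, weights)
print(result, expected)
```

The sketch replaces n multiplications and a signed accumulation with one XOR, one inversion, and one bit count, which is the property that lets the hardware avoid DSP-based multipliers.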
Table II
RESOURCE UTILIZATION AND POWER CONSUMPTION SUMMARY

Device    Total Power   Resources   Usage    Available   Utilization
PYNQ-Z2   1.83 W        LUTs        32164    53200       60.46%
                        LUTRAM      3444     17400       19.7%
                        FF          44243    106400      41.5%
                        BRAM        82       140         58.5%
                        DSP         28       220         12.7%

Table III
PERFORMANCE COMPARISON TO PRIOR WORKS

                Hwang et al. [20]       Prieto et al. [17]   Proposed Work
Dataset         Sebastien Marcel [21]   Hand Alphabet        Proposed ISL gestures
FPGA Platform   Stratix 4 EPFSGX        XCKU060              PYNQ-Z2 (XC7Z020)
Approach        Floating Point          Fixed Point          1-bit Binarization
DSP Usage       928                     646                  28
FPS             -                       23.5                 843.8

B. Results and Analysis

The post-implementation results of the modified BNN topology for classifying static gestures are analyzed in terms of resource utilization and power consumption on the PYNQ-Z2 hardware, as shown in Table II. The percentage utilization reflects the LUT and BRAM resources required to store the weights, threshold parameters, and model. The overall power consumption of 1.83 W is also low, which suits a hardware solution for the intended application.

To evaluate the performance of the proposed work, a comparative analysis with prior works is carried out in Table III, based on the dataset used, FPGA platform, approach, DSP usage, and FPS. Using the 1-bit binarization approach, the model achieves a classification rate of 843.8 FPS, compared to the 23.5 FPS reported for [17], while using 28 DSPs against 928 in [20] and 646 in [17].

V. CONCLUSION AND FUTURE WORK

This paper demonstrates an efficient way of deploying a sign language application with high performance using a modified binarized network on the low-power PYNQ-Z2 FPGA platform. Using 1-bit binarization of weights and activations, the design provides a classification accuracy of 85% on static gestures and outperforms previous designs in throughput, achieving 843.8 frames per second (FPS). As a future direction, the authors plan to extend the application to real-time operation using deeper neural networks and different bit precisions.

ACKNOWLEDGEMENT

This research work is funded by the Department of Science and Technology (DST), India, under the project name Sign Language to Regional Language Converter (SLRLC) with project number SEED/TIDE/063/2016.

REFERENCES

[1] L. S. T. Mangamuri, L. Jain, and A. Sharma, “Two hand Indian sign language dataset for benchmarking classification models of machine learning,” in 2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), vol. 1. IEEE, 2019, pp. 1–5.
[2] M. J. Cheok, Z. Omar, and M. Jaward, “A review of hand gesture and sign language recognition techniques,” International Journal of Machine Learning and Cybernetics, vol. 10, pp. 131–153, 2019.
[3] M. Jaiswal, V. Sharma, A. Sharma, and R. Tomar, “Transfer learning with L2 norm regularization for classifying static two hand Hindi sign language gestures,” in 2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT). IEEE, 2020, pp. 44–48.
[4] A. S. Ghotkar and G. K. Kharate, “Study of vision based hand gesture recognition using Indian sign language,” Computer, vol. 55, p. 56, 2014.
[5] M. Jaiswal, V. Sharma, A. Sharma, S. Saini, and R. Tomar, “An efficient binarized neural network for recognizing two hands Indian sign language gestures in real-time environment,” in 2020 IEEE 17th India Council International Conference (INDICON), 2020, pp. 1–6.
[6] A. Sharma, N. Sharma, Y. Saxena, A. Singh, and D. Sadhya, “Benchmarking deep neural network approaches for Indian sign language recognition,” Neural Computing and Applications, vol. 33, pp. 6685–6696, 2021.
[7] A. A. Barbhuiya, R. K. Karsh, and R. Jain, “CNN based feature extraction and classification for sign language,” Multimedia Tools and Applications, vol. 80, pp. 3051–3069, 2021.
[8] W. Rawat and Z. Wang, “Deep convolutional neural networks for image classification: A comprehensive review,” Neural Computation, vol. 29, no. 9, pp. 2352–2449, Sep. 2017. [Online]. Available: https://doi.org/10.1162/neco_a_00990
[9] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, “Object detection with deep learning: A review,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212–3232, Nov. 2019. [Online]. Available: https://doi.org/10.1109/tnnls.2018.2876865
[10] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” ArXiv, vol. abs/1506.02626, 2015.
[11] M. Jaderberg, A. Vedaldi, and A. Zisserman, “Speeding up convolutional neural networks with low rank expansions,” ArXiv, vol. abs/1405.3866, 2014.
[12] T. Simons and D.-J. Lee, “A review of binarized neural networks,” Electronics, vol. 8, no. 6, p. 661, Jun. 2019. [Online]. Available: https://doi.org/10.3390/electronics8060661
[13] P. C. Badhe and V. Kulkarni, “Artificial neural network based Indian sign language recognition using hand crafted features,” in 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2020, pp. 1–6.
[14] D. Naglot and M. Kulkarni, “ANN based Indian sign language numerals recognition using the leap motion controller,” in 2016 International Conference on Inventive Computation Technologies (ICICT), vol. 2, 2016, pp. 1–6.
[15] E. Abraham, A. Nayak, and A. Iqbal, “Real-time translation of Indian sign language using LSTM,” in 2019 Global Conference for Advancement in Technology (GCAT), 2019, pp. 1–5.
[16] K. Suri and R. Gupta, “Convolutional neural network array for sign language recognition using wearable IMUs,” in 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), 2019, pp. 483–488.
[17] R. Núñez Prieto, P. C. Gómez, and L. Liu, “A real-time gesture recognition system with FPGA accelerated Zynqnet classification,” in 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), 2019, pp. 1–6.
[18] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, “Finn,” Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2017. [Online]. Available: http://dx.doi.org/10.1145/3020078.3021744
[19] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks,” ArXiv, vol. abs/1602.02505, 2016.
[20] W.-J. Hwang, Y.-J. Jhang, and T.-M. Tai, “An efficient FPGA-based architecture for convolutional neural networks,” in 2017 40th International Conference on Telecommunications and Signal Processing (TSP), 2017, pp. 582–588.
[21] S. Marcel, “Hand posture recognition in a body-face centered space,” CHI ’99 Extended Abstracts on Human Factors in Computing Systems, 1999.