Sign Language Character Recognition Research Paper
Figure 3. Network Architecture
Figure 5. Training Accuracy on ASL Digits
Figure 6. Training Loss on ASL Alphabet
Figure 8. Validation Accuracy on ASL Letters
Figure 9. Validation Accuracy on ASL Digits

4.2 Data Augmentation
We saw performance improve differently across our two datasets with data augmentation. Applying small transformations to our images (rotating by 20 degrees, translating by 20% on both axes) increased accuracy by approximately 0.05. We also flipped the images horizontally, since signs can be produced with either hand. While augmentation alone wasn't extremely effective, we saw that with better and more representative initial training data it improved performance much more drastically: augmenting the premade dataset improved performance by nearly 20%.
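A minimal sketch of these transformations, assuming a Keras-style pipeline (the framework choice, the treatment of the stated values as random ranges, and the array shapes are illustrative assumptions, not our exact implementation):

```python
# Augmentation sketch using Keras' ImageDataGenerator. The rotation,
# shift, and flip settings mirror the values described above; the
# generator itself and the data shapes are assumptions.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,        # rotate by up to 20 degrees
    width_shift_range=0.2,    # translate up to 20% horizontally
    height_shift_range=0.2,   # translate up to 20% vertically
    horizontal_flip=True,     # signs can be produced with either hand
)

# Hypothetical data: (N, H, W, C) images and one-hot letter labels.
x_train = np.random.rand(32, 64, 64, 3)
y_train = np.eye(26)[np.random.randint(0, 26, 32)]

# The generator yields endlessly; each batch is a randomly
# transformed copy of the input images.
batches = augmenter.flow(x_train, y_train, batch_size=16)
images, labels = next(batches)
```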
5. Results
We observed 82.5% accuracy on the alphabet gestures and 97% validation-set accuracy on the digits when using the NZ ASL dataset. On our self-generated dataset, we observed much lower accuracy, as expected, since our data was less uniform than data collected under studio settings with better equipment: 67% accuracy on letters of the alphabet and 70% accuracy on the digits. In terms of training time, the letter model converged in approximately 25 minutes and the digit model in nearly 10 minutes.

Figure 10. Validation Loss on ASL Letters
Figure 11. Validation Loss on ASL Digits

5.1 Evaluation
We trained with a categorical cross-entropy loss function for both of our datasets; it is a loss function commonly used for image classification problems.
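For K classes with one-hot label y and predicted probabilities p, categorical cross-entropy is L = -sum_k y_k log p_k, averaged over samples. A minimal NumPy sketch (the example values below are invented for illustration):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch.

    y_true: (N, K) one-hot labels; y_pred: (N, K) class probabilities.
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Tiny illustration with 3 classes (values are made up):
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(y_true, y_pred))  # (-ln 0.7 - ln 0.8) / 2 ~ 0.29
```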
Initially, we observed low accuracy when testing on the validation set of the self-generated data, which we attributed largely to the lighting and skin-tone variations in the images. The higher accuracy on the digits was expected, since the gestures for the digits are much more distinguishable and easier to classify. Compared to previous methods on this same task, our network performed quite well, considering that RF-JA used both a color glove and a depth-sensing Kinect camera. Our higher accuracy relative to Stanford's method was likely due to their lack of background subtraction for the images, since they used a large dataset from ILSVRC2012 as part of a competition.
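Our exact preprocessing pipeline is not reproduced here; as one common way to perform the background subtraction mentioned above, the following OpenCV sketch uses a MOG2 background model (the file name and frame source are hypothetical, and this is a sketch of the general technique rather than our implementation):

```python
# Background-subtraction sketch with OpenCV's MOG2 model.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=100, varThreshold=25)

cap = cv2.VideoCapture("gestures.mp4")  # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                    # foreground mask
    hand = cv2.bitwise_and(frame, frame, mask=mask)   # keep foreground pixels
    # `hand` can then be cropped/resized and fed to the classifier.
cap.release()
```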
Method                   Accuracy (%)
deepCNN (our method)     82.5
Stanford deepCNN [7]     72
RF-JA+C (h-h) [8]        90
RF-JA+C (l-o-o) [8]      70

Figure 12. Comparison of previous methods with ours; Stanford didn't use background subtraction, RF-JA (h-h) split the training and validation sets 50-50, and (l-o-o) omitted specific data.
6. Conclusions and Future Work
In this paper, we described a deep learning approach to a classification algorithm for American Sign Language. Our results and process were severely hindered by skin-color and lighting variations in our self-generated data, which led us to resort to a premade, professionally constructed dataset. With a camera like Microsoft's Kinect, which has a depth sensor, this problem is easy to solve [5]. However, such cameras and technology are not widely accessible and can be costly. Our method shows potential for solving this problem with a simple camera, provided enough substantial training data, which can be continuously collected and added via the aforementioned processing pipeline. Since more people have access to simple camera technologies, this could contribute to a scalable solution.

Recognizing that classification is a limited goal, we plan on incorporating structured PGMs in future implementations of this classification schema that would describe the probability distributions of the different letters' occurrences based on their sequential contexts. We think that by accounting for how individual letters interact with each other directly (e.g. the likelihood that the vowel 'O' follows the letter 'J'), the accuracy of the classification would increase. This HMM approach with sequential pattern boosting (SP-boosting) has been applied to the actual gesture units that occur in certain gestures' contexts, i.e. capturing the upper-arm movements that precede a certain letter to incorporate that probability weight into the next unit's class [6], and to processing sequential phonological information in tandem with gesture recognition [4], but not to part-of-word tagging with an application like the one we hope to achieve. A minimal sketch of this letter-context idea follows below.
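The sketch combines per-step classifier probabilities with a bigram transition table via Viterbi decoding. All probabilities and the toy three-letter alphabet are invented for illustration; the structured PGM we envision would be estimated from fingerspelling data and richer than this.

```python
# Sketch: fuse per-step letter posteriors with bigram transitions.
import numpy as np

letters = ["J", "O", "Y"]                      # toy alphabet (hypothetical)
# emissions[t, k]: classifier probability of letter k at step t (made up)
emissions = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.5, 0.3],
                      [0.1, 0.3, 0.6]])
# trans[i, j]: probability that letter j follows letter i (made up)
trans = np.array([[0.1, 0.7, 0.2],
                  [0.2, 0.1, 0.7],
                  [0.4, 0.3, 0.3]])

def viterbi(emissions, trans):
    """Most likely letter sequence under emission and transition scores."""
    T, K = emissions.shape
    score = np.log(emissions[0])               # uniform prior over first letter
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in letter j after coming from i
        cand = score[:, None] + np.log(trans) + np.log(emissions[t])[None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print([letters[k] for k in viterbi(emissions, trans)])  # ['J', 'O', 'Y']
```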
We also recognize that the representation itself makes a huge difference in the performance of algorithms like ours, so we hope to find the best representation of our data and, building off the results from this research, incorporate it into a zero-shot learning process. We see zero-shot learning as having the potential to facilitate the translation process from American Sign Language into English. Implementing one-shot learning for translating the alphabet and numbers from American Sign Language into written English, and comparing it with a pure deep learning heuristic, could be successful and could benefit from error correction via language models. Recent implementations of one-shot adaptation have also had success in solving real-world computer vision tasks, effectively training deep convolutional neural networks with very little domain-specific data, even as limited as single-image datasets. We ultimately aim to create a holistic and comprehensive representation learning system, for which we have designed a set of features recognizable from simple gesture images, that will optimize the translation process.
7. References
[1] X. Chen and A. Yuille. Articulated pose estimation by a graphical model with image dependent pairwise relations. In Advances in Neural Information Processing Systems (NIPS), 2014.
[2] T. Pfister, J. Charles, and A. Zisserman. Flowing ConvNets for human pose estimation in videos. In IEEE International Conference on Computer Vision, 2015.
[3] A. L. C. Barczak, N. H. Reyes, M. Abastillas, A. Piccio, and T. Susnjak. A new 2D static hand gesture colour image dataset for ASL gestures. Research Letters in the Information and Mathematical Sciences, 15:12-20, 2011.
[4] T. Kim, K. Livescu, and G. Shakhnarovich. American Sign Language fingerspelling recognition with phonological feature-based tandem models. In IEEE Spoken Language Technology Workshop (SLT), 119-124, 2012.
[5] A. Agarwal and M. Thakur. Sign language recognition using Microsoft Kinect. In IEEE International Conference on Contemporary Computing, 2013.
[6] H. Cooper, E. J. Ong, N. Pugeault, and R. Bowden. Sign language recognition using sub-units. The Journal of Machine Learning Research, 13(1):2205-2231, 2012.
[7] B. Garcia and S. Viesca. Real-time American Sign Language recognition with convolutional neural networks. In Convolutional Neural Networks for Visual Recognition at Stanford University, 2016.
[8] C. Dong, M. C. Leu, and Z. Yin. American Sign Language alphabet recognition using Microsoft Kinect. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015.