SSICT-2023 Paper 5
Language
Nhu-Vinh Hoang1, 2, 3 , Nghia-Viet Hoang1, 2, 3 , Nhu-Binh Nguyen Truc1, 2, 3 , Kim-Phat Tran1, 2, 3
1 Faculty of Information Technology, University of Science, VNU-HCM
2 Vietnam National University, Ho Chi Minh City, Vietnam
3 {hnvinh21, hnviet21, ntnbinh21, tkphat21}@apcs.fitus.edu.vn
Abstract—Automating sign language interpretation is being seriously considered as a way to improve the conversation quality of people with hearing impairments. The more widely sign language is available, the more accessible the community becomes to deaf people. In this paper, the authors attempt to develop a Sign Language Production model that translates discrete text sentences into continuous 3D skeletal sign pose sequences, using a back translation evaluation method that transforms video back to text for comparison. The model is trained on the PHOENIX14T dataset, which includes parallel sign videos and German translation sequences: 8057 videos of 9 different signers, 2887 German words and 1066 different sign glosses. Although the output skeleton videos are still of low quality, the results suggest considerable potential for future improvement, as the authors obtain a validation score of 17.252 DTW. Overall, the results imply a promising foundation for continued investigation and optimization of the approach, with the aim of improving conversation quality between the Deaf community and their hearing counterparts.

I. INTRODUCTION

Equity is a critical component of modern society, shaping how individuals and communities are treated and given equal opportunity to succeed. The concept of equity has been central to political and social discourse for decades, with many countries and organizations embracing it as a core principle. To ensure that no one with a disability is excluded, effective communication methods are urgently being developed and implemented for the over 1.5 billion people worldwide facing hearing difficulties [4]. Sign language interpretation is highlighted here because this means of communication is vital to the deaf and hard of hearing community.

However, becoming a sign language interpreter is a challenging and rigorous process that requires fluency in multiple languages, specialized training, and a deep understanding of cultural differences. This raises the question of whether an automated sign language interpreter could be developed to address these challenges and provide more accessible communication for the deaf and hard of hearing community.

Improving communication quality for this community is crucial to promoting inclusion and equality. One potential solution is to develop an automated sign language interpreter that could provide accessible communication to individuals with hearing disabilities. Additionally, providing sign language interpreters in a variety of settings, particularly news broadcasts and conferences, can significantly improve accessibility and inclusion for the deaf and hard of hearing community. Considering the business aspect, the model could also be developed into a bot that teaches sign language, which would mark a big step forward in disability-inclusive efforts.

In this paper, we try to develop a Sign Language Production model that translates from text to skeleton video. Our approach includes two different parts: one is text to sign pose, and the other is text to sign pose via gloss. In addition, we attempt to evaluate the performance with a back translation evaluation mechanism using Sign Language Translation.

We evaluate on the RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset, which includes parallel sign videos and German translation sequences, with 2887 German words and 1066 different sign glosses from a combined 835,356 frames at 25 fps and 210 x 260 resolution; for each video, the OpenPose model is used to extract 2D joint positions, which are then lifted to 3D. The Sign Language Production (SLP) model initially uses an encoder-decoder architecture to convert input text sequences to glosses. We then re-train the Progressive Transformers SLP model, which transforms the text sequences and gloss representations into skeletal sign pose sequences. To evaluate the performance, we use a back translation evaluation method with a state-of-the-art Sign Language Translation model, which translates the skeleton videos back into text sequences; the BLEU-n score metric is then used to evaluate the model by comparison with the input text.

Our experiment was run three different times. The first time, because the dataset had only 15 sets of text, gloss and skeleton, our model gave bad results. The second time, the dataset was larger, with 8057 videos; however, the model did not show any improvement. We found some problems through these two runs and fixed some parameters, after which the result improved significantly.

This paper includes five sections. In Section II, we provide a brief review of current methodologies and methods used to transform text into hand gestures as videos, as well as the model which the authors use to provide ground truth. In Section III, we present our method for transforming text into sign language video. The experimental setup, results, and comparisons are presented in Section IV. Lastly, Section V provides conclusions and suggestions for future work to improve the performance of the proposed technique.
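Before detailing each component, the overall loop can be summarised as follows. The sketch below is a high-level outline only: the four callables are hypothetical placeholders for the text-to-gloss model, the Progressive Transformer, the SLT model and the BLEU metric, not their actual interfaces.

    def evaluate_slp(text_sentences, text_to_gloss, slp_model, slt_model, bleu):
        # Back-translation evaluation: generate poses from text, translate
        # them back to text with an SLT model, and score against the input.
        hypotheses, references = [], []
        for text in text_sentences:
            gloss = text_to_gloss(text)     # T2G2P path; the T2P path skips this
            poses = slp_model(text, gloss)  # continuous 3D skeleton sequence
            hypotheses.append(slt_model(poses))  # back translation to text
            references.append(text)         # compare against the original input
        return bleu(references, hypotheses)      # corpus-level BLEU-n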
II. RELATED WORK

In this section, we present an existing method, Sign Language Production, which builds 3D skeletal sign pose sequences from text sequences, and another model which supports recognizing German sign language (weather broadcast vocabulary) and translating it back to text in order to evaluate model performance.

1) Progressive Transformers for End-to-End Sign Language Production (Ben Saunders, Necati Cihan Camgoz, Richard Bowden - ECCV 2020) [3]

• Sign Language Production is still a challenging problem, including the challenge of mapping from the lingual domain into the visual domain [2]. A previous approach by Stoll et al. focused on creating sign pose sequences from text via glosses []. By contrast, this work aims to produce sign pose sequences directly, without glosses as an intermediate step. Moreover, the number of frames depends dynamically on the length of the input in order to produce a correct result.

• The Symbolic Transformer is still used for data processing, converting from text to gloss representation. We use it for the Text to Gloss to Pose (T2G2P) model, which will be compared with our end-to-end Text to Pose (T2P) model.

2) Sign Language Translation

Sign Language Translation (SLT) is the difficult and complex task of converting sign language into text. Due to the differences in grammar between sign language and spoken language, as well as the large number of meaningless words in a sentence, this task can be challenging. One of the hardest parts of SLT is directly converting sign video sequences into spoken language sentences. The Sign Language Transformers for joint end-to-end sign language recognition and translation (Necati Cihan Camgoz et al. - CVPR'20) [1] are currently the state-of-the-art models in SLT. In this paper, these models are used in the back translation mechanism to evaluate the performance of our Sign Language Production model.
3) Data Processing

A. OpenPose

OpenPose is a popular, widely used open-source library that applies deep learning techniques to detect human body joints in images and videos. Many scientists use OpenPose as a foundation for their work in computer vision and human pose estimation. The most famous paper is that of Zhe Cao et al. [], who propose a method for real-time multi-person 2D pose estimation using OpenPose that achieves state-of-the-art results on several benchmark datasets. OpenPose is used not only in the field of human pose estimation but also in the sport sciences. In the paper "Automated assessment of lower-limb kinematics in running using OpenPose" by Emily Hansen et al., the authors use OpenPose to automatically assess the lower-limb kinematics of runners by tracking their joint angles and positions during a treadmill run.

Besides serving as a foundation for much research work, OpenPose also shows great potential as a tool for data processing. In the paper "Head Pose Estimation for Longitudinal Behavioral Analysis using OpenPose" by Kevin Huang et al., the authors use OpenPose to track the head positions of children with autism spectrum disorder during interactions; the data is gathered and analyzed to identify patterns over time. In the research work "Analyzing the Passing Strategies of Professional Soccer Teams using OpenPose" by Ali K. Thabet et al., the authors use OpenPose to extract the body poses of soccer players during matches; the data is then analyzed to identify passing strategies and patterns among the players.

B. Symbolic Transformer

The Symbolic Transformer is a relatively new data processing tool that has gained attention in the machine learning and natural language processing fields. Among other uses, it transforms text into glosses, a process that maps natural language descriptions to a structured vocabulary of concepts and labels.

In the paper "Generating Textual Glosses for Equations with Symbolic Transformer" by Nafiseh Shabib et al., the authors suggest a method that leverages the Symbolic Transformer's textual gloss generation for mathematical equations; the results show that the Symbolic Transformer outperforms other existing methods. In the paper "Symbolic Transformer Networks for Knowledge Base Completion" by Jianyuan Shi et al., the Symbolic Transformer is used to represent the concepts and relations in a knowledge base and achieves state-of-the-art results on several benchmark datasets.
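Because the text-to-gloss step is central to the T2G2P pipeline, a minimal encoder-decoder sketch is given below. It is a generic stand-in written in PyTorch, not the actual Symbolic Transformer implementation; the vocabulary sizes (taken from the PHOENIX14T statistics above) and all dimensions are illustrative assumptions.

    import torch
    import torch.nn as nn

    class TextToGlossSketch(nn.Module):
        # Hypothetical stand-in for the text-to-gloss model: a standard
        # encoder-decoder transformer over token ids.
        def __init__(self, text_vocab=2887, gloss_vocab=1066, d_model=256):
            super().__init__()
            self.src_emb = nn.Embedding(text_vocab, d_model)
            self.tgt_emb = nn.Embedding(gloss_vocab, d_model)
            self.transformer = nn.Transformer(
                d_model=d_model, nhead=8,
                num_encoder_layers=2, num_decoder_layers=2,
                batch_first=True)
            self.out = nn.Linear(d_model, gloss_vocab)

        def forward(self, src_ids, tgt_ids):
            # Causal mask: each gloss position attends only to earlier ones.
            mask = self.transformer.generate_square_subsequent_mask(
                tgt_ids.size(1)).to(src_ids.device)
            hidden = self.transformer(self.src_emb(src_ids),
                                      self.tgt_emb(tgt_ids), tgt_mask=mask)
            return self.out(hidden)  # logits over the gloss vocabulary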
III. METHOD

A. Method overview:

Our method contains two main parts:

a) Data preparation: The model is trained on the PHOENIX14T dataset, which contains 8057 sign language videos.

Fig. 1. A frame from a video of the dataset
We use the OpenPose model to produce skeleton sequences for each sign language video: 2D joint positions are extracted and then lifted to 3D, whilst maintaining consistent bone lengths and correcting misplaced joints. We then apply skeleton normalisation and represent the 3D joints as x, y and z coordinates for the Sign Language Production model. The texts are also pre-processed into glosses using the Symbolic Transformer. In total, our dataset contains 8057 sets of skeleton video, text and glosses: 7096 sets are used for training, 519 for development and 642 for testing.
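To make the pre-processing concrete, the sketch below parses OpenPose's per-frame JSON output and applies a simple skeleton normalisation. The JSON layout follows OpenPose's documented output format, but the normalisation shown (centring on the neck and scaling by shoulder width) is an illustrative assumption, not necessarily the exact procedure used here.

    import json
    import numpy as np

    def load_openpose_frame(path):
        # OpenPose writes one JSON file per frame; keypoints are stored as a
        # flat [x0, y0, c0, x1, y1, c1, ...] list for each detected person.
        with open(path) as f:
            data = json.load(f)
        if not data["people"]:
            return None, None  # no signer detected in this frame
        kp = np.array(data["people"][0]["pose_keypoints_2d"]).reshape(-1, 3)
        return kp[:, :2], kp[:, 2]  # (x, y) coordinates and confidences

    def normalise_skeleton(joints):
        # Illustrative normalisation: centre on the neck joint and scale by
        # shoulder width so that signers of different sizes are comparable.
        # BODY_25 indices: 1 = neck, 2 = right shoulder, 5 = left shoulder.
        neck, r_sh, l_sh = joints[1], joints[2], joints[5]
        scale = np.linalg.norm(r_sh - l_sh) + 1e-8
        return (joints - neck) / scale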
Fig. 4. Progressive Transformers model
B. Progressive Transformer:

In this work, Progressive Transformers (Fig. 4) translate from the symbolic domains of gloss or text to continuous sign pose sequences that represent the motion of a signer producing a sentence of sign language. The model must produce skeleton pose outputs that both express an accurate translation of the given input sequence and form a realistic sign pose sequence. In detail, the text input can be described as X = (x_1, x_2, ..., x_T), where T is the number of words, and the output of the model is a sign pose sequence with U frames, Y = (y_1, y_2, ..., y_U). To ensure the continuity and smoothness of the output, the model additionally predicts a counter value alongside each frame, tracking progress through the sequence and determining when production should end [3].
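A minimal sketch of this progressive, counter-driven decoding loop is shown below. The model.step interface and all dimensions are assumptions made for illustration, following the counter scheme of [3], not the actual API of the released implementation.

    import torch

    def progressive_decode(model, encoded_text, num_joints=50, max_frames=300):
        # The decoder regresses continuous joint coordinates frame by frame,
        # starting from an all-zero frame, rather than sampling tokens from
        # a discrete vocabulary.
        frame = torch.zeros(1, num_joints * 3)  # x, y, z per joint
        poses = []
        for _ in range(max_frames):
            # Assumed interface: one decoding step returns the next skeleton
            # frame and a counter in [0, 1] tracking sequence progress.
            frame, counter = model.step(encoded_text, frame)
            poses.append(frame)
            if counter.item() >= 1.0:  # counter saturates at sequence end
                break
        return torch.cat(poses, dim=0)  # (U, num_joints * 3) pose sequence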
Fig. 5. Symbolic Transformers model

D. Evaluate performance method:

This method uses a back translation evaluation mechanism to evaluate performance. The 3D sign pose sequence (skeleton) output is transformed into spoken language (text) by a Sign Language Translation model (the authors use a state-of-the-art model for the best comparison). To measure the translation performance of this method, we utilize the BLEU score (with n-grams ranging from 1 to 4), which is the most common metric for machine translation.
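As an illustration, BLEU-1 through BLEU-4 over the back-translated sentences can be computed with NLTK as sketched below; the tokenised sentences passed in would be the original input texts and the SLT outputs.

    from nltk.translate.bleu_score import corpus_bleu

    def bleu_n_scores(references, hypotheses):
        # references: tokenised input text sentences (one per sample);
        # hypotheses: tokenised sentences produced by back-translating the
        # generated skeleton videos with the SLT model.
        refs = [[r] for r in references]  # NLTK allows several refs per sample
        scores = {}
        for n in range(1, 5):
            # Uniform weights over the first n n-gram orders give BLEU-n.
            weights = tuple(1.0 / n for _ in range(n)) + (0.0,) * (4 - n)
            scores["BLEU-%d" % n] = corpus_bleu(refs, hypotheses, weights=weights)
        return scores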
V. CONCLUSION

Conversion of text to sign language plays an important role in enhancing communication between the Deaf and hearing. Our experiments are evaluated on the PHOENIX14T dataset. The results still leave room for improvement, partly owing to our lack of experience in training models. In both tries, our validation and training batch losses were still decreasing; in the second try, training stopped at epoch 500, and if we increased this to 20000, the validation and training batch losses could decrease further.
Challenges in the early days included the rarity of "proper" references; unknown code bugs, unfamiliar definitions and the advanced knowledge required; and being new to the notebook-based Google Colab environment. Having no prior knowledge of machine learning or deep learning led us to invest a huge amount of time in the literature review. We learned to adjust I/O, trace back through the important code blocks, and import data and train the model. Finding the important portions of code was a problem until we made better use of search engines, and we adjusted the console logs to get proper output for further research.

The coming goals are to learn more about data processing, optimize the code, consider alternative sign language interpreting methods, reduce training time and use deepfake techniques to visualize the 3D output. Besides that, we plan to add a "speech to text" phase to complete the initial objective of speech to sign language, create a case study on self-learning sign language, and develop an application that supports the Deaf in communication by applying our method.
ACKNOWLEDGMENT
The authors would like to thank Mr Minh-Triet Tran.
REFERENCES
[1] Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, and Richard
Bowden. Sign language transformers: Joint end-to-end sign language
recognition and translation. 2020.
[2] Razieh Rastgoo, Kourosh Kiani, Sergio Escalera, Vassilis Athitsos, and Mohammad Sabokrou. All you need in sign language production. 2022.
[3] Ben Saunders, Necati Cihan Camgoz, and Richard Bowden. Progressive
Transformers for End-to-End Sign Language Production. 2020.
[4] WHO. Deafness and hearing loss.