Bilingual Neural Machine Translation From English To Yoruba Using A Transformer Model
Abstract:- The necessity for language translation in Nigeria arises from its linguistic diversity, facilitating effective communication and understanding across communities. Yoruba, considered a language with limited resources, has potential for greater online presence. This research proposes a neural machine translation model using a transformer architecture to convert English text into Yoruba text. While previous studies have addressed this area, challenges such as vanishing gradients, translation accuracy, and computational efficiency for longer sequences persist. This research proposes to address these limitations by employing a transformer-based model, which has demonstrated efficacy in overcoming issues associated with Recurrent Neural Networks (RNNs). Unlike RNNs, transformers utilize attention mechanisms to establish comprehensive connections between input and output, improving translation quality and computational efficiency.

Keywords:- NLP, Text to Text, Neural Machine Translation, Encoder, Decoder, BERT, T5.

I. INTRODUCTION

The existence of man on the surface of the earth is identified with the existence of communication. In the beginning, God created man in His image, and since then there has been communication between man and God and between men (Genesis 1:27-29). This shows that communication is the basis of human existence. Communication is the means of sending and receiving information, ideas, thoughts, and feelings between individuals or groups. It involves the transmission and reception of messages through various channels such as speech, writing, gestures, body language, and even technological means like emails or social media. This is why the desire to communicate is essential in man; it is exhibited even by babies, who in their own way express their needs and feelings by crying, laughing, or groaning. Even animals communicate with one another, while machines, at their lowest level, communicate using discrete values of 1s and 0s called bits, i.e., binary digits.

Communication can be categorized into two: verbal/oral and non-verbal. Non-verbal communication, which predates verbal communication, does not rely on words but instead employs gestures, cues, vocal tones, spatial relationships, and other non-linguistic elements to convey messages. It is often used to express emotions such as respect, love, dislike, or discomfort and tends to be spontaneous and less structured compared to verbal communication.

Verbal/oral communication involves the use of spoken words or language to convey information. It requires organizing words in a structured and grammatically correct manner to effectively communicate ideas to the audience. Verbal communication offers several advantages over non-verbal communication, including efficiency in time and resource utilization, support for teamwork, prompt feedback, flexibility, and enhanced transparency and understanding between communicators.

Nigeria is a culturally diverse society with a multitude of languages [22]. Estimates suggest there are between 450 and 500 languages spoken throughout the country. While English serves as the official language, Nigeria has also designated three regional languages: Hausa, spoken by approximately 20 million people in the North; Yoruba, spoken by about 19 million in the West; and Igbo, spoken by around 17 million in parts of the South-East. These three languages are considered major languages.

Additionally, Nigeria recognizes about 500 other languages, some of which hold varying degrees of official status within their regions and serve as lingua francas, such as Efik, Ibibio, and Fulani. Among these languages, Nigerian Pidgin serves as a national lingua franca.

In the Nigerian context, "majority" languages refer specifically to Hausa, Yoruba, and Igbo, while "minority" languages encompass those spoken by smaller ethnic groups. These distinctions are formally acknowledged and hold significant roles in various sectors, particularly in education within the country.
The English language has continued to gain such prominence in the country that its dominance has stifled the growth (and even led to the extinction of some) of the 529 indigenous languages in Nigeria [18]. English holds a dominant position across all sectors in Nigeria, including government, education, media, the judiciary, and science and technology. Government officials often refrain from using indigenous languages even in informal settings to avoid perceptions of tribalism. At both the national and state levels, English remains the primary language for debates and official records, despite efforts to promote indigenous languages. Due to Nigeria's linguistic diversity, English plays a central role in national unity and administrative functions. Legal proceedings, particularly in the Supreme Court and High Courts, are primarily conducted in English, with lawyers presenting arguments and judges delivering judgments in the same language.

However, the prevalence of the English language in Nigeria, particularly in the southwestern region, has resulted in the diminishing use of our esteemed Yoruba language, notably among its speakers. This shift has profound cultural and linguistic implications for national identity, leaving many children unable to communicate in their native tongue. Language and culture are inseparable, and the decline in the use of Yoruba among younger generations has contributed to the erosion of our cultural heritage. Consequently, younger people are increasingly losing touch with the core values and traditions of their cultures. This shift is also evident in their fashion choices, which now mirror those of the predominant language speakers.

Research conducted by [19] showed that using the mother tongue in primary education significantly benefits learning outcomes. In a six-year primary education project, pupils taught in Yoruba performed better than those taught solely in English. It was noted that "the child learns better in his mother tongue, which is as natural to him as his mother's milk." However, if English continues to serve as Nigeria's national and educational language, as well as its lingua franca for unity in a diverse and multilingual country, its pervasive use in media, legislation, and throughout society could pose a threat to indigenous languages. This trend suggests that indigenous languages are not merely at risk but are gradually declining [20].

To break this language barrier, language translators arose. Language translators are essential tools that enable communication between individuals or groups who speak different languages. Initially, human translators performed this work. However, despite their invaluable role in facilitating communication across languages, human translators face several challenges, including accuracy, precision, and ambiguity. Neural machine translation (NMT) has revolutionized the field of translation due to several key advantages it offers over traditional approaches.

Neural machine translation encompasses various forms: text to text, text to speech, speech to speech, and speech to text. Natural languages are categorized into high-resource languages (HRL) and low-resource languages (LRL) based on the availability of textual and speech data on the web [17]. High-resource languages, such as English, benefit from abundant data resources that facilitate machine learning and understanding. Similarly, many Western European languages like French and German, as well as Chinese, Japanese, and Russian, fall into this category. In contrast, low-resource languages lack sufficient data resources, making them less studied, resource-scarce, and less commonly taught. These languages are often characterized by limited computational tools and sparse textual and speech data online [21].

The Yoruba language falls under the category of low-resource languages (LRL), facing several significant challenges. One of the primary issues is the difficulty of employing alignment techniques across different levels (word, sentence, and document) for annotation, creating a cohesive dataset, and collecting raw text. These tasks are essential for any natural language processing (NLP) endeavor or mapping technique [17].

Furthermore, the lack of adequate textual material poses a critical challenge, as lexicons form the backbone of many NLP tasks. Developing an effective lexicon for LRLs like Yoruba remains particularly challenging due to limited textual resources. Additionally, the morphology of LRLs such as Yoruba is dynamic and continuously evolving, leading to a diverse and expansive vocabulary. This variability complicates the development of a robust framework for morphological pattern recognition, given the presence of multiple roots and complex word formations.

This research proposes a text-to-text neural machine translator from the English language to the Yoruba language, using a transformer model, to address the challenges of Yoruba as a low-resource language, the threatened extinction of the Yoruba language, the inability of Yoruba children to speak the language, the vanishing of Yoruba culture, and the aforementioned challenges. The Transformer model is a type of neural network [15] initially recognized for its exceptional performance in machine translation. Today, it has become the standard for constructing extensive self-supervised learning systems. In recent years, Transformers have surged in prominence not only within natural language processing (NLP) but also across diverse domains like computer vision and multi-modal processing. As Transformers advance, they are progressively assuming a pivotal role in advancing both the research and practical applications of artificial intelligence (AI) [16].
II. LITERATURE REVIEW

Many researchers have worked in the area of Neural Machine Translation, especially translation from the English language to the Yoruba language. Among them are the following:

In their work titled "Web-Based English to Yoruba Machine Translation" [24], the researchers aimed to achieve English-to-Yoruba text translation using a rule-based method. Their approach primarily utilized a dictionary-based rule-based system, considered the most practical among various types of machine translation methods. However, a limitation of this approach was its inability to accurately translate English words with multiple meanings, influenced by their grammatical context.

In their study titled "Development of an English to Yorùbá Machine Translator" [1], the researchers aimed to create a machine translator that converts English text into Yorùbá, thereby making the Yorùbá language more accessible to a wider audience. The translation process employed phrase structure grammar and re-write rules. These rules were developed and evaluated using Natural Language Tool Kits (NLTKs), with analysis conducted using parse tree and Automata theory-based techniques. However, the study noted that accuracy decreases as the length of sentences increases.

In [2], titled "Machine to Man Communication in Yoruba Language," the research aims to establish communication between humans and machines using a Text-To-Speech system for the Yoruba language. The study involves text analysis, natural language processing, and digital signal processing techniques. An observed outcome of the research is that pronunciation accuracy decreases as sentence length increases.

In [3], titled "Development of a Yorùbá Text-to-Speech System Using Festival," the objective of the research was to develop speech synthesis for the Yoruba language. Speech data were recorded in a quiet environment with a noise-cancelling microphone on a typical multimedia computer system using the Speech Filing System (SFS) software, then analysed and annotated using the PRAAT speech processing software. However, the speech produced by the system is of low quality and its measured naturalness is low.

In [4], titled "Token Validation in Automatic Corpus Gathering for Yoruba Language," the research aims to create a language identifier that identifies potentially valid Yoruba words. The approach involves using a dictionary lookup method to collect words, implemented as a Finite State Machine in Java. However, the system's capability is currently restricted to gathering monolingual Yorùbá language data.

…translating each phrase into the target language, and optionally reordering them based on target language models and distortion probabilities. However, the research faces challenges such as insufficient data, orthographic errors, and high error rates during system compilation.

In [6], titled "Japanese-to-English Machine Translation Using Recurrent Neural Networks," the research objective was to translate the Japanese language into the English language. A bidirectional recurrent neural network (BiRNN) was used as the encoder, combining the forward hidden state of an RNN that reads the source sentence in its original order with the backward hidden state of an RNN that reads the source sentence in reverse. However, the system did not handle complex sentence translations.

In [7], titled "Arabic Machine Translation Using Bidirectional LSTM Encoder-Decoder," the research aimed to develop a model primarily based on Bidirectional Long Short-Term Memory (BiLSTM). The objective was to map input sequences to vectors using a BiLSTM as an encoder, and subsequently use another Long Short-Term Memory (LSTM) for decoding the target sequences from these vectors. The input sentences were sequentially embedded into the BiLSTM encoder word by word until the end of the English sentence sequence. The hidden and memory states obtained were fed into the LSTM decoder to initialize its state, and the decoder output was processed through a softmax layer and compared with the target data. However, the system faced limitations due to being trained on a limited number of words in the corpus.

In [8], titled "Using Bidirectional Encoder Representations from Transformers for Conversational Machine Comprehension," the goal was to apply the BERT language model to the task of Conversational Machine Comprehension (CMC). The research introduced a novel architecture called BERT-CMC, which builds upon the BERT model. This architecture incorporates a new module for encoding conversational history, inspired by the Transformer-XL model. However, experimental findings indicated that Transformer-based architectures struggle to fully capture the contextual nuances of conversations.

In [9], titled "Development of a Recurrent Neural Network Model for English to Yorùbá Machine Translation," the aim was to create a recurrent neural network (RNN) model specifically for translating English to Yorùbá. The model underwent testing and evaluation using both human and automatic assessment methods, demonstrating adequate fluency and providing translations of decent to good quality. However, the research had limitations: it could only handle single tasks and a limited vocabulary size, and it lacked the capability to expand its vocabulary extensively.
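The bidirectional encoding scheme summarized for [6] above, in which a forward RNN reads the source sentence in order and a backward RNN reads it in reverse, can be sketched in a few lines. This is a minimal NumPy illustration with random weights and toy dimensions, not the cited system itself; all names and sizes here are illustrative assumptions.

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, b):
    # Vanilla RNN cell: h_t = tanh(x_t Wx + h_{t-1} Wh + b)
    return np.tanh(x @ Wx + h @ Wh + b)

def birnn_encode(X, params):
    # X: (T, d_in) source sequence; returns (T, 2*d_h) annotations,
    # each the concatenation of a forward and a backward hidden state.
    Wxf, Whf, bf, Wxb, Whb, bb = params
    T = X.shape[0]
    d_h = Whf.shape[0]
    hf, hb = np.zeros(d_h), np.zeros(d_h)
    fwd, bwd = [], []
    for t in range(T):              # forward RNN reads the source in order
        hf = rnn_step(X[t], hf, Wxf, Whf, bf)
        fwd.append(hf)
    for t in reversed(range(T)):    # backward RNN reads the source in reverse
        hb = rnn_step(X[t], hb, Wxb, Whb, bb)
        bwd.append(hb)
    bwd.reverse()                   # realign backward states with positions
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

rng = np.random.default_rng(0)
d_in, d_h, T = 8, 16, 5
params = (rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
X = rng.normal(size=(T, d_in))
H = birnn_encode(X, params)
print(H.shape)  # (5, 32): one 2*d_h annotation per source position
```

In an encoder-decoder setup such as the BiLSTM system of [7], these per-position annotations (or the final states) would initialize the decoder; here only the encoder side is sketched.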
Normalization converts features to a similar scale, which improves the performance of the system.

Next is the Feed Forward Network (FFN), which comprises two dense layers that are individually and uniformly applied to every position. The Feed Forward layer is primarily used to transform the representation of the input sequence into a form more suitable for the task at hand. This is achieved by applying a linear transformation followed by a non-linear activation function. The output of the Feed Forward layer has the same shape as the input, and is then added to the original input.

The FFN takes a vector (the hidden representation at a particular position in the sequence) and passes it through two learned linear transformations, represented by the matrices W1 and W2 and bias vectors b1 and b2. A rectified-linear (ReLU) activation function is applied between the two linear transformations:

FFN(x) = max(0, xW1 + b1)W2 + b2 (11)

where W1 and W2 are the weight matrices and b1 and b2 are the bias vectors.

After the Multi-Head Attention, the result is added to the embedded input X using the residual connection:

Z = X + MultiHead(X) (12)

The research will make use of T5-Base, the Text-to-Text Transfer Transformer (T5), on the decoder side of the transformer. T5-Base is the standard or baseline version of T5. It strikes a balance between model complexity and efficiency, making it widely used and well-suited for many NLP tasks, and it offers a good trade-off between computational cost and performance, serving as a solid starting point for most applications.

The output will be embedded as done in the encoding stage:

E = PE + TE + SE (14)

where PE is the positional encoding, TE is the token encoding, and SE is the segment encoding, E has shape L x d, L is the sequence length, and d is the size of the embedding vector.

The embedded output E will be divided into four copies: one will be sent to the Add and Norm phase, and three will be sent to the Multi-Head Attention as the Query (Q), Key (K), and Value (V).

The attention between the Q, K and V is calculated by
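The decoder-side pipeline described above, the embedding sum of Eq. (14), attention over Q, K, and V, the residual connection of Eq. (12), and the position-wise FFN of Eq. (11), can be sketched end to end. This is a single-head, NumPy-only illustration with random weights and toy dimensions, not the T5-Base decoder itself; all names and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    # Normalizes each position's features to a similar scale (the Norm in Add & Norm)
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def ffn(x, W1, b1, W2, b2):
    # Eq. (11): FFN(x) = max(0, x W1 + b1) W2 + b2, applied position-wise
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
L, d, d_ff = 6, 32, 64
TE = rng.normal(size=(L, d))       # token encoding
PE = rng.normal(size=(L, d))       # positional encoding
SE = rng.normal(size=(L, d))       # segment encoding
E = PE + TE + SE                   # Eq. (14)

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = E @ Wq, E @ Wk, E @ Wv   # three of the four copies of E
Z = layer_norm(E + attention(Q, K, V))  # Eq. (12) residual, then Add & Norm

W1, b1 = rng.normal(size=(d, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d)), np.zeros(d)
out = layer_norm(Z + ffn(Z, W1, b1, W2, b2))  # FFN preserves the input shape
print(out.shape)  # (6, 32)
```

Note that the FFN output has the same shape as its input, which is what allows the residual addition at each sublayer; a full model would repeat this block per layer and use multiple attention heads.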