Received 10 July 2023, accepted 21 August 2023, date of publication 29 August 2023, date of current version 7 September 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3309642

Challenges in Rendering Arabic Text to English Using Machine Translation: A Systematic Literature Review
SHAHAB AHMAD ALMAAYTAH 1 AND SOLEMAN AWAD ALZOBIDY 2
1 Department of English Language and Humanity, Applied College, King Faisal University, Hofuf 31982, Saudi Arabia
2 Department of English Language and Translation Studies, College of Sciences and Theoretical Studies, Saudi Electronic University, Riyadh 11673, Saudi Arabia

Corresponding author: Shahab Ahmad Almaaytah ([email protected])

The authors extend their appreciation to the Deanship of Scientific Research at King Faisal University for funding this research work
through the project number GRANT2, 503.

ABSTRACT Arabic text can be translated into English using a variety of machine translation techniques.
However, translating contemporary Arabic text into English still poses a number of challenges.
To identify the challenges encountered while translating Arabic text into English using machine
translation, a systematic literature review (SLR) approach is used. The SLR steps (protocol creation,
first and final selection, quality assessment, data extraction, and synthesis) are followed. Nineteen
challenges are identified during the SLR process based on fifty-six research papers. The four most
important problems are carefully examined, and the solutions proposed by other researchers are
discussed. Word sense disambiguation, Arabic named entities, rich and complex morphology, and low
resources are the four critical challenges in rendering Arabic text to English text. Other challenges
are also reported in this article.

INDEX TERMS Natural language processing, machine translation, Arabic, systematic literature review,
challenges.

The associate editor coordinating the review of this manuscript and approving it for publication was
Agostino Forestiero.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

VOLUME 11, 2023

I. INTRODUCTION
Machine translation (MT) has advanced for practically all languages in recent years and has become
quite important in many applications [1]. As a result, current MT advancements have greatly improved
translation quality [2]. Correct and precise machine translations are increasingly in demand. Finding
an adequate and ideal translation, however, is a difficult task in any linguistic context [3], [4].
Several machine translation systems already exist, including Al-Mutarjim™ Al-Arabey 3.0
(https://al-mutarjim-al-arabey.software.informer.com/3.0/), Sakhr (http://www.sakhr.com/index.php/en/),
SYSTRAN (https://www.systransoft.com/), Shaheen (https://mt.qcri.org/api), Bing Translator
(https://www.bing.com/translator/), Babylon (https://translation.babylon-software.com/), and Google
Translate (https://translate.google.com). Various studies [3], [5], [6], [7] highlight several
challenges, such as linguistic mistakes, which indicate that the quality of translation for the
Arabic language needs to be further improved. The challenges faced by machine translation can be
broadly divided into two groups: technical challenges and linguistic challenges.

A major technical challenge associated with Arabic machine translation (AMT) is the lack of datasets
and lexical resources that can be utilised as common benchmarks for conducting unified tests. In
fact, academics frequently collect data relevant only to their own fields of study, ignoring a wide
range of other fields in the process. They then use these data to try to fix the linguistic problems
with Arabic. MT is made more difficult by additional technical difficulties such as
out-of-vocabulary (OOV) words, extremely long sentences, and out-of-domain test data [8]. Examples of
effective solutions include byte pair encoding (BPE) [9], a character-level BPE variation [10], and
hybrid approaches [11].

A main linguistic challenge is the nature of the Arabic language, which has a great degree of
ambiguity, linguistic complexity, and variety when compared to other languages. Other Arabic features
like word order freedom, several diacritization schemas, and a wide variety of dialectal variants along
social and geographic dimensions present serious linguistic problems to MT [12]. For instance, it has
been demonstrated that the performance of AMT can be improved [13], [14], [15], [16], [17] by
pre-processing the Arabic source through morphological segmentation [14], [15], syntactical
reordering [16], and hybridization.

A survey on Arabic machine translation was conducted to explore the machine translation techniques
available in the literature and to encourage researchers to study these techniques. This survey
focused on summarizing the major techniques used in machine translation from Arabic into English and
discussed their strengths and weaknesses [4].

Various surveys [4], [18], [19] have thoroughly examined the topic of machine translation from Arabic
to other languages. All of these earlier analyses and studies came to the conclusion that it is
difficult to design a good MT system that satisfies human criteria [4]. However, none of the
mentioned survey papers performed a systematic literature review to identify the existing challenges
of Arabic Machine Translation (AMT). This research paper adopts the systematic literature review
approach to identify the various challenges, and their possible solutions, that exist in the
literature. These challenges are then further classified. To accomplish this, we intend to address
the following research questions:

Research Question 1: What are the challenges, as identified in the literature, of rendering Arabic
text to English using machine translation?

Research Question 2: What are the proposed solutions and their limitations, as identified in the
literature, of rendering Arabic text to English using machine translation?

II. ARABIC MACHINE TRANSLATION MECHANISMS
Rule-based, statistical, and neural machine translation are the three basic mechanisms for machine
translation [4]. These three approaches are also used in Arabic machine translation.

A. RULE-BASED MACHINE TRANSLATION
In rule-based machine translation, a set of linguistic rules is used to translate the source text to
the target text [4], [20], [21], [22]. A language specialist usually develops the rules. The use of
bilingual or multilingual lexicons, including those for Arabic and other languages, is another
component of this strategy. Note that the lexicons and rule collections are constructed manually.

The main Arabic MT system, known as UniArab, was created by Salem et al. [23] as a global MT system
based on a linguistic model. This method's strength lies in its ability to thoroughly examine both
the syntactic and semantic levels, as indicated in [4], and the fact that it still works for language
pairs with little available parallel data, such as low-resource language pairs [22]. However, it is
hard to create rules that apply to all languages because doing so would need extensive linguistic
expertise and a high-quality dictionary. The latter is expensive to construct and might not include
all the terms. In addition, linguistic specialists are required to develop thorough rules.

B. STATISTICAL MACHINE TRANSLATION
Statistical machine translation (SMT) [4], [20], [21], [22] uses statistical models learned from
datasets made up of sentence-aligned parallel corpora. For the majority of languages, phrase-based
SMT models provide state-of-the-art performance [24]. In this approach, the translation model is
first trained on the bilingual corpus to estimate the probability of the source sentence being a
translated version of the target sentence. Then the language model is trained on monolingual corpora
and is used to improve the fluency of the output translation. Finally, the product of the language
model probability and the translation model probability is maximized, which gives the most probable
sentence in the target language. Phrase-based, syntax-based, and hierarchical phrase-based models are
the three types of SMT models [22].

Statistical MT can handle ambiguity by recording phrase-based translations with their frequency of
occurrence in a phrase table [4], [20]. The translation generated through this approach is more
fluent and natural. In addition, this mechanism is language independent, easy, cheap, and fast to
build.

C. NEURAL MACHINE TRANSLATION
Neural MT (NMT) models have been proposed and have outperformed other mechanisms, though these models
need a huge amount of parallel data to be trained. Convolutional neural networks (CNN) are used to
encode a source text into a continuous vector, and recurrent neural networks (RNN) are used as the
decoder to predict the word in the destination language. The concept of the attention mechanism was
developed by Bahdanau et al. [2], where the decoder pays attention to the input or to any element of
the input text. A vector with the same size as the input sequence is produced by calculating
attention using each encoder output and the current hidden state.

A neural MT model between Arabic and English was created by Almahairi et al. [25]. Some studies
investigate neural characteristics for Arabic text [25], [26]. The primary difference between neural
and statistical MT is that the latter has a specific language model, while the former does not and
has seen success in a variety of domains [25] in terms of fluency and accuracy. The fundamental issue
with NMT, however, is that it necessitates the use of a large parallel corpus [25], which increases
the complexity of training the model.

Baniata et al. [27] introduced a Transformer-based neural machine translation model for Arabic text.
This system used subword units and a shared vocabulary within the Arabic dialect to enhance the
behavior of the multi-head attention sublayers of the encoder. Experiments were carried out to
validate that the proposed mechanism adequately addresses the unknown-word issue and boosts the
quality of Arabic translation. The self-attention-based Transformer [28] is a stack of layers in a
sequence-to-sequence model. To create