
2019 International Conference on Computer, Control, Electrical and Electronics Engineering (ICCCEEE19)

Proposed Model for Arabic Grammar Error Correction Based on Convolutional Neural Network
Aiman Solyman¹, Zhenyu Wang¹, Qian Tao¹

¹School of Software Engineering, South China University of Technology, Guangzhou, China


[email protected], [email protected], [email protected]

Abstract—Deep learning and machine learning algorithms are widely used in Arabic Natural Language Processing (ANLP), which aims to develop tools and techniques to process human language in several forms, such as written and spoken context. ANLP still lacks the tools and applications needed to bridge the gap between Arabic and other languages. Furthermore, the available resources, such as dictionaries, grammatical rules, and corpora, are insufficient. Grammatical Error Correction (GEC) is an NLP task that seeks to develop automatic tools to correct grammar and spelling: the input is an incorrect word or sentence, and the output is the corrected version of the same sentence. Recently, neural networks have been applied to GEC with promising results, but improvement is still needed. The limitations of the previous studies are that they relied on handcrafted features, most often extracted from short sentences, and that all previous Arabic neural approaches used Recurrent Neural Networks (RNNs). This paper presents work in progress on an Arabic GEC model based on multiple convolutional layers with an attention mechanism. Moreover, we propose incremental techniques and a multi-round training model using a parallel corpus to obtain more accurate and fluent results and to approach human-level performance.

Index Terms—Arabic Natural Language Processing (ANLP), Grammatical Error Correction (GEC), Convolutional Neural Network (CNN).

I. INTRODUCTION

The Arabic language is classified as a Central Semitic (CS) language [1], and it is now the lingua franca of 22 Arab countries distributed between Africa and Asia [2]. Furthermore, it is one of the six official languages of the United Nations (UN), spoken by about 420 million people and used by one billion six hundred million Muslims in their daily worship [2]. There are two main versions of the Arabic language [3]. The first is Modern Standard Arabic (MSA), the version most widely used in media outlets, from TV and movies to newspapers and radio broadcasts. The second, Classical Arabic (CA) or Quranic Arabic, is somewhat complicated for non-native Arabic speakers; it has special symbols called Tanween (تنوين), used to signify proper pronunciation and to give certain effects to words.

Natural language processing (NLP) is a field of computer science that aims to allow users to interact with computers using natural languages; in particular, it creates concepts, finds methods, and processes and analyzes large amounts of natural language data [4]. Recently, Arabic Natural Language Processing (ANLP) has received more attention, following the significant research carried out on other languages such as English. Furthermore, several applications have been developed using machine learning (ML) and deep learning (DL), including text categorization, sentiment analysis, grammar error correction, and dialogue systems [5]. However, ANLP still lacks tools covering various applications [4], and the quality of the existing tools requires additional effort to bridge the gap between Arabic and other languages. We also have to attend to the availability of resources such as dictionaries, grammatical rules, and corpora.

Grammatical Error Correction (GEC) is a subfield of NLP that aims to build automatic systems to correct different kinds of errors in text, such as spelling, grammatical, and word choice errors, as a human would. GEC systems typically work at the sentence level, taking a potentially erroneous sentence as input and transforming it into its corrected version. They also normalize the text against a set of error types, as shown in Table I, listed by [6].

TABLE I
THE MOST COMMON ERROR TYPES IN THE ARABIC LANGUAGE [6]

Error type            Description
Spelling errors       Misspelled word forms within the sentence
Word choice errors    Use of an incorrect word within the sentence
Punctuation errors    Wrong placement of punctuation within the sentence
Lexical errors        Inadequate lexicon and local dialect usage
Morphological errors  Incorrect derivation or inflection
Syntactic errors      Grammatical errors, such as gender and number agreement

Human languages exhibit irregularity, complexity, and variability of error types, as well as semantic and syntactic dependencies on context. Many models have been developed to achieve better performance in text normalization and error correction. Moreover, the development of several ANLP sub-domains, such as machine translation, question answering, language generation, and multi-document summarization, depends on the availability of good grammar models that cover the entire language.


Recently, neural networks have achieved promising results on GEC using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). The power of the CNN is that it treats feature extraction and classification as one joint task. The input of most NLP tasks consists of characters, words, sentences, or documents represented as a matrix. The available approaches were handcrafted and most often extracted from short sentences, and all previous Arabic neural approaches for GEC used RNNs. The current project will be the basis for future ANLP projects such as text generation, dialogue systems, and semantic parsing. We propose an encoder-decoder model with nine convolutional layers and an attention mechanism, and we use a machine translation (MT) formulation for the Arabic GEC task. The proposed model can also be applied to other languages, such as English and Chinese. Moreover, the model will be trained incrementally, with rare-word segmentation, to achieve better and more fluent results.

The rest of this paper is organized as follows: Section II describes Arabic language challenges; Section III reviews neural network algorithms; Section IV covers the background and related work; Section V presents the proposed model; Section VI reports the results; and the last section draws conclusions and outlines future work.

II. ARABIC LANGUAGE CHALLENGES

The Arabic language has a unique structure, and it is somewhat difficult and complex even for native speakers. This structure encompasses grammar, spelling, pronunciation, dialects, and punctuation marks. The main characteristics and unique features of the Arabic language are as follows [7]: 1) it is a right-to-left language, meaning it is read and written from right to left; 2) Arabic has 28 characters, and some of these characters take a different shape depending on their position within the word, for example, the character غ (ghayn) can be written in four forms (isolated, initial, medial, and final); 3) many characters share the same shape and are differentiated using dots above or below the letter, for example (n-ن, b-ب, t-ت); 4) there are no upper-case and lower-case characters, as in Chinese and Korean; 5) numbers are classified by gender (feminine and masculine) and by singular, dual, and plural; 6) words usually comprise several derived forms, and each root is often composed of three letters; 7) verbs in the present or future tenses are designated using prefixes, while verbs in the past tense are identified by suffixes.

Figure 1. An example of the rich morphology and ambiguity of Arabic words [4].

Furthermore, Arabic is classified as a morphologically rich language (MRL), which means its words carry more morphology, with prefixes and suffixes representing features such as gender, person, number, and mood. Moreover, MRL languages tend to have more inflected word types [4]; this increases the degree of stemming ambiguity arising from the different meanings of the same surface word. Figure 1 shows, as an example, the complexity of MRL for a single word. This ambiguity is aggravated because some words in the Arabic language have up to 12 analyses; for example, the word فسياكلونها fasayakolonahA "and they will eat it" starts and ends with a set of prefixes and suffixes representing objects, possessives, or persons.

This kind of complexity and ambiguity affects ANLP applications; for example, common automatic machine translation software such as Google, Bing, or Baidu is still far from producing highly accurate translations from Arabic into other languages [8].

III. NEURAL NETWORKS ALGORITHMS

The main concept of Artificial Neural Networks (ANNs) is to simulate the human brain and its Biological Neural Networks (BNNs). The BNN architecture is an interconnected network of neurons that transmits patterns of electrical signals; each node receives an input signal and passes its output via an axon as an input to another node. As early as 1957, the first ANN was presented by Rosenblatt; it was able to vary its own weights and had the ability to learn and develop itself to solve simple, linear problems.
Figure 2. The architecture of a simple Neural Network.

Figure 2 shows the architecture of a simple neural network. We obtain the output by computing the linear combination of the inputs and weights and passing it through a nonlinear activation function. Technically, consider x₁, x₂, ..., xₙ as the input nodes, b as a bias, and y as the output, calculated by summing each input weighted by w₁, w₂, ..., wₙ plus b, as in the following function:

y = ∑ᵢ₌₁ⁿ wᵢxᵢ + b
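As a concrete illustration, the following minimal Python sketch (NumPy only; the variable names and the choice of sigmoid are ours, for illustration) computes this weighted sum for one neuron and then applies a nonlinear activation:

```python
import numpy as np

def neuron_forward(x, w, b):
    """Single-neuron forward pass: y = sum_i(w_i * x_i) + b, then a nonlinearity."""
    z = np.dot(w, x) + b             # linear combination of inputs and weights
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

# Example: three inputs with arbitrary weights and bias.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
print(neuron_forward(x, w, b=0.2))  # the neuron's activation for this input
```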
Recently, ANNs have become more complex, and neural network applications are widely used in daily life. In this paper, we focus on using ANNs in ANLP. There are many algorithms and models, and they have achieved great success in natural language processing, as follows:

A. Recurrent Neural Networks

A recurrent neural network (RNN) is popular in NLP; the main idea is to use cyclic connections between the units in the network [9]. This feature works as an embedded memory that keeps the network state, is used to process sequences of inputs, and makes the RNN well suited to language processing tasks such as speech recognition, dialogue systems, and text normalization. Deep RNNs have a limitation with long sequences of text and complex data, a problem called the vanishing gradient. The vanishing problem makes training the parameters less effective, as well as more time-consuming and computationally expensive at the earliest layers. To overcome this problem, Long Short-Term Memory (LSTM) units [10] are used, which aim to organize data into long-term and short-term memory cells. This technique allows the RNN to determine the important data to be memorized and looped back into the network, while the rest of the data is forgotten. Moreover, the Back-Propagation Through Time (BPTT) method [11] is used for training RNNs.
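To make the recurrence concrete, here is a minimal sketch (our own illustration, with toy dimensions) of the basic RNN state update hₜ = tanh(Wₓxₜ + Wₕhₜ₋₁ + b), where the hidden state h acts as the embedded memory described above:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One RNN time step: the new hidden state mixes the current
    input with the previous state (the network's 'memory')."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Toy dimensions: 4-dim inputs, 8-dim hidden state.
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), np.zeros(8)
h = np.zeros(8)
for x_t in rng.normal(size=(5, 4)):    # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_x, W_h, b)  # the state is looped back at each step
```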
B. Bidirectional Recurrent Neural Networks

Bidirectional Recurrent Neural Networks (BRNNs) were invented by Schuster and Paliwal in 1997 [12]. The BRNN architecture is based on connecting two hidden layers and passing the input sequence through them in opposite directions. This kind of interconnection allows the output layer to get information from the preceding context (backward) and the following context (forward) simultaneously. The BRNN uses the hidden layers of the two RNNs to update and increase the information available to the output. BRNNs are useful when processing the input sequence requires knowing the tokens located before and after the current token. BRNN applications include grammar and spelling correction, semantic analysis, and handwriting recognition. Training a BRNN is almost similar to training an RNN: the BPTT method is used, with some differences, because the two hidden layers have no interactions when applying back-propagation, so additional processing is needed to update the input and output layers, which cannot be done at once. This is fixed by passing the forward states and backward states first, and then passing the network output.
C. Sequence to sequence model

This family of language models uses neural networks to map a sequence of input words or sentences into an output of the same sequence length but a different form. The sequence-to-sequence (seq2seq) or encoder-decoder model [13] has been successfully applied in applications such as online chatbots, Google Translate, and voice-enabled devices. The main components of the model are the encoder and the decoder. The encoder starts by receiving the input sequence x = (x₁, x₂, …, xₙ); unlike a simple BRNN or RNN, there is no need to take the state of each unit into account, only to keep the last layer state, often called the context vector or sentence embedding, intended to represent the input sentence. The context vector and all the previously predicted words y = (y₁, y₂, ..., yₙ) are then passed to the decoder. The decoding process predicts the next word yₜ. Moreover, the decoder identifies the probability over the output y by decomposing the joint probability. Finally, the softmax function [14] is used to normalize the probability distribution over the output of the last layer of the decoder. Many studies have used sequence-to-sequence RNN models with LSTM or attention to overcome the vanishing problem and improve the quality of the output.
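For reference, the softmax normalization at the decoder output can be sketched as follows (a generic illustration, not this system's code); it turns the final layer's raw scores into a probability distribution over the vocabulary:

```python
import numpy as np

def softmax(scores):
    """Normalize raw decoder scores into word probabilities."""
    e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0, 3.2])  # one score per vocabulary word
probs = softmax(logits)                   # sums to 1; the highest score wins
```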
D. Convolutional Neural Network

A convolutional neural network (CNN) is a multilayer perceptron with fully connected networks, meaning each node in a layer is connected to all nodes in the next layer [15]. It is also known as one of the classical neural networks commonly used to analyze and process images. The architecture of a CNN consists of an input layer, an output layer, and convolutional hidden layers that pass their results in mathematical form to successive layers. Recently, CNNs have been applied to NLP tasks such as semantic analysis, machine translation, and grammar error correction, and have achieved promising results. CNNs improve the speed of training and computation when a large vocabulary is used.

The current study will use a hybrid of efficient and modern models based on multiple encoder-decoder convolutional layers (nine layers) with an attention mechanism. We will also pre-train word-level embeddings by initializing the source and target words from a large Arabic corpus using the fastText tool.

IV. BACKGROUND AND RELATED WORK

Machine learning and deep learning have shown promising results in automatic spelling and grammar correction. This section presents the previous methods and techniques for error detection and correction. Generally, GEC models first detect incorrectly spelled words and morphological and syntactic errors in the input text. There are two main techniques used to detect incorrect words or spelling errors [16], as follows:

A. Dictionary lookup

This is a basic technique that compares the input strings against a language resource such as a lexicon, dictionary, or corpus. The language resource usually contains all inflected forms of the words. The system then looks up each given word; if it is not found in the language resource, it is marked as an unknown or incorrect word. This kind of technique improves search performance and reduces the size of the resources through pattern-matching algorithms and morphological analysis.

B. N-gram analysis

This technique is used with statistical models designed to assign a probability to sequence items such as samples, letters, or words in sequentially ordered text. For the task of error detection, n-gram analysis is used to estimate the likelihood of the given input and accordingly identify the correctly spelled word. Lately, n-gram analysis has been widely used in speech recognition and statistical natural language processing. A small sketch combining both detection techniques is shown below.
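The following sketch (our own toy illustration; the word list and corpus counts are invented for the example) shows both detection techniques: a dictionary lookup that flags out-of-lexicon tokens, and a bigram model that scores how likely a word sequence is:

```python
from collections import Counter

# --- Dictionary lookup: flag tokens missing from the lexicon ---
lexicon = {"the", "cat", "sat", "on", "mat"}           # toy inflected-forms list
def unknown_words(tokens):
    return [t for t in tokens if t not in lexicon]

# --- Bigram analysis: estimate sequence likelihood from corpus counts ---
corpus = "the cat sat on the mat the cat sat".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
def bigram_prob(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(unknown_words("the cat sta on the mat".split()))  # ['sta'] -> likely misspelling
print(bigram_prob("cat", "sat"))   # high: seen often in the corpus
print(bigram_prob("mat", "cat"))   # zero: an unlikely continuation
```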
Figure 3. The architecture of the proposed convolutional encoder-decoder model.

Error correction techniques seek to return correct sentences from an incorrect given sentence: the input can be considered a sequence of words, and the output is the predicted corrected words. The modern GEC approaches are as follows:

A. Rule-based Grammar

Basically, this approach builds a system of linguistic rules that imitates a human, who plays the main role in building and improving these kinds of systems. The biggest advantage of rule-based grammar is that there is always a way to check a query placed by a user and how the system handled it. Also, all the rules are written by humans, so they are controllable, and any reported error can be localized and fixed by adjusting the rules in the related module. Moreover, grammar rule-based systems are very flexible and can easily be updated with new data types or functions without any significant changes to the rule system. This approach is based on extending the existing rules, so the system does not require a massive training corpus. The limitation of rule-based systems is that they require skilled experts, linguists or knowledge engineers, to manually encode and apply each rule, and this makes system development more complex. There are many models and approaches for Arabic rule-based grammar, such as [17, 18].

B. Machine-learning and Deep learning algorithms

These algorithms have recently become widely used in the GEC task; through statistical methods, they are able to model human language without being explicitly programmed. Machine learning and deep learning systems analyze the training set to build their own knowledge and produce their own rules and classifiers. Deep learning algorithms are based on probabilistic results, and the obvious advantage of deep learning is its learnability; no manual coding of grammar rules is needed. Moreover, an attractive and simpler alternative is to treat error correction as a translation task. The underlying idea is that a statistical machine translation (SMT) system should be able to translate text written in 'bad' (incorrect) Arabic into 'good' (correct) Arabic; studies that used this technique include [19, 20]. Many studies have tried to combine rule-based and deep learning algorithms into a hybrid system, using a grammar-based parser for text-to-SQL translation and deep learning to complement the rule-based grammar by fixing syntax and eradicating typos; such studies include [21, 22].

Furthermore, the previous end-to-end Arabic GEC approaches were handcrafted, most often extracted from short sentences, and used recurrent neural networks (RNNs). Moreover, training models on a limited set of error-correction sentence pairs does not allow them to correct sentences perfectly. Also, using a single round to correct sentences is usually not suitable for sentences with multiple grammar errors.

V. PROPOSED MODEL

Encoder-decoder models are powerful and widely used in machine translation and text normalization tasks. The encoder network is responsible for mapping the erroneous source sentence into a fixed-length context vector. The decoder network seeks to generate the output, a corrected sentence, from the context vector. The weakness of encoder-decoder models is their poor performance on long inputs; to address this problem, we use an attention mechanism [23]. Attention is used with long sequences of text to speed up model training and to allow the decoder network to attend to different sections of the input sentence at each time step when predicting the output sequence.

There are two main limitations of seq2seq models in the GEC task [24]: (1) the limited number of error-corrected sentence pairs in the training datasets leads to ineffective corrections and fails to achieve fully correct sentences; (2) seq2seq models usually correct the sentence in a single round of inference, which is not an efficient way to correct complex errors, because some corrected errors in a sentence make the context strange. The philosophy of the proposed model is to train the model incrementally by increasing the number of error-corrected sentence pairs.

We also generate and use less fluent sentences during training to approach human-level performance. Generating fluency-boosted sentence pairs during training provides additional training instances in subsequent training epochs, improving the error correction model's performance so that it corrects more sentences. Moreover, to get better results, we have to correct the sentence incrementally, using the fluency boost mechanism with multi-round correction inference [24]. The proposed model will be based on the encoder-decoder architecture using nine convolutional layers with an attention mechanism, as in [25]. Consider a sequence S of words as input, consisting of n tokens S₁, ..., Sₙ with Sᵢ ∈ V, where V refers to the source vocabulary. The vocabulary consists of a number of unique tokens, including a padding token, start-of-sentence and end-of-sentence tokens to make the input sentences equal in length, and an out-of-vocabulary (OOV) token used during inference to replace any character or word outside the training dataset. The model uses word embeddings: each word Wᵢ in the source sentence is mapped to a row vector Cᵢ initialized from a uniform random distribution between 0 and 1. We then pre-train word-level embeddings by initializing the source and target words from a large Arabic corpus. We will use fastText [18] for word representations, which extends Word2Vec [19] by adding sub-word information to the word vector, to overcome the challenge of the small dataset. Also, to deal with rare words in the parallel Arabic corpus, we use the Byte Pair Encoding (BPE) algorithm to split words into multiple frequent sub-words. This kind of word embedding is calculated by representing a word as a set of character n-grams and merging the skip-gram embeddings of these character n-gram sequences using the fastText tool. The architecture of the proposed model consists of nine convolutional encoder layers and nine decoder layers, as shown in Figure 3.
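As an illustration of the idea behind BPE (a toy sketch of the merge step, not the exact algorithm or vocabulary used in this work), the most frequent adjacent symbol pair is repeatedly merged so that rare words decompose into frequent sub-words:

```python
from collections import Counter

def bpe_merge_step(words):
    """One BPE step: find the most frequent adjacent symbol pair
    and merge it everywhere. `words` maps symbol tuples to counts."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    best = max(pairs, key=pairs.get)   # most frequent pair, e.g. ('t', 'a')
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                out.append(symbols[i] + symbols[i + 1]); i += 2
            else:
                out.append(symbols[i]); i += 1
        merged[tuple(out)] = freq
    return merged, best

# Toy corpus of romanized words with frequencies.
vocab = {tuple("kataba"): 5, tuple("kitab"): 6, tuple("maktab"): 3}
vocab, pair = bpe_merge_step(vocab)
print(pair, vocab)  # after repeated merges, shared stems become single sub-word units
```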
A. Encoder

The input embeddings are obtained from the previous step, i.e., the pre-trained word-level embeddings using previously trained matrices. The first encoder layer applies convolutional filters that map every sequence of consecutive input vectors to a feature vector f. The special tokens shown in Figure 3, <start> and <end>, denote the beginning and end of the input sentence and are usually used after the convolution operations to ensure that the number of returned output vectors equals the number of input tokens. The output of each convolution operation is followed by a non-linearity using gated linear units (GLU) [25], where the GLU is used to reduce vanishing gradients and make the process faster. The input vectors of each encoder layer are added through residual connections, and the output vector of the final encoder layer is linearly mapped to obtain the encoder network's output vector.
B. Decoder

At the decoder, we first pad the beginning-of-sentence marker and the previously generated tokens in the same way as the encoder source tokens. Each embedding is linearly mapped and passed as input to the first decoder layer, and each decoder layer consists of convolution operations followed by GLU non-linearities, as in the encoder layers, producing the output vectors. The number and size of the convolution filters are the same as those in the encoder. Each decoder layer also has its own attention module, computed when predicting the target word at each time step from the previous target embedding tₙ₋₁ plus a bias b and the weight matrix W multiplied by the decoder state y, as in the following equation:

zₙˡ = Wz yₙˡ + bz + tₙ₋₁

The attention weights are computed by using softmax to normalize the dot product of the encoder output vectors e₁, …, eₙ with zₙ. The source context vector xₙ is computed by summing the encoder output vectors and the source embeddings; adding the source embeddings helps to better retain information about the source tokens. The context vector xₙ for each layer is linearly mapped to cₙ. The output vector of a decoder layer is the summation of xₙ, yₙ, and the previous layer's output vector gₙ. The final decoder layer's output vector gₙ is linearly mapped to dₙ, where dₙ denotes dropout [26], which is applied before each layer of the encoder and decoder networks, as well as to the embeddings and decoder outputs. The decoder output vector is then mapped through softmax over the target vocabulary of size V to compute the target word probabilities.
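A minimal sketch of this attention step (our own illustration of the mechanism in [25]; the shapes and names are invented): the decoder summary z is scored against the encoder outputs with a dot product, softmax gives the weights, and the context is a weighted sum over the encoder outputs plus the source embeddings:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def conv_attention(z_n, enc_out, src_emb):
    """z_n: (d,) decoder state summary; enc_out, src_emb: (n, d).
    Returns the source context vector for this time step."""
    scores = enc_out @ z_n               # dot product of z with e_1..e_n
    alpha = softmax(scores)              # normalized attention weights
    return alpha @ (enc_out + src_emb)   # weighted sum keeps source-token info

rng = np.random.default_rng(2)
enc_out = rng.normal(size=(7, 16))       # encoder outputs for 7 source tokens
src_emb = rng.normal(size=(7, 16))       # their input embeddings
z = rng.normal(size=16)                  # decoder state plus previous target embedding
x_context = conv_attention(z, enc_out, src_emb)  # (16,) context vector
```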
Moreover, to improve sentence correction and fluency without changing the original meaning, we will use incremental training to obtain fluent sentences. We allow the model to predict the n-best outputs S₁, ..., Sₙ given a correct sentence in the first training round, and we compare the correctness and fluency of each output to its correct version. If an output sentence's fluency score is lower than that of its correct sentence, we call it a disfluency candidate, and we use the top two results as new input pairs for our model. The best sequence of target words for the Arabic language is obtained by a right-to-left beam search. In a beam search, the top d candidates at every decoding time step are retained, and the top-scoring candidate retrieved at the end of the beam search is taken as the corrected hypothesis.
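The beam search described here can be sketched as follows (a generic toy implementation with invented per-step scores, not the system's actual decoder): at each step, only the top d partial hypotheses survive:

```python
import heapq

def beam_search(step_scores, d=3):
    """step_scores: list over time steps; each is {word: log_prob}.
    Keeps the top-d hypotheses at every step and returns the best one."""
    beams = [(0.0, [])]  # (cumulative log-probability, words so far)
    for scores in step_scores:
        candidates = [
            (lp + wlp, words + [w])
            for lp, words in beams
            for w, wlp in scores.items()
        ]
        beams = heapq.nlargest(d, candidates, key=lambda c: c[0])  # prune to top d
    return beams[0]  # highest-scoring hypothesis at the end

# Toy per-step word scores (log-probabilities).
steps = [{"ال": -0.1, "في": -1.5}, {"كتاب": -0.2, "مكتب": -0.9}]
print(beam_search(steps, d=2))
```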
C. Dataset

The most appropriate dataset for this project is the Qatar Arabic Language Bank (QALB), created as part of a collaborative project between Columbia University and Carnegie Mellon University Qatar, funded by the Qatar National Research Fund. The data comes from online commentaries written on Aljazeera articles. The training data contains 2 million words, and the development and test data contain about 50,000 words each. The data was annotated and corrected by native Arabic speakers. In our research, we used the QALB release from ANLP-ACL 2015, which includes data sets from native and non-native Arabic speakers. The QALB corpus is provided in three subsets, training, development, and test, which are used as the respective file extensions.
VI. RESULTS

At this stage of the project, we applied the proposed model to the test set of the QALB corpus; the results are shown in Table 2. Our model, without any extra NLP or GEC knowledge, achieved 40.6 in F0.5. The results will be improved by applying both the pre-trained embeddings and the BPE algorithm proposed in the previous section; the BPE and embedding algorithms increase the ability to generate unknown words.

TABLE 2
RESULTS ON THE QALB TEST SET

P        R        F1
70.23    72.10    71.14
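For clarity on the metrics, F_β combines precision P and recall R as F_β = (1 + β²)·P·R / (β²·P + R); β = 0.5 weights precision more heavily, while β = 1 gives the F1 shown in the table. A small helper (ours, added for illustration) reproduces the F1 column from P and R:

```python
def f_beta(p, r, beta):
    """F-beta score from precision and recall (values in percent or in [0, 1])."""
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(f_beta(70.23, 72.10, beta=1.0))  # ~71.1, matching the F1 column up to rounding
print(f_beta(70.23, 72.10, beta=0.5))  # F0.5 weighs precision over recall
```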
Due to time pressure, we could not apply the whole model architecture to get the final results, and these are preliminary results. Moreover, we have to increase the amount of data and apply the incremental training to achieve our goal for the Arabic GEC task.
VII. CONCLUSION AND FUTURE WORK

In this paper, we have presented the importance and uniqueness of the Arabic language, as well as the different challenges faced by ANLP. We have given a brief survey of the most famous neural network models and algorithms, including RNN, BRNN, and CNN, covering their respective characteristics, advantages, and disadvantages, and identified which algorithms we would use and why. Furthermore, we presented the modern state-of-the-art grammatical error detection and correction approaches for the Arabic and English languages, and discussed the limitations of the previous studies and the motivation of the current study. We then presented our proposed model: an Arabic GEC encoder-decoder based on multiple convolutional layers (nine layers) with an attention mechanism. In addition, we described how to implement incremental and multi-round training using a parallel corpus to obtain more accurate and fluent results. We then evaluated our model on the QALB 2015 test set to test our hypothesis against the previous models. To the best of our knowledge, the proposed model is the first project to investigate incremental training and the correction of Arabic grammatical errors based on convolutional neural networks.
REFERENCES

[1] G. Khan, M. P. Streck, and J. C. Watson, The Semitic Languages: An International Handbook. Walter de Gruyter, 2011.
[2] Istizada, "Complete List of Arabic Speaking Countries," 2019. Available: http://istizada.com/complete-list-of-arabic-speaking-countries-2014/
[3] K. C. Ryding, A Reference Grammar of Modern Standard Arabic. Cambridge University Press, 2005.
[4] S. L. Marie-Sainte, N. Alalyani, S. Alotaibi, S. Ghouzali, and I. Abunadi, "Arabic natural language processing and machine learning-based systems," IEEE Access, vol. 7, pp. 7011-7020, 2019.
[5] A. A. Al-Ajlan, H. S. Al-Khalifa, and A. S. Al-Salman, "Towards the development of an automatic readability measurements for Arabic language," in 2008 Third International Conference on Digital Information Management, 2008, pp. 506-511: IEEE.
[6] W. Zaghouani et al., "Large scale Arabic error annotation: Guidelines and framework," 2014.
[7] H. Hasanuzzaman, "Arabic language: characteristics and importance," The Echo. A Journal of Humanities & Social Science, vol. 1, no. 3, pp. 11-16, 2013.
[8] B. Babych and A. Hartley, "Improving machine translation quality with automatic named entity recognition," in Proceedings of the 7th International EAMT Workshop on MT and Other Language Technology Tools, Improving MT through other Language Technology Tools: Resources and Tools for Building MT, 2003, pp. 1-8: Association for Computational Linguistics.
[9] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, "How to construct deep recurrent neural networks," arXiv preprint arXiv:1312.6026, 2013.
[10] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[11] P. J. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, 1990.
[12] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.
[13] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104-3112.
[14] M. Costa, "Probabilistic interpretation of feedforward network outputs, with relationships to statistical prediction of ordinal quantities," International Journal of Neural Systems, vol. 7, no. 5, pp. 627-637, 1996.
[15] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 675-678: ACM.
[16] K. Kukich, "Techniques for automatically correcting words in text," ACM Computing Surveys (CSUR), vol. 24, no. 4, pp. 377-439, 1992.
[17] K. Shaalan, "Rule-based approach in Arabic natural language processing," The International Journal on Information and Communication Technologies (IJICT), vol. 3, no. 3, pp. 11-19, 2010.
[18] K. F. Shaalan, "Arabic GramCheck: A grammar checker for Arabic," Software: Practice and Experience, vol. 35, no. 7, pp. 643-665, 2005.
[19] D. Watson, N. Zalmout, and N. Habash, "Utilizing character and word embeddings for text normalization with sequence-to-sequence models," arXiv preprint arXiv:1809.01534, 2018.
[20] S. Ahmadi, "Attention-based encoder-decoder networks for spelling and grammatical error correction," arXiv preprint arXiv:1810.00660, 2018.
[21] N. Zalmout and N. Habash, "Don't throw those morphological analyzers away just yet: Neural morphological disambiguation for Arabic," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 704-713.
[22] M. Nawar, "CUFE@QALB-2015 shared task: Arabic error correction system," in Proceedings of the Second Workshop on Arabic Natural Language Processing, 2015, pp. 133-137.
[23] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 5998-6008.
[24] T. Ge, F. Wei, and M. Zhou, "Reaching human-level performance in automatic grammatical error correction: An empirical study," arXiv preprint arXiv:1807.01270, 2018.
[25] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, "Convolutional sequence to sequence learning," in Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 1243-1252: JMLR.org.
[26] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
