PCX - Report 1

The document discusses developing a grammar error detection and correction model for the Amharic language using deep learning. It notes that existing Amharic grammar checkers have limitations and do not enable both error detection and correction. The proposed model would enhance morphology-based tagging, design an architecture for Amharic grammar error detection and correction using a suitable deep learning algorithm, and evaluate the model's performance on accuracy and other metrics. The goal is to develop a robust method to detect and correct syntax errors in Amharic texts.

Uploaded by

Molalegn Bezie

Plagiarism Checker X Originality Report

Similarity Found: 11%

Date: Thursday, August 18, 2022


Statistics: 607 words Plagiarized / 5703 Total words
Remarks: Low Plagiarism Detected - Your Document needs Optional Improvement.
-------------------------------------------------------------------------------------------

Chapter-1: Introduction

Background
The advancement of computer technology has a significant influence on the world,
frequently causing large changes in human activity within human communities. Written
language and communication play essential roles in bringing about such changes. Written
communication now happens through computers, the internet, and mobile devices in
addition to conventional pen and paper. However, since most of this information is
presented as unstructured text, a Natural Language Processing (NLP) approach is critical
for producing meaningful insight from it (Gelbukh, 2010; Liddy, 2001).

Natural language processing (NLP) is a multidisciplinary field that includes computer
science, artificial intelligence, and computational linguistics. It is essential for
processing and analyzing large amounts of data using computational mechanisms (Larabi
Marie-Sainte et al., 2019). Its purpose is to bridge the gap between human and computer
communication using natural language. There are many NLP applications, including machine
translation, question answering, part-of-speech tagging, and morphological analysis.
Another example of an NLP application is a grammar checker.

The task of a grammar checker is to correct grammatical faults in written text. The aim is
to create a system that takes an input text, analyzes the context to detect grammatical
errors, and then outputs corrected text (Yuan, 2017). When writing, people may make
grammatical errors because they lack grammatical knowledge of the language. Many grammar
checkers have been developed for many languages, using various methodologies, to help
users fix such errors. Grammar checking approaches include pattern matching,
syntax-based, rule-based, statistical, deep learning, and hybrid approaches (Tesfaye,
2011).

Pattern matching is a primitive method that keeps a store of common grammar errors and
their corresponding corrections (Henrich & Reuter, 2009). A sentence is compared against
the error entries in the store. An error is detected if a match is found, and it can be
corrected using the stored correction. In the syntax-based approach (Henrich & Reuter,
2009), a text is fully analyzed morphologically and syntactically. A lexical database, a
morphological analyzer, and a parser are all required. The parser generates the syntactic
structure of each sentence. The text is considered incorrect if parsing fails.
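
The pattern-matching approach described above can be illustrated with a minimal Python
sketch. The error store, its English placeholder entries, and the function name
check_sentence are assumptions made only for illustration; they are not taken from
Henrich & Reuter (2009) or from any existing checker.

```python
import re

# Hypothetical error store: known erroneous phrase -> stored correction.
# English placeholders are used purely for illustration.
ERROR_STORE = {
    "she go": "she goes",
    "they was": "they were",
    "more better": "better",
}


def check_sentence(sentence, store=ERROR_STORE):
    """Return (corrected_sentence, matches) by looking up stored error patterns."""
    corrected = sentence
    matches = []
    for wrong, right in store.items():
        # Match whole words only, so "she go" does not fire inside "she goes".
        pattern = r"\b" + re.escape(wrong) + r"\b"
        if re.search(pattern, corrected):
            matches.append((wrong, right))
            corrected = re.sub(pattern, right, corrected)
    return corrected, matches


fixed, found = check_sentence("they was late and she go home")
print(fixed)
print(found)
```

The sketch also shows why the method is called primitive: an error that is not already
in the store is never detected.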

The rule-based approach is commonly used to detect easy-to-find errors with simple
rules. The rules for detecting errors become more complicated as sentences become more
complex. The statistical approach assumes that a text's grammar can be checked using
only large amounts of text, from which a statistical database is built to detect errors
(Henrich & Reuter, 2009). Two techniques can be used for this purpose. The first
compares the statistical data directly with the text that needs to be corrected. The
second derives a grammar from the statistical data and uses it to check and parse the
text.

The hybrid approach combines the existing approaches to make the grammar checker more
robust and efficient. Recently, most grammar checkers have been developed using a deep
learning approach, which needs little manual feature engineering because it extracts
features automatically. Grammar checkers have been studied for several languages,
including English (Jannya & Pv, 2020) and Chinese (Yuan, 2017). A grammar checker
depends on the language it corrects. As a result, it may be impossible to use a grammar
checker built for one language on another language.

Different languages have different morphological structures (Strahan, 2009). As a
result, it is important to develop grammar error detection and correction specifically
for the Amharic language. Grammar checker systems are important for writing and
preparing error-free documents in a given language. They are helpful in scenarios such
as text writing and language learning (Gidey, 2021). They are also useful as a component
of other NLP applications. In addition, they help convey meaningful information to the
reader.

Therefore, we propose an Amharic grammar error detection and correction model using a
deep neural network.

Motivation
Amharic uses the Fidel script, which has 33 fundamental characters and 7 vowels. It is
the working language of Ethiopia’s Federal Democratic Republic and of some regional and
local administrations, including Amhara, Benishangul-Gumuz, Gambella, the Southern
Nations Nationalities and People Region (SNNPR), Addis Abeba, and Dire-Dawa (Gobena,
2010). As a result, the number of Amharic documents keeps increasing. Preparing such
documents requires Amharic grammar error detection and correction. Many people speak
Amharic as a second language.

As a result, they are prone to grammatical errors. Furthermore, Amharic grammar error
detection and correction is a critical component of many NLP applications such as
question answering, machine translation, and information extraction. There are existing
works on Amharic grammar checkers (Aynadis Temesgen, 2013; YARED, 2020), but they do not
correct Amharic text that contains grammar errors. Manually correcting grammatical
errors across many documents takes time. This problem motivated us to study and
investigate grammar error detection and correction in the Amharic language.

Statement of the problem
Grammar checkers have been developed for various languages such as English
(Grundkiewicz et al., 2019), Tigrigna (Gidey, 2021), Afan-Oromo (Tesfaye, 2011), and
Chinese (Yuan et al., 2019). Natural languages differ in morphology and grammar, so a
grammar checker cannot be applied directly from one language to another. The morphology
and grammatical nature of Amharic make it difficult to adapt existing grammar checker
tools from other languages. There are works on Amharic grammar checkers using rule-based
and statistical approaches (Aynadis Temesgen, 2013), but they do not cover all errors
because the rule set is incomplete. YARED (2020) used a deep learning approach. However,
that work used corpus-based morphology-tag extraction, which does not work for tokens
not found in the corpus.
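
The limitation of corpus-based tag extraction can be seen in a small Python sketch. The
tokens are transliterations that appear elsewhere in this document, while the tag
strings, the table, and the function name lookup_tags are hypothetical and do not
reproduce the tag set or code of YARED (2020).

```python
# Hypothetical tagged corpus: (transliterated token, morphology tag) pairs.
# The tag format is invented for illustration only.
TAGGED_CORPUS = [
    ("alemu", "N|3|SG|M"),
    ("aster", "N|3|SG|F"),
    ("hide", "V|3|SG|M"),
]

# Corpus-based extraction amounts to a lookup table built from the corpus.
TAG_TABLE = dict(TAGGED_CORPUS)


def lookup_tags(tokens, table=TAG_TABLE):
    """Tag each token by table lookup; tokens missing from the corpus get None."""
    return [(tok, table.get(tok)) for tok in tokens]


tagged = lookup_tags(["alemu", "meta"])
unseen = [tok for tok, tag in tagged if tag is None]
print(tagged)
print(unseen)
```

Any token that never occurred in the tagged corpus, such as "meta" here, receives no
tag, which is the gap that generating tags from morphological analysis is meant to
close.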

These works focus on grammar error detection and do not consider error correction. To
the best of our knowledge, no grammar error detection and correction model has been
developed for the Amharic language. Therefore, the problem that this research tries to
address is how to design a model for Amharic grammar error detection and correction
with automatic generation of morphology-based part-of-speech tags. The study answers
the following research questions.

How can morphology-based tag generation be enhanced for grammar error detection?
Which deep learning approach is suitable for grammar error detection and correction?
What is the performance of the proposed model?

Objective of the study

General Objective
The main objective of this research work is to develop a grammar error detection and
correction model for the Amharic language using a deep learning approach.

Specific Objectives
The specific objectives of this research work are:
To collect, prepare, and annotate a dataset for training and testing.
To enhance morphology-based tag generation for grammar error detection.
To design an architecture for Amharic grammar error detection and correction.
To select a suitable algorithm for grammar error detection and correction.
To evaluate the performance of the proposed model.

Methods

Literature Review
The literature will be reviewed to identify problems, find gaps, and understand the
state of the art in grammar checking and other concepts related to our work.
Furthermore, discussions will be held with an Amharic linguistics expert on how to
identify ungrammatical Amharic sentences and how to correct them.

Data collection
Designing the Amharic grammar error detection and correction model requires analyzing
grammar errors together with their corrected text. To this end, data will be gathered
from sources such as websites, journals, and educational publications to better
understand the features of grammar errors and their corrections.

Development Tools
Several tools will be required to meet the research goals. We use the Python programming
language for development. For morphological analysis, HornMorpho (Gasser, 2011) will be
used. The Amharic semantic models of S. M. Yimam et al. (2021) will be used for
part-of-speech tagging and named entity recognition as part of our work.

Evaluation
Finally, the performance of the proposed model will be evaluated by comparing the
system’s output with the corresponding manually annotated text (prepared with the help
of linguists). We will use accuracy, precision, recall, F-score, and BLEU score as
evaluation metrics.

Scope of the study
The scope of this research is the development of Amharic grammar error detection and
correction.

The study focuses mainly on syntax errors, specifically agreement errors such as
subject-verb, object-verb, adjective-noun, and adverb-verb disagreement. Moreover, this
work will use only Amharic textual documents from different sources. Semantic errors,
incorrect word order, and missing-word errors are outside the scope of this study.

Significance of the study
As a grammar error detection and correction model, the proposed system produces
corrected text from erroneous natural language text.

The deep learning-based Amharic grammar error detection and correction model can be
applied to generate corrected Amharic text for subject-verb, object-verb,
adjective-noun, and adverb-verb errors in a collection of documents. Users can thus
write and prepare error-free documents. This research also supports the development of
higher-level natural language processing applications, such as question answering,
machine translation, and sentiment analysis, that require a grammar checker as part of
their processing.

It can also be applied to language learning in academic settings such as elementary
schools, high schools, and universities.

Organization of the Thesis
This thesis consists of five chapters, including this one. The rest of the thesis is
organized as follows. Chapter two presents a detailed discussion of the literature
review and related work. Chapter three presents the design of the Amharic grammar error
detection and correction model using a deep neural network. Chapter four presents the
results and discussion. Finally, the last chapter presents the conclusion and
recommendations.

Figure 1_1: Organization of the Thesis


Chapter-2: Literature Review

Overview
This chapter presents the concept of a grammar checker, together with theoretical
concepts related to the Amharic language and to grammar checking. The chapter begins
with a brief introduction to the Amharic language, followed by sections on Amharic parts
of speech, Amharic morphological characteristics, grammatical structure, and agreement
errors. It also covers the methodologies used in grammar checkers, ranging from
rule-based to deep learning. Finally, we discuss related work on grammar checkers for
local and foreign languages.

Overview of Amharic language
Amharic is a Semitic language spoken in Ethiopia and classified within the Afro-Asiatic
language family. It is the working language of the federal government of Ethiopia and of
the commercial and administrative sectors in all cities and towns. Many people from
different ethnic groups, including the Amhara, speak Amharic as a first language
(Gobena, 2010). The Amharic language has its own writing system, called Fidel (???),
which contains consonants and vowels.

The Ethiopian alphabet contains seven (7) vowels and thirty-three (33) fundamental
shapes, each representing a consonant followed by a vowel. Amharic letters are usually
laid out in a grid, with the consonants running vertically and the vowels horizontally.
Amharic is written from left to right. Although the Amharic script is often called an
alphabet, it is more precisely a syllabary, which means each letter represents a whole
syllable. This regular system makes the Ethiopian alphabet easy to understand (Tessema,
2014).
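
The grid layout can be made concrete with a short Python sketch that prints two rows of
the Fidel grid from the Unicode Ethiopic block. It assumes the Unicode layout in which
each of the two rows shown occupies eight consecutive code points starting at U+1200,
the first seven being the seven vowel orders. The constant and function names are
illustrative, and the sketch does not enumerate all 33 Amharic consonant rows.

```python
import unicodedata

ORDERS = 7              # vowel orders per consonant, as stated in the text
ROW_STRIDE = 8          # code points per consonant row for the rows shown here
ETHIOPIC_BASE = 0x1200  # first code point of the Unicode Ethiopic block


def fidel_row(row_index):
    """Return the seven vowel-order forms of the row at the given block offset."""
    start = ETHIOPIC_BASE + ROW_STRIDE * row_index
    return [chr(start + order) for order in range(ORDERS)]


# Print the first two rows of the grid with their Unicode character names.
for row_index in range(2):
    for ch in fidel_row(row_index):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# 33 fundamental shapes times 7 orders gives the size of the basic grid.
print(33 * ORDERS)  # 231
```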

Amharic Part of Speech
A part of speech, according to the dictionary, is a category to which a word is assigned
based on its meaning, form, or grammatical function. The noun, verb, adjective, adverb,
pronoun, conjunction, preposition, and interjection are all common parts of speech in
the Amharic language (B. Yimam, 2000).

Nouns
The noun is one of the word classes of the Amharic language and is mostly used to
identify or label a person, thing, or place (B. Yimam, 2000). Examples include a person
(???????????), a thing (???????), and a place (??? ?????? ???) (Tessema, 2014; B. Yimam,
2000).

Adjective
An adjective is a word that adds to the meaning of a noun by providing additional
information. In Amharic, adjectives precede the noun and are inflected for number and
gender. Some examples of adjectives are (??? ? ??? ? ??) (Tessema, 2014; B. Yimam,
2000).

Pronoun
The pronoun is another Amharic word class. It is also a noun subclass, since a pronoun
can be used in place of a noun. The types of pronouns are personal, demonstrative,
possessive, and interrogative.
Interrogative pronouns: ????????????????.
Demonstrative pronouns: ????????????????????.
Possessive pronouns: ???????? ??? ????? ????????? ?????.
Personal pronouns: 1st person (?????), 2nd person (????????????), 3rd person
(??????????) (Tessema, 2014; B. Yimam, 2000).

Verbs
A verb is a word that belongs to a group of words that describe an action or a state.
Amharic verb forms include the imperative (???? ? ???), imperfective (???? ? ?????? ),
perfective (?? ? ???), and gerundive (??? ? ?????), among others (Tessema, 2014; B.
Yimam, 2000).

Adverb
An adverb is a word that modifies a verb in terms of, for example, place (??? ???? ???)
or time (??? ????? ??? ???? ???) (Tessema, 2014; B. Yimam, 2000).

Preposition
Prepositions are words that link pronouns, nouns, or phrases to other words. Most
prepositions are short and are placed directly in front of nouns. Examples include
(??????), as in “?? ??????? ?? ???? ??? ?? ?? ???” (Tessema, 2014; B. Yimam, 2000).

Conjunctions
Conjunctions are words that connect words, phrases, and clauses to one another. For
instance (? ? ? ? ?? ??) (Tessema, 2014; B. Yimam, 2000).

Amharic morphology
Morphological analysis is the process of finding the smallest units of a word, such as
the root and stem (Aynadis Temesgen, 2013). An Amharic morpheme can be free or bound; a
free morpheme can stand alone as a word, whereas a bound morpheme cannot (Naber et al.,
2003). In Amharic, words can be constructed from morphemes in two ways: inflection and
derivation.

Inflectional morphology is the study of the combination of a word stem with a
morpheme, which generally results in a word of the same class as the original stem and
with the same syntactic function. Inflection marks a word category for gender, number,
case, definiteness, aspect, and politeness (Gebremariam, 2017; Kassa, 2018).
Derivational morphology is the study of how words are formed from morphemes through
processes such as affixation and compounding.

Amharic Sentence
A sentence is a set of words that conveys a complete statement, inquiry, exclamation, or
command.

It contains a subject and a predicate. A sentence expresses a complete thought or idea
by answering all or some of the following questions: ‘What?’, ‘Where?’, ‘Who?’, ‘When?’,
and ‘How?’. The subject of a sentence is the person or object performing the activity or
being described. The rest of the sentence, the predicate, tells us what the subject does
or is, and includes the verb and object of the statement. The object of a sentence is a
noun, pronoun, or noun phrase denoting a person or thing that is affected by the verb's
action or involved in its consequence.

Amharic sentences can be classified into simple and complex sentences (Tessema, 2014;
B. Yimam, 2000).

Simple sentence
A simple sentence has only one subject and verb phrase and conveys a complete notion.
Declarative, negative, interrogative, and imperative sentences are examples of simple
sentences. Declarative sentences are used to express thoughts, feelings, and opinions.
The speaker wants to communicate certain circumstances, events, or activities, which
could be mental or physical, actual or hypothetical. For example, “??? ?? ????? ???”
/alemu wede bahirdar meta/ “Alemu came to Bahir Dar”.

An interrogative sentence is a sentence that asks questions regarding the subject,
complement, or action the verb describes. It is typically written with the pronouns
“??/man/who”, “??/min/what”, “??/yet/where”, “????/endet/how”, “??/meche/when”, and
“???/lemin/why”. For example, “????? ???? ???? ?” /asmemelash menehen amemeh?/ “What
makes you sick?”. A negative sentence is a declarative sentence that is expressed
negatively. For example, “???? ????? ??????” /mesfn tmhrt algebam/ “Mesfin did not
attend school.” An imperative sentence issues a request or an order. The verb's suffix
implies that the subject is a second-person pronoun. For example, “????? ????!”
/debterun ansaw/ “Pick up the notebook”.

Complex sentence
A complex sentence, on the other hand, is made up of one or more noun phrases, adjective
phrases, and verb phrases that are combined in a simple-complex or complex-complex form.
For example: “??? ?? ??? ???? ??? ?? ??? ?? ??? ??? ??? ???????” /tewat lay zenab zenebe,
ahun gin trt yale semay begna lay anetsebareke/ “It rained in the morning, but now a
clear sky was shining above us.”

Common Grammar Errors
Most languages have grammatical rules and restrictions, and Amharic is no exception. It
is fairly simple for others to understand what one writes if the writer is familiar with
the language's rules.

However, knowing all the rules can be very challenging for writers, especially
non-native ones. As a result, various errors are made in Amharic writing. Most of the
groups of grammar mistakes addressed in this section also occur in other languages.
This section highlights the mistakes found in Amharic writing. Linguistic specialists
are consulted to identify the groups of grammar errors. Subject-verb, object-verb,
adjective-noun, and adverb-verb agreement are all frequently required in Amharic
sentences (Tessema, 2014).

Subject-verb Disagreement
A subject is the part of a sentence that shows what the sentence is about or who or what
performs the action. Typically, the subject is a noun, a noun phrase, or a pronoun. A
subject in Amharic agrees with the verb in person, gender, and number and appears at the
start of a sentence (Tessema, 2014). For example, consider the sentence
“???? ?? ?????? ?? ?” /aster ?? university hide/ “Aster has gone to university.” The
subject “????” /aster/ is 3rd person singular feminine, whereas the verb “??” /gone/ is
3rd person singular masculine. The morphological features of the two words do not match,
since the subject is feminine while the verb is masculine.

If the subject and the verb disagree in any of these morphological properties, the
error is called subject-verb disagreement.

Object-verb Disagreement
An object is a noun, noun phrase, or pronoun that is affected by the verb's action. The
object of a sentence should agree with the verb in person, number, and gender (Tessema,
2014). For example, in “? ?????? ??? ?? ???? ??????”, the object “???” is singular and
the verb “?????” is plural, so they do not agree in number. This kind of disagreement is
called object-verb disagreement.
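
The agreement checks described above compare person, number, and gender between two
words. A minimal Python sketch of that comparison follows. The Features class and the
disagreements function are illustrative names, and the feature values are those stated
for the Aster example; in the proposed model such features would come from
morphological analysis rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    person: int
    number: str
    gender: str


def disagreements(first, second):
    """Return the names of the agreement features on which two words differ."""
    return [
        name
        for name in ("person", "number", "gender")
        if getattr(first, name) != getattr(second, name)
    ]


# Feature values as given for the Aster example: the subject is 3rd person
# singular feminine, the verb is 3rd person singular masculine.
subject = Features(person=3, number="singular", gender="feminine")
verb = Features(person=3, number="singular", gender="masculine")

print(disagreements(subject, verb))  # ['gender']
```

An empty list means the pair agrees; a non-empty list names the features responsible
for the disagreement, for subject-verb and object-verb pairs alike.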

Adjective-Noun Disagreement
Modifiers are words or phrases that elaborate on something in a sentence. Adjectives
that modify nouns and adverbs that modify verbs are examples of modifiers. Adjectives in
Amharic appear before the noun they modify. An Amharic adjective should agree in number
and gender with the noun it modifies (Tessema, 2014).

Adverb-Verb Disagreement
Adverbs in Amharic are divided into subclasses such as adverbs of time, place, and
circumstance. Time adverbs describe when an event occurs. They may indicate the precise
time or the duration of a given action. Amharic verbs likewise indicate, through tense,
the time at which an action occurs.

One of the most common Amharic grammatical errors is disagreement between a time adverb
and the verb tense. The adverb must be appropriate for the verb, and vice versa. In
Amharic, an adverb modifies a verb or expresses its action in terms of time (B. Yimam,
2000). Adverb-verb agreement in Amharic concerns only time, because an adverb is not
inflected (Yifru, 2010).

Grammar Checker and approaches

Grammar checker
Grammar is a set of structural rules that govern how sentences, phrases, clauses, and
words are formed in a given natural language (Aynadis Temesgen, 2013).

Text must be grammatically correct for information to be conveyed and exchanged reliably
via text or other communication tools. As a result, automated natural language
processing is needed. NLP has a variety of applications, and one of them is grammar
checking. Grammar checking is the process of determining whether a text is grammatically
correct or incorrect. A correct sentence is one in which the related words within the
phrase follow the norms of number, person, gender, and tense agreement. Software that
checks text in a given language for violations of its grammatical rules is called a
grammar checker.

Approaches to Grammar Checker
Several studies on grammar checkers have been conducted for a variety of languages, and
they take various approaches. The most common approaches for developing grammar checkers
are statistical, rule-based, deep learning, and hybrid (Aynadis Temesgen, 2013).

Rule Based Grammar checker
The traditional, rule-based technique requires explicitly designing handmade features
(Naber et al., 2003). In a rule-based grammar checker, the input text is checked
against manually developed rules.

This method relies on language specialists to create the rules. The advantages of a
rule-based grammar checker are that rules are simple to add, change, or remove, that it
gives descriptive error messages, and that it does not require the training data that
data-driven grammar checkers require. The disadvantages are that developing rules takes
time and that linguistic specialists are needed for each language to create each rule
manually.

Many studies have used rule-based grammar checking for languages including English
(Naber et al., 2003), Amharic (Aynadis Temesgen, 2013), and Afaan Oromo (Tesfaye, 2011).

Statistical-Based Grammar Checker
The statistical grammar checker technique is also known as the data-driven or machine
learning-based approach. As opposed to a rule-based grammar checker, a statistical
grammar checker uses a training corpus to learn what is correct rather than relying on
manually constructed rules.

The corpus may be compiled manually or automatically from a variety of sources,
including journals, periodicals, newspapers, and internet sites (Aynadis Temesgen,
2013). This method uses N-gram models to examine the grammatical structure of a word
sequence. The statistical technique derives sequences of tags from a tagged corpus. It
then examines the tag sequence of a sentence: if the sequence is familiar or typical,
the sentence is considered correct; if the tag sequence is odd or rare, the sentence is
considered erroneous.
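
The tag-sequence check can be sketched in Python with a tag bigram model. The tag names,
the three-sentence training corpus, and the min_count threshold below are hypothetical
and chosen only to keep the example small; a real checker would estimate counts from a
large tagged corpus.

```python
from collections import Counter


def tag_bigrams(tags):
    """Pad a tag sequence with boundary markers and return its bigrams."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    return list(zip(padded, padded[1:]))


def train_tag_bigrams(tagged_sentences):
    """Count tag bigrams over a corpus given as lists of tags."""
    counts = Counter()
    for tags in tagged_sentences:
        counts.update(tag_bigrams(tags))
    return counts


def looks_correct(tags, counts, min_count=1):
    """Accept a sentence only if every one of its tag bigrams is familiar."""
    return all(counts[bigram] >= min_count for bigram in tag_bigrams(tags))


# Hypothetical tagged corpus: one tag sequence per sentence.
corpus = [
    ["ADJ", "N", "V"],
    ["N", "N", "V"],
    ["N", "ADV", "V"],
]
counts = train_tag_bigrams(corpus)

print(looks_correct(["N", "V"], counts))         # True: all bigrams were seen
print(looks_correct(["V", "N", "ADJ"], counts))  # False: unseen bigrams
```

The sketch flags a whole sentence as odd without saying which word is at fault, which
matches the weakness of the statistical approach.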

With this method, it is difficult to pinpoint the precise error in a sentence.
Statistical grammar checkers have been studied for many languages, including Afaan Oromo
(Desalegn, 2015), English, and Bangla (Alam et al., 2006).

Deep learning
Deep learning is a branch of machine learning that is built from successive layers and
is primarily concerned with methods inspired by the structure and function of the brain,
known as artificial neural networks (Brownlee, 2019). It is a type of machine learning
technique that uses a large, or deep, neural network. In comparison to classical machine
learning methods, deep learning models are trained on huge amounts of data.

Deep learning can outperform classical machine learning algorithms when dealing with
large amounts of data (Brownlee, 2019). Deep learning can learn from labeled data and
extract features automatically. The word 'deep' in deep learning refers to the number of
hidden layers: a network may have many hidden layers, depending on the nature of the
problem. The number of hidden layers in typical machine learning algorithms is small,
for example one or two, whereas in deep learning the number of hidden layers may be 100,
150, 200, or more (Brownlee, 2019).
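
The notion of depth can be illustrated with a small pure-Python sketch of a
fully connected network in which the number of hidden layers is a parameter. The layer
sizes, random weights, and function names are assumptions for illustration; biases and
training are omitted, and ReLU is applied at every layer, including the output, for
simplicity. This is not the architecture proposed in this thesis.

```python
import random


def build_network(layer_sizes, seed=0):
    """Create one random weight matrix per pair of consecutive layers."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(network, inputs):
    """Pass the inputs through every layer, applying ReLU after each one."""
    activations = list(inputs)
    for weights in network:
        activations = [
            max(0.0, sum(w * a for w, a in zip(row, activations)))
            for row in weights
        ]
    return activations


def hidden_layers(network):
    """Every weight matrix except the last one feeds a hidden layer."""
    return len(network) - 1


shallow = build_network([4, 8, 2])          # one hidden layer
deep = build_network([4] + [8] * 10 + [2])  # ten hidden layers

print(hidden_layers(shallow), hidden_layers(deep))
print(len(forward(deep, [1.0, 0.5, -0.5, 2.0])))
```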

Many deep learning methods are employed across NLP applications. The most often used
algorithms are the Recurrent Neural Network, the Convolutional Neural Network, and the
Deep Belief Network (Brownlee, 2019). This thesis focuses on the algorithms employed in
this investigation as well as other related ideas.

Related work
Chollampatt & Ng (2018) conducted research on a multilayer convolutional encoder-decoder
neural network for grammatical error correction. The study aims to improve the automatic
correction of grammatical, orthographic, and collocation errors in text using a
multilayer convolutional encoder-decoder neural network.

The authors used edit operation and language model features for rescoring the
hypotheses. They used publicly available datasets from prior work, Lang-8 and NUCLE, to
prepare parallel corpora.

Yuan (2017) conducted research on English grammar checkers and proposed grammatical
error correction using neural machine translation. The study applied a bidirectional RNN
and an attention-based model, and used the publicly available FCE dataset to develop the
system. The result shows an F0.5 score of 53.49%. However, the study does not consider
semantic error correction.

Wang et al. (2020) conducted research on Chinese grammar checkers.

The authors proposed Chinese grammatical error correction using a BERT-based pre-trained
model. They used the original BERT-based and Chinese RoBERTa-wwm-ext models to develop
the system, and also used a transformer for correction. Finally, they employed
BERT-encoder and BERT-fused models to detect and correct several types of errors.
However, the authors used whole-word masking to correct errors at the sentence level; as
a result, sentences were reconstructed in ways that lost their original meaning.

Desalegn (2015) researched Afaan Oromo grammar checkers.

The authors proposed a statistical Afaan Oromo grammar checker to identify incorrect
Afaan Oromo text. Two statistical techniques were applied: token n-gram and tag n-gram.
A total of 85 sentences were used. In the evaluation, the token n-gram technique
achieved a recall of 100%, a precision of 78.1%, and an F-measure of 89.0%, and the tag
n-gram technique achieved a recall of 86%, a precision of 82.6%, and an F-measure of
84.3% in identifying incorrect sentences. However, a limitation of the proposed method
is that it does not identify which words make the sentence incorrect.
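
Precision, recall, and F-measure recur throughout the related work, so a short Python
sketch of the standard definitions may help. The function names and the example counts
are illustrative. With beta = 1 the function returns the balanced F-measure, and with
beta = 0.5 it returns the F0.5 score, which weights precision more heavily than recall.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute precision and recall from raw detection counts."""
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Tag n-gram figures quoted above: precision 82.6%, recall 86%.
print(round(f_beta(0.826, 0.86), 3))  # 0.843

# Hypothetical counts: 8 errors correctly flagged, 2 false alarms, 2 missed.
p, r = precision_recall(true_pos=8, false_pos=2, false_neg=2)
print(p, r)
```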

Gidey (2021) conducted research on a Tigrinya grammar checker. The authors proposed a
dependency-based Tigrinya grammar checker. The study aims to detect grammatical errors
and suggest probable corrections for erroneous sentences. The authors used several
modules: a text processing module, a language dependency module, a dependency extraction
module, and a grammar checker module. They used a total of 312 sentences with 1248
tokens.

The proposed system achieved an accuracy of 92.09% in evaluation. However, a limitation
of this work is that it does not handle sentences that are syntactically correct but
semantically incorrect.

Gebremariam (2019) designed and developed a dependency-based Amharic grammar checker.
The main goal of this work is to detect grammatical errors in Amharic sentences. The
system has three parts: a CoNLL-U formatter, a dependency parser, and the grammar
checker. The authors evaluated the system with MaltEval 1.0 and reported 68.18% for
subject-verb agreement, 20% for adverb-verb agreement, and 81.25% for object-verb
agreement; the tokenizer scored 100% and the tagger 43%. However, the main limitation of
the proposed method is that it does not detect semantic grammar errors.

YARED (2020) proposed deep learning-based Amharic grammar error detection. The main goal
of this study is to detect grammatically incorrect Amharic text. Two deep learning
algorithms were applied to build the model: long short-term memory (LSTM) and
bidirectional long short-term memory (BiLSTM). They were trained and evaluated on a
total of 3,881 sentences with 50,000 tokens prepared by the authors. The authors
prepared three corpora: a morphologically and syntactically tagged corpus, syntactically
tagged tokens, and a grammar error detection corpus. The LSTM model achieved an accuracy
of 88.27%, and the BiLSTM model achieved an accuracy of 88.89%. However, the proposed
solution cannot suggest probable corrections for, or correct, incorrect Amharic text.

Summary
Existing grammar checkers for local and foreign languages cannot correct Amharic grammar
errors because of differences in morphological and grammatical structure.

To the best of our knowledge, no existing work corrects grammatically incorrect
Amharic text. Thus, this study aims to develop an automatic morphology-based POS
tagging and grammar error detection and correction model.
Chapter-3: Methodology

Overview

In this chapter, the design and development of the Amharic grammar error detection and
correction model using a deep neural network are presented. The first section
describes the main components and the general architecture of the proposed system. The
next section presents document preprocessing for detection and correction. The third
section covers the strategies and algorithms implemented in grammar error detection
and correction.

Proposed Model General Architecture

The general architecture of the proposed system is shown in Figure 1. The proposed
system has the following sub-components: preprocessing, word embedding, a
bidirectional long short-term memory (Bi-LSTM) recurrent neural network, the grammar
result, attention-based neural machine translation (NMT), and the corrected version of
the given text. The input text is the annotated dataset, which is preprocessed by the
preprocessing component. The preprocessing module performs tokenization,
normalization, morphology-based tag generation, tag splitting, and sequence padding.
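
Three of the preprocessing steps named above (normalization, tokenization, and sequence padding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the homophone table `HOMOPHONES`, the helper names, the `<pad>`/`<unk>` symbols, and the example sentence are all hypothetical, and morphology-based tag generation and tag splitting are not covered.

```python
import re
from typing import Dict, List

# Hypothetical normalization table (an assumption, not the thesis's list):
# maps a few homophone base characters to a single canonical form.
HOMOPHONES = {"ሐ": "ሀ", "ኀ": "ሀ", "ሠ": "ሰ", "ዐ": "አ", "ፀ": "ጸ"}

WORDSPACE = "\u1361"  # Ethiopic wordspace
FULL_STOP = "\u1362"  # Ethiopic full stop
PUNCTUATION = "\u1362\u1363\u1364\u1365\u1366\u1367\u1368"
PAD, UNK = "<pad>", "<unk>"


def normalize(text: str) -> str:
    """Replace each homophone character with its canonical form."""
    return "".join(HOMOPHONES.get(ch, ch) for ch in text)


def tokenize(text: str) -> List[str]:
    """Split on whitespace and emit punctuation marks as separate tokens."""
    text = text.replace(WORDSPACE, " ")
    spaced = re.sub("([" + PUNCTUATION + "?!])", r" \1 ", text)
    return spaced.split()


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every token; ids 0 and 1 are reserved."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_and_pad(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, then right-pad with the PAD id."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Illustrative sentence, used only to exercise the helpers.
sentence = "ልጁ ትምህርት ቤት ሄደ" + FULL_STOP
tokens = tokenize(normalize(sentence))
vocab = build_vocab([tokens])
padded = encode_and_pad(tokens, vocab, max_len=8)
print(tokens)
print(padded)
```

Padding every sentence to the same length lets a batch be stored as one rectangular array, which recurrent models such as Bi-LSTM typically expect; reserving id 0 for padding makes it easy to mask those positions later.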

After text preprocessing, the next component is word embedding, which converts text
to numeric vectors for model development. Once the input is ready, we train the
proposed grammar error detection and correction model. For grammar error detection we
used the Bi-LSTM algorithm, and for grammar error correction we used attention-based
neural machine translation. The developed model was then evaluated using performance
metrics. Finally, the system displays the corrected version of the text.
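
In attention-based neural machine translation, each decoding step weights the encoder's hidden states and sums them into a context vector. The sketch below illustrates that single step in plain Python. It assumes dot-product scoring, which is only one of several variants; the text does not say which one the model uses, and the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: List[float],
           encoder_states: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step."""
    # Dot-product score between the decoder state and each encoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    # Context vector: weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Toy example: three source positions with 2-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
print("weights:", [round(w, 3) for w in weights])
print("context:", [round(c, 3) for c in context])
```

In this toy input the second and third source positions receive equal, larger weights because their states align with the decoder state. In a correction model, that mechanism would let the decoder focus on the source tokens most relevant to the word being generated.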

[Figure 1: General architecture of the proposed system. Labels: Text Preprocessing,
Tokenization, Morphology-Based Tag Generation, Tag Splitting, Padding, One-Hot
Representation, Word Embedding, Model Development, Bi-LSTM for Detection,
Attention-Based NMT for Correction, Demonstration]

Chapter-4: Results and Discussion

Overview
Data Collection and Preparation
Development Environment
Experimentation Result of Morphology-Based Tag Generation
Comparison of the Two Models for Tag Generation
Experimentation Result of Grammar Error Detection
Comparison of the Two Models for Grammar Error Detection
Experimentation Result of Grammar Error Correction
Comparison of the Two Models for Grammar Error Correction
Discussion

Chapter-5: Conclusion and Recommendation

Conclusion

Contribution of the study

The main contributions of this thesis work are listed below:
We prepared a dataset of 11,000 morphology-tagged sentences with 20,000 tokens.
We prepared a parallel corpus of 5,200 sentences for error detection and correction.
We designed an automatic morphology-based tag generator and a grammar error detection
and correction model, which can be used as a tool for upstream NLP applications.
We develop

Future Work

A grammar checker is a complex task that needs different downstream NLP tools. In this
thesis work, we designed an Amharic grammar error detection and correction model using
a deep neural network that tries to correct a given grammatically incorrect Amharic
text.

The following are some recommendations we propose for future work:
In this research, we considered only syntax error detection and correction. We suggest
addressing other types of errors, such as semantic and punctuation errors.
In this study, we used automatic morphology tag generation and named entity
recognition. Better results may be achieved by adding an automatic spell checker to
the process.
This work was done using Bi-LSTM and neural machine translation with and without an
attention-based approach. We suggest trying other deep neural network approaches, such
as Transformer or BERT.

References

Alam, M. J., Uzzaman, N., & Khan, M. (2006). N-gram based Statistical Grammar Checker
for Bangla and English. Ninth International Conference on Computer and Information
Technology (ICCIT 2006), 3–6.

Aynadis Temesgen. (2013). Design and Development of Amharic Grammar Checker (Issue
March). Addis Ababa University.

Brownlee, J. (2019). What is Deep Learning? Machine Learning Mastery. [Online].
https://doi.org/10.4018/978-1-7998-0414-7. Available:
https://machinelearningmastery.com/what-is-deep-learning/, August 16, 2016.

Chollampatt, S., & Ng, H. T. (2018). A multilayer convolutional encoder-decoder neural
network for grammatical error correction. 32nd AAAI Conference on Artificial
Intelligence, AAAI 2018, 5755–5762. https://doi.org/10.1609/aaai.v32i1.12069

Desalegn, A. (2015). Statistical Afaan Oromo Grammar Checker [Addis Ababa University].
http://eprints.ums.ac.id/37501/6/BAB II.pdf

Gasser, M. (2011). HornMorpho: a system for morphological processing of Amharic,
Oromo, and Tigrinya. Conference on Human Language Technology for Development, August,
94–99.

Gebremariam, A. G. (2019). Dependency Based Amharic Grammar Checker. Addis Ababa
University.

Gelbukh, A. (2010). Special issue: Natural Language Processing and its Applications.
Research in Computing Science. Proceedings of the 11th International Conference of
Intelligence Text Processing Computational Linguistics.
http://ace.cs.ohiou.edu/~razvan/papers/cicling10.pdf

Gidey, M. A. (2021). Dependency-based Tigrinya Grammar Checker (Issue March). Addis
Ababa University.

Gobena, M. (2010). Implementing an open source Amharic resource grammar in GF (Issue
November).

Grundkiewicz, R., Junczys-Dowmunt, M., & Heafield, K. (2019). Neural grammatical error
correction systems with unsupervised pre-training on synthetic data. ACL 2019 -
Innovative Use of NLP for Building Educational Applications, BEA 2019 - Proceedings of
the 14th Workshop, 252–263. https://doi.org/10.18653/v1/w19-4427

Henrich, V., & Reuter, T. (2009). LISGrammarChecker: Language Independent Statistical
Grammar Checking (Issue February).

Jannya, V., & Pv, S. (2020). Grammar Error Correction using Seq2Seq. 9(6), 322–325.

Larabi Marie-Sainte, S., Alalyani, N., Alotaibi, S., Ghouzali, S., & Abunadi, I.
(2019). Arabic natural language processing and machine learning-based systems. IEEE
Access, 7. https://doi.org/10.1109/ACCESS.2018.2890076

Liddy, E. D. (2001). Natural language processing. In Encyclopedia of Library and
Information Science (2nd ed.). NY. Marcel Dekker, Inc.
https://doi.org/10.1016/0004-3702(82)90032-7

Naber, D., Kummert, P. F., Fakultät, T., & Witt, A. (2003). A Rule-Based Style and
Grammar Checker.

Strahan, T. E. (2009). Laurie Bauer, The Linguistics Student’s Handbook. Edinburgh:
Edinburgh University Press, 2007. Pp. ix + 387. Nordic Journal of Linguistics, 32(1),
165–174. https://doi.org/10.1017/s0332586509002078

Tesfaye, D. (2011). A rule-based Afan Oromo Grammar Checker [Addis Ababa University].
In International Journal of Advanced Computer Science and Applications (Vol. 2, Issue
8). https://doi.org/10.14569/ijacsa.2011.020823

Tessema, T. T. (2014). Word Sequence Prediction for Amharic Language. Addis Ababa
University, October.

Wang, H., Kurosawa, M., Katsumata, S., & Komachi, M. (2020). Chinese Grammatical
Correction Using BERT-based Pre-trained Model. 0–5. http://arxiv.org/abs/2011.02093

Yared, D. A. (2020). Deep Learning Based Amharic Grammar Error Detection [Bahir Dar
University]. https://ir.bdu.edu.et/handle/123456789/12747

Yifru, M. (2010). Morphology-Based Language Modeling for Amharic. August, 9–25.

Yimam, B. (2000). Amharic Grammar. Addis Ababa University Press.

Yimam, S. M., Ayele, A. A., Venkatesh, G., Gashaw, I., & Biemann, C. (2021).
Introducing various semantic models for Amharic: Experimentation and evaluation with
multiple tasks and datasets. Future Internet, 13(11).
https://doi.org/10.3390/fi13110275

Yuan, Z. (2017). Grammatical error correction in non-native English.
http://www.cl.cam.ac.uk/

Yuan, Z., Stahlberg, F., Rei, M., Byrne, B., & Yannakoudakis, H. (2019). Neural and
fst-based approaches to grammatical error correction. ACL 2019 - Innovative Use of NLP
for Building Educational Applications, BEA 2019 - Proceedings of the 14th Workshop,
228–239. https://doi.org/10.18653/v1/w19-4424

