Simplification of Complex Sentences in the Indonesian Language Using Rule-Based Reasoning
A complex sentence consists of two or more single sentences. Complex sentences use different conjunctions, and some use no conjunction at all and are joined by a comma instead. A complex sentence can therefore be identified by its conjunctions or its punctuation. Readers often find it difficult to understand what a complex sentence is meant to convey, because the choice of conjunction or punctuation mark can change its meaning and information content.

The problem arises when simplification, which divides a complex sentence into single sentences, produces single sentences whose meaning and information content differ from the original. Text simplification is a field of natural language processing (NLP) that rewrites a sentence to reduce its syntactic complexity and lexical complexity without changing or removing the meaning and information content of the sentence [1]. Sentence simplification has been developed in various countries using a variety of methods and rules. In Indonesia, however, there has been little work on sentence simplification, especially on the simplification of complex sentences.

A further study on text simplification, for French [3], is entitled Acquisition of Syntactic Simplification Rules for French. It describes syntactic simplification as a data-driven approach that implements two methods. The first is a manual corpus analysis that aims to identify the words to be simplified; the second is a semi-automatic method that automatically identifies simplification operations to inform the sentence simplification rules. According to that study, the approach removes the need for parallel data as a resource and increases flexibility. In particular, syntactic simplification can be applied to user-generated content as a pre-editing step for statistical machine translation.

Building on the text simplification research above, this study examines the simplification of complex sentences in Indonesian using the Rule-Based Reasoning method, based on surface expression rules and a part-of-speech (POS) tagger that identifies the word class of each word in a sentence. Simplifying complex sentences is expected to make the sentences and their meaning easier to convey.

III. PERFORMANCE
RIFKA WIDYASTUTI | INFORMATIC ENGINEERING | DEPARTEMENT OF COMPUTER SCIENCE | SRIWIJAYA UNIVERSITY 2014 1
Natural Language Processing is the area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do something useful [4].

a. Preprocessing
Preprocessing is the process of preparing the data before it is processed [5]. It consists of case folding and tokenizing. Case folding is the process of converting all the letters in a document or sentence to lowercase. Only the letters 'a' through 'z' are accepted [6]; all other characters are treated as delimiters. Examples of delimiters can be seen in Table I.

Table I. List of Delimiters
0 5 [ % ` . ? | ) ≥
1 6 ] ^ ~ , : ! - ∞
2 7 { & \\ / ; @ _ π
3 8 } * £ < ‘ # + ±
4 9 \ ( € > ‘’ $ = ɸ

Tokenizing is the process of identifying the smallest units (tokens) of a sentence structure (Triawati, 2009). A sentence is broken into single words by scanning it for white space separators such as spaces, tabs, and newlines. The case folding and tokenizing process is shown in Table II.

Table II. Preprocessing Sentences Scheme
Sentence : Ibu Pergi Ke Pasar
Case folding : ibu pergi ke pasar
Tokenizing : “ibu” “pergi” “ke” “pasar”

b. Part of Speech
Part of Speech (POS) tagging is the process of determining the type of each word in a text. A simple form of this process identifies words as adjectives, adverbs, interjections, conjunctions, nouns, numerals, prepositions, pronouns, verbs, etc. [5]. The process of determining the type of words in a sentence can be seen in Figure I.

c. Rule-Based Reasoning
Rule-Based Reasoning is a decision support system that has a knowledge base. In this method, problems are solved with an artificial intelligence approach that uses problem-solving techniques based on the rules contained in the knowledge base [7].

[2] uses surface expression rules in its answer finder component. A surface expression is the surface form of a sentence, or the pattern used in the sentence. The surface expression rules used in this study can be found in Appendix A.

d. Complex Sentences
A complex sentence is a merger of two or more single sentences using conjunctions. An example of complex sentence simplification can be seen in Figure II.

1. Tini berbelanja sayuran.
2. Tini memasak sayuran.
3. Tini berbelanja sayuran dan memasaknya.
Figure II. Chart of Complex Sentences (3)

IV. EXPERIMENTAL

Simplifying complex sentences is not as easy as one might imagine. Some people find it difficult, especially during the learning process in schools. An application is therefore needed to support the learning process and make it more attractive. In this research, the simplification of a complex sentence starts with preprocessing, which consists of case folding and tokenizing. The preprocessing results for a complex sentence can be seen in Figure III.
Preprocessing of a Complex Sentence
Example sentence:
“Tini Berbelanja Sayuran dan Ibu Memasaknya”
Case folding result:
“tini berbelanja sayuran dan ibu memasaknya”
Tokenizing result:
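
The two preprocessing steps described in this paper, case folding and tokenizing, can be illustrated with a minimal Python sketch. It is not the software used in this study. The function names `case_fold` and `tokenize` are hypothetical, and replacing each delimiter with a space (rather than deleting it) is an assumption made so that the whitespace tokenizer can still separate the neighbouring words.

```python
import re


def case_fold(sentence: str) -> str:
    """Lowercase the sentence and keep only the letters 'a' to 'z'.

    Assumption: every run of other characters (digits, punctuation,
    symbols such as those in Table I) is a delimiter and becomes one space.
    """
    lowered = sentence.lower()
    return re.sub(r"[^a-z]+", " ", lowered).strip()


def tokenize(sentence: str) -> list[str]:
    """Split a case-folded sentence on white space (spaces, tabs, newlines)."""
    return sentence.split()


folded = case_fold("Tini Berbelanja Sayuran dan Ibu Memasaknya")
tokens = tokenize(folded)
print(folded)  # tini berbelanja sayuran dan ibu memasaknya
print(tokens)  # ['tini', 'berbelanja', 'sayuran', 'dan', 'ibu', 'memasaknya']
```

Keeping the two steps as separate functions mirrors the order given in the text: case folding runs first, and tokenizing then only has to deal with white space.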
CONCLUSION
The conclusions that can be drawn from this study are:
1. The Rule-Based Reasoning method can be used to simplify sentences and can be applied to complex sentences, which basically consist of single sentences joined by conjunctions.
2. Surface expression rules can be used to describe the word before and the word after the conjunction, so that a complex sentence can be simplified properly, because the simplification does not change its meaning or information content.
3. Of the 60 sample complex sentences available, the software simplified 93.3% correctly in Indonesian using Rule-Based Reasoning; four samples could not be simplified properly. This was because an error occurred while defining the tokens, or because the sample sentence did not match any of the complex sentence patterns that had been defined.
4. The result of the simplification is that a complex sentence is split into two single sentences, and the conjunction is determined by its defined word class. Only the word class of each word in the complex sentence is used in the Rule-Based Reasoning simplification process. Therefore, the software may simplify a complex sentence incorrectly when the NLP_ITB package assigns a wrong word class.

REFERENCES
ICITA 2005. Third International Conference on, 2005, pp. 52-57 vol.1.
[8] A. Chaer, Sintaksis Bahasa Indonesia: Pendekatan Proses: Rineka Cipta, 2009.
[9] B. S. R. Chandrasekar, "Automatic Induction of Rules for Text Simplification," Institute for Research in Cognitive Science, 1996.
[10] P. M. Nugues, An Introduction to Language Processing with Perl and Prolog. Germany: Springer-Verlag Berlin Heidelberg, 2006.
[11] W. Duch, "Rule-Based Methods," Department of Informatics, Nicolaus Copernicus University, Poland, 2010.
[12] S. D. Hasan Alwi, Hans Lapoliwa, Anton M. Moelino, "Tata Bahasa Baku Bahasa Indonesia," vol. Edisi Ketiga, ed. Jakarta: Pusat Bahasa dan Balai Pustaka, 2003, p. 475.
[13] A. O. Hatem, N. Shaker, "Morphological Analysis for Rule-Based Machine Translation," in Semantic Technology and Information Retrieval (STAIR), 2011 International Conference on, 2011, pp. 260-263.
[14] R. Ismoyo, Nasarius Sudaryono, Bahasa Indonesia untuk Sekolah Dasar/MI Kelas 6. Jakarta: Pusat
APPENDIX A
Surface Expression Rules for the Simplification of Complex Sentences
No. | Conjunction | Word before conjunction | Word after conjunction | Conjunction position
34. | Kemudian | Adverbial (noun) | Predicate (verb) | Middle of the complex sentence
35. | Kemudian | Object (noun) | Predicate (verb) | Middle of the complex sentence
36. | Meskipun | - | Subject (noun) | Front of the sentence
37. | Meskipun | - | Predicate (verb) | Front of the sentence
38. | Lalu | Object (noun) | Predicate (verb) | Middle of the complex sentence
39. | Ketika | - | Subject (noun) | Front of the sentence
40. | Walaupun | - | Subject (noun) | Front of the sentence
41. | Walaupun | - | Object (noun) | Front of the sentence
42. | Agar | Object (noun) | Predicate (verb) | Front of the sentence
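
As an illustration of how rules such as 34, 35 and 38 above could drive the split, the following Python sketch matches the word class before and after a mid-sentence conjunction and cuts the sentence there. It is a sketch under stated assumptions, not the software evaluated in this study. The `Rule` class, the `LEXICON` dictionary and the function names are hypothetical; the toy lexicon stands in for a real POS tagger such as the NLP_ITB package, whose interface is not shown in this paper. The input sentence is constructed for illustration. Rules with the conjunction at the front of the sentence are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conjunction: str  # conjunction token, lowercase
    before: str       # word class expected directly before the conjunction
    after: str        # word class expected directly after the conjunction


# Mid-sentence rules 34, 35 and 38 reduced to word classes (noun ... verb).
RULES = [
    Rule("kemudian", "noun", "verb"),
    Rule("lalu", "noun", "verb"),
]

# Hypothetical toy lexicon; a real system would call a POS tagger instead.
LEXICON = {
    "tini": "noun",
    "sayuran": "noun",
    "berbelanja": "verb",
    "memasak": "verb",
    "kemudian": "conjunction",
    "lalu": "conjunction",
}


def tag(tokens):
    """Attach a word class to every token; unknown words get 'unknown'."""
    return [(token, LEXICON.get(token, "unknown")) for token in tokens]


def split_compound(tokens):
    """Return two clauses if a rule matches, else the sentence unchanged."""
    tagged = tag(tokens)
    for i in range(1, len(tagged) - 1):
        for rule in RULES:
            if (tagged[i][0] == rule.conjunction
                    and tagged[i - 1][1] == rule.before
                    and tagged[i + 1][1] == rule.after):
                # Drop the conjunction and cut the sentence at its position.
                return [tokens[:i], tokens[i + 1:]]
    return [tokens]


tokens = "tini berbelanja sayuran kemudian memasak sayuran".split()
print([" ".join(clause) for clause in split_compound(tokens)])
# ['tini berbelanja sayuran', 'memasak sayuran']
```

The sketch does not restore the omitted subject of the second clause. When a word is missing from the lexicon, its class is 'unknown', no rule matches, and the sentence is returned unsplit, which is the same kind of failure the conclusion attributes to word class errors.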
APPENDIX B
EXAMPLE OF AN EXPERIMENT ON THE SIMPLIFICATION OF COMPLEX SENTENCES