
Bahir Dar University

Bahir Dar Institute of Technology

Faculty of Computing
MSc in Information Technology 1st Year (Regular)

Natural Language Processing (NLP)

Project Report paper on

“Amharic Sentence Parse Tree”

By Shewandires Menan

ID: BDU1100033PR

Submitted to: Dr. Yaregal A.

August, 2020
NATURAL LANGUAGE PROCESSING PROJECT PAPER

Contents

1. Introduction
2. Approaches used for this work
3. Methodology
4. Implementation
5. Challenges
6. References

BAHIR DAR UNIVERSITY | BIT i


1. Introduction
Parsing is one of the steps in designing a functional NLP application, and its output can serve as input to many other NLP applications, such as grammar checkers, spell checkers, and spelling correction. Parsing centers on manipulating and understanding a sentence: breaking it down into manageable components and understanding their context and their relations to each other in order to determine whether the sentence is correct. Sentences are the starting point when analyzing written material or documents [1]. Syntax refers to the way words are related to each other in a sentence. Sentence parsing, also called syntactic parsing, is therefore the process of identifying how words combine to form a correct sentence, and of determining what structural role (lexical category) each word plays, which phrases are subparts of which other phrases, and which words modify which other words in the sentence. A sentence parser outputs a parse structure that can be used as a component in many applications, including semantic analysis, machine translation, and the storage and retrieval of textual data [2]. Today, parsers of different kinds (e.g., probabilistic and rule-based) have been developed for languages with relatively wide national and/or international use (e.g., English, German, Chinese) [3]. This project focuses on implementing a parser for Amharic sentences that displays the parse tree of a sentence. There are different methods for sentence parsing; two of them are Context-Free Grammar (CFG), a rule-based approach, and Probabilistic Context-Free Grammar (PCFG), a statistical approach. This work uses both of these approaches, i.e., CFG and PCFG [4].
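To make the idea of a parse structure concrete, here is a small illustrative sketch (not part of the original implementation) that reads the bracketed parse reported in Section 4 for "አበበ የ ሰዉ አጥር ላይ ሆኖ አየ" and lists every phrase together with the words it covers. The helper names `read_tree`, `leaves`, and `constituents` are hypothetical; in NLTK itself, `Tree.fromstring` performs the reading step.

```python
def read_tree(text):
    """Read a bracketed tree such as '(S (NP w) (VP ...))' into (label, children) pairs."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())  # a nested phrase
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words covered by a (label, children) tree, left to right."""
    words = []
    for child in tree[1]:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


def constituents(tree):
    """List (label, covered words) for every phrase, outermost first."""
    result = [(tree[0], " ".join(leaves(tree)))]
    for child in tree[1]:
        if not isinstance(child, str):
            result.extend(constituents(child))
    return result


PARSE = ("(S (NP አበበ) (VP (NP (Det የ) (N ሰዉ) (N አጥር) "
         "(PP (P ላይ) (P ሆኖ))) (V አየ)))")

for label, words in constituents(read_tree(PARSE)):
    print(label, "->", words)
```

Each printed line pairs a phrase label with its words, so one can read off, for example, that the PP "ላይ ሆኖ" is a subpart of the NP "የ ሰዉ አጥር ላይ ሆኖ".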

2. Approaches used for this work

As mentioned in the previous section, the approaches used for this implementation are CFG and PCFG, taken from the non-statistical (rule-based) and statistical methods respectively.
Context-Free Grammar
A context-free grammar (CFG) is a formal system that describes a language by specifying how any legal text can be derived from a distinguished symbol called the axiom, or sentence symbol [2]. CFGs are a very important class of grammars for two reasons: the formalism is powerful enough to describe most of the structure in natural languages, yet restricted enough that efficient parsers can be built to analyze sentences [3].


Probabilistic Context-Free Grammars (PCFG) Parsing

A PCFG is a context-free grammar that associates a probability with each of its productions. It generates the same set of parses for a text as the corresponding context-free grammar, and assigns a probability to each parse. The probability of a parse generated by a PCFG is simply the product of the probabilities of the productions used to generate it [1]. PCFGs model a language based on real data, and are therefore more tolerant of things like grammatical mistakes, which occur in real-life situations. Although PCFGs have many advantages, a critical disadvantage is that context is not taken into account at all. In fact, a trigram model of a language (here, a model over sequences of three words) would probably achieve better results, even though it takes no account of the internal structure of the language, and such a model may be more applicable to a language like Amharic [3].
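As a worked illustration of the product rule stated above (a sketch, not the report's code), the snippet below walks a parse tree and multiplies the probability of every production it uses. The rule probabilities are copied from the PCFG in Section 4, restricted to the twelve rules used by the reported parse of "አበበ የ ሰዉ አጥር ላይ ሆኖ ትንሽ አየ"; encoding the tree as nested `(label, *children)` tuples is an assumption made for brevity.

```python
# Probabilities of the rules used by one parse, copied from the PCFG in Section 4.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("አበበ",)): 0.2,
    ("VP", ("NP", "Adj", "V")): 0.4,
    ("NP", ("Det", "N", "N", "PP")): 0.2,
    ("Det", ("የ",)): 0.9,
    ("N", ("ሰዉ",)): 0.4,
    ("N", ("አጥር",)): 0.2,
    ("PP", ("P", "P")): 0.8,
    ("P", ("ላይ",)): 0.4,
    ("P", ("ሆኖ",)): 0.3,
    ("Adj", ("ትንሽ",)): 1.0,
    ("V", ("አየ",)): 0.8,
}


def tree_prob(node):
    """Multiply the probabilities of all productions used in a (label, *children) tree."""
    if isinstance(node, str):  # a word adds no production of its own
        return 1.0
    lhs, children = node[0], node[1:]
    # The right-hand side is the sequence of child labels (or words, for lexical rules).
    rhs = tuple(child if isinstance(child, str) else child[0] for child in children)
    prob = RULE_PROB[(lhs, rhs)]
    for child in children:
        prob *= tree_prob(child)
    return prob


TREE = ("S", ("NP", "አበበ"),
        ("VP", ("NP", ("Det", "የ"), ("N", "ሰዉ"), ("N", "አጥር"),
                ("PP", ("P", "ላይ"), ("P", "ሆኖ"))),
         ("Adj", "ትንሽ"), ("V", "አየ")))

print(tree_prob(TREE))
```

The result agrees, up to floating-point rounding, with the p=8.84736e-05 that Section 4 reports for this parse.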

3. Methodology
The methodology used to implement the Amharic parse tree is as follows: take a set of four sample grammars, ranging from simple to complex production rules; assign probabilities to those rules for the probabilistic parsing approach; and then draw the parse trees and specify their parse structure based on the grammar.

In terms of source code, the implementation uses a collection of tools that support the main application for different purposes [2]. They are listed below.
❖ Python 3.7
❖ NLTK 3.2, the Python-based Natural Language Toolkit (www.nltk.org)
❖ Keyman, a Unicode keyboard for typing Amharic
❖ PyScripter 3.7, an interactive IDE for Python
To set up the implementation in a local environment, first install Python 3.7, then download and install NLTK 3.2 into the Python installation, since it is used as a library inside the Python code. Finally, download the NLTK data from within Python.
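The first two setup steps can be verified with a short script such as the one below. It is a hypothetical helper, not part of the report: it uses only the standard library to check the Python version and whether `nltk` can be imported, and suggests the usual `pip` command if it cannot.

```python
import importlib.util
import sys


def setup_problems(min_python=(3, 7)):
    """Return a list of human-readable setup problems (empty if none were found)."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    # find_spec returns None when a top-level package cannot be imported.
    if importlib.util.find_spec("nltk") is None:
        problems.append("NLTK is not installed; try: python -m pip install nltk")
    return problems


if __name__ == "__main__":
    for problem in setup_problems() or ["Setup looks fine."]:
        print(problem)
```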


4. Implementation
The first implementation uses the CFG approach to produce the parse tree of an Amharic sentence. The source code and output of the implementation are as follows. A sentence such as "አበበ የ ሰዉ አጥር ላይ ሆኖ አየ" can be represented using the following grammar:

S -> NP VP
VP -> V NP | V NP PP | NP V
PP -> P NP | P P
V -> "አየ" | "በላ" | "ተራመደ"
NP -> "አበበ" | "ከበደ" | "ጫላ" | Det N | Det N N | Det N PP | N N | Det N N PP
Det -> "የ" | "ለ"
N -> "ሰዉ" | "ውሻ" | "አጥር" | "ድመት" | "መናፈሻ"
P -> "በ" | "ላይ" | "በኩል" | "ሆኖ" | "ከ"

Using the developed application, the syntactic parse structure of the above example is shown below, followed by its parse tree:

(S (NP አበበ) (VP (NP (Det የ) (N ሰዉ) (N አጥር) (PP (P ላይ) (P ሆኖ))) (V አየ)))

[Figure: parse tree of the example sentence, as drawn by the CFG implementation]
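The report's CFG source code is not reproduced in this text. As a stand-in, here is a minimal sketch, not the author's program, of a top-down backtracking (recursive-descent) parser for the grammar above, written in plain Python so that it runs without NLTK. It enumerates every parse that covers the whole sentence and relies on the grammar having no left-recursive rules, which holds here. For larger grammars, NLTK's chart parsers (e.g. `nltk.ChartParser`) are the practical choice.

```python
# The CFG from the report, as {lhs: [rhs, ...]}; any symbol that is not a key is a word.
GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("V", "NP", "PP"), ("NP", "V")],
    "PP": [("P", "NP"), ("P", "P")],
    "V": [("አየ",), ("በላ",), ("ተራመደ",)],
    "NP": [("አበበ",), ("ከበደ",), ("ጫላ",), ("Det", "N"), ("Det", "N", "N"),
           ("Det", "N", "PP"), ("N", "N"), ("Det", "N", "N", "PP")],
    "Det": [("የ",), ("ለ",)],
    "N": [("ሰዉ",), ("ውሻ",), ("አጥር",), ("ድመት",), ("መናፈሻ",)],
    "P": [("በ",), ("ላይ",), ("በኩል",), ("ሆኖ",), ("ከ",)],
}


def parse_symbol(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` can start at words[i]."""
    if symbol not in GRAMMAR:  # a word: it must match the next token exactly
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each alternative in turn (backtracking)
        for children, k in parse_sequence(rhs, words, i):
            yield "(%s %s)" % (symbol, " ".join(children)), k


def parse_sequence(rhs, words, i):
    """Yield (children, next_index) for every way the symbols in `rhs` match in order."""
    if not rhs:
        yield [], i
        return
    for first, k in parse_symbol(rhs[0], words, i):
        for rest, m in parse_sequence(rhs[1:], words, k):
            yield [first] + rest, m


def parse(sentence, start="S"):
    """Return every bracketed parse of `sentence` that covers all of its words."""
    words = sentence.split()
    return [tree for tree, end in parse_symbol(start, words, 0) if end == len(words)]


for tree in parse("አበበ የ ሰዉ አጥር ላይ ሆኖ አየ"):
    print(tree)
```

For the example sentence the grammar admits exactly one parse, and it matches the structure reported above.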

The second implementation uses the PCFG approach to produce the parse tree of an Amharic sentence. The source code and output of the implementation are as follows.

An example PCFG is shown below; the approach is explained after the grammar.


S -> NP VP [1.0]
VP -> V NP [0.2] | V NP PP [0.3] | NP V [0.1] | NP Adj V [0.4]
PP -> P NP [0.2] | P P [0.8]
V -> "አየ" [0.8] | "በላ" [0.1] | "ተራመደ" [0.1]
NP -> "አበበ" [0.2] | "ከበደ" [0.1] | "ጫላ" [0.1] | Det N [0.1] | Det N N [0.1] | Det N PP [0.1] | N N [0.1] | Det N N PP [0.2]
Det -> "የ" [0.9] | "ለ" [0.1]
N -> "ሰዉ" [0.4] | "ውሻ" [0.1] | "አጥር" [0.2] | "ድመት" [0.1] | "መናፈሻ" [0.1]
P -> "በ" [0.1] | "ላይ" [0.4] | "በኩል" [0.1] | "ሆኖ" [0.3] | "ከ" [0.1]
Adj -> "ትንሽ" [1.0]
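NLTK's `PCFG` class checks that the probabilities of all rules sharing a left-hand side sum to 1 (within a tolerance of 0.01) and raises a `ValueError` otherwise, so a hand-assigned grammar is worth checking before it is loaded. The plain-Python sketch below applies the same test to the probabilities transcribed from the listing above. As transcribed, the five N rules sum to 0.9, so the listing would not load unchanged; the text does not show whether a rule was lost in extraction or a value needs adjusting. The reported parse uses only N -> "ሰዉ" and N -> "አጥር", so its probability does not depend on that fix.

```python
# Rule probabilities per left-hand side, transcribed from the PCFG listing above.
PROBS = {
    "S": [1.0],
    "VP": [0.2, 0.3, 0.1, 0.4],
    "PP": [0.2, 0.8],
    "V": [0.8, 0.1, 0.1],
    "NP": [0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2],
    "Det": [0.9, 0.1],
    "N": [0.4, 0.1, 0.2, 0.1, 0.1],
    "P": [0.1, 0.4, 0.1, 0.3, 0.1],
    "Adj": [1.0],
}


def unnormalized(probs, tolerance=0.01):
    """Return {lhs: total} for every left-hand side whose probabilities do not sum to ~1."""
    return {
        lhs: round(sum(ps), 6)
        for lhs, ps in probs.items()
        if abs(sum(ps) - 1.0) >= tolerance
    }


print(unnormalized(PROBS))  # {'N': 0.9}
```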
The parse structure produced by the Viterbi algorithm for the above grammar is shown below, together with the probability of the parse, i.e., the product of the probabilities of the rules it uses.

Code Example Using Python

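import nltk
grammar = nltk.PCFG.fromstring(pcfg_rules)  # pcfg_rules (hypothetical name): the PCFG above as one string
viterbi_parser = nltk.ViterbiParser(grammar)
sent = "አበበ የ ሰዉ አጥር ላይ ሆኖ ትንሽ አየ".split()
for tree in viterbi_parser.parse(sent):  # parse() returns an iterator, not a tree
    print(tree)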

Output of the ViterbiParser for the above grammar in my application:

(S (NP አበበ) (VP (NP (Det የ) (N ሰዉ) (N አጥር) (PP (P ላይ) (P ሆኖ))) (Adj ትንሽ) (V አየ)))
(p=8.84736e-05)
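To make the Viterbi criterion (choose the single most probable parse) concrete without relying on NLTK internals, the sketch below finds the best parse with a max-product dynamic program over word spans, memoized with `functools.lru_cache`. It was written for illustration and is not NLTK's `ViterbiParser` code. It assumes the grammar has no empty rules and no unary cycles, which holds for the grammar above, and it uses the N probabilities exactly as listed, since unlike NLTK it does not require each left-hand side to sum to 1.

```python
from functools import lru_cache

# The PCFG above as {lhs: [(rhs, prob), ...]}; any symbol that is not a key is a word.
RULES = {
    "S": [(("NP", "VP"), 1.0)],
    "VP": [(("V", "NP"), 0.2), (("V", "NP", "PP"), 0.3),
           (("NP", "V"), 0.1), (("NP", "Adj", "V"), 0.4)],
    "PP": [(("P", "NP"), 0.2), (("P", "P"), 0.8)],
    "V": [(("አየ",), 0.8), (("በላ",), 0.1), (("ተራመደ",), 0.1)],
    "NP": [(("አበበ",), 0.2), (("ከበደ",), 0.1), (("ጫላ",), 0.1),
           (("Det", "N"), 0.1), (("Det", "N", "N"), 0.1),
           (("Det", "N", "PP"), 0.1), (("N", "N"), 0.1),
           (("Det", "N", "N", "PP"), 0.2)],
    "Det": [(("የ",), 0.9), (("ለ",), 0.1)],
    "N": [(("ሰዉ",), 0.4), (("ውሻ",), 0.1), (("አጥር",), 0.2),
          (("ድመት",), 0.1), (("መናፈሻ",), 0.1)],
    "P": [(("በ",), 0.1), (("ላይ",), 0.4), (("በኩል",), 0.1),
          (("ሆኖ",), 0.3), (("ከ",), 0.1)],
    "Adj": [(("ትንሽ",), 1.0)],
}


def viterbi_parse(words, start="S"):
    """Return (probability, bracketed tree) of the most probable parse, or None."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def best(symbol, i, j):
        # Best (prob, tree) for `symbol` covering words[i:j], or None if impossible.
        if symbol not in RULES:  # a word must match exactly one token
            return (1.0, symbol) if j == i + 1 and words[i] == symbol else None
        result = None
        for rhs, p in RULES[symbol]:
            seq = best_seq(rhs, i, j)
            if seq is not None and (result is None or p * seq[0] > result[0]):
                result = (p * seq[0], "(%s %s)" % (symbol, " ".join(seq[1])))
        return result

    @lru_cache(maxsize=None)
    def best_seq(rhs, i, j):
        # Best (prob, children) for the symbol sequence `rhs` covering words[i:j].
        if len(rhs) == 1:
            sub = best(rhs[0], i, j)
            return None if sub is None else (sub[0], (sub[1],))
        result = None
        # Every remaining symbol needs at least one word, which bounds the split point k.
        for k in range(i + 1, j - len(rhs) + 2):
            head = best(rhs[0], i, k)
            tail = best_seq(rhs[1:], k, j) if head else None
            if head and tail and (result is None or head[0] * tail[0] > result[0]):
                result = (head[0] * tail[0], (head[1],) + tail[1])
        return result

    return best(start, 0, len(words))


prob, tree = viterbi_parse("አበበ የ ሰዉ አጥር ላይ ሆኖ ትንሽ አየ".split())
print(tree)
print("p =", prob)
```

The tree and probability it finds agree (up to floating-point rounding) with the ViterbiParser output reported above.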

5. Challenges
Several challenges arose while doing the project.
1. Due to lack of time and the lack of a well-organized corpus, a machine-readable dictionary, POS-tagged words, and especially a POS tagger for Amharic, this study uses a very small sample prepared for the purpose of the work.
2. The prototype developed in this study is assumed to support Amharic sentences of 10 or more words, but its real performance on such sentences could not be evaluated, mainly due to time constraints and the lack of the linguistic expertise needed to determine grammar rules and rule probabilities.
3. This report does not cover more advanced topics such as ambiguity resolution; it only shows sample parsing using the probabilistic approach.


6. References
[1] A. Alemu, "Automatic Sentence Parsing for Amharic Text: An Experiment Using Probabilistic Context Free Grammars," a thesis submitted in partial fulfilment of the requirement for the degree of Master of Science in Information Science, 2002.
[2] "Natural Language Toolkit (NLTK)," accessed from http://www.nltk.org/.
[3] D. Jurafsky and J. H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition," 2007.
[4] A. Bayou, "Design and Development of Word Parser for Amharic Language," Master's thesis, Addis Ababa University, 2000.
