
ỌBÁFẸ́MI AWÓLỌ́WỌ̀ UNIVERSITY
ILÉ-IFẸ̀, NIGERIA
FACULTY OF TECHNOLOGY
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
THIS DOCUMENT IS NOT FOR SALE!
CPE 510: Introduction to Natural Language Processing and Its
Applications
Rain Semester, 2021-2022 Session
GROUP LABORATORY ASSIGNMENT 02: Application

INSTRUCTIONS FOR THE ASSIGNMENT

1. You are to carry out the tasks in this assignment within your own Group.
2. YOU ARE TO UPLOAD: a PDF version of your report. The hardcopy should also be submitted after the upload.
3. YOU ARE TO UPLOAD: a PDF version of the scanned copy of the attendance sheet of the meetings of your Group.

This document is meant to support your laboratory work in CPE510. Attempts have been made to present the content of this document as accurately as possible. However, cases of typographical, grammatical or other errors are likely. These are unintentional, and I will be happy if you draw my attention to such errors (email to: [email protected]) as soon as you find them.

ỌDẸ́JỌBÍ Ọdẹ́túnjí Àjàdí

MAY, 2023
CONTENTS

1 Introduction
  1.1 HLP System Development Process
    1.1.1 Understand the Problem
    1.1.2 State Assumptions
    1.1.3 Behaviour Analysis
    1.1.4 System Design
    1.1.5 Implementation
    1.1.6 Evaluation
  1.2 Laboratory Reports
  1.3 Assignment Submission Dates

2 Laboratory II: Speech Signal Analysis
  2.1 Preamble
    2.1.1 Quantification of tone, pitch and emotion in spoken language
    2.1.2 Quantification of phone in spoken language
    2.1.3 Quantification of speaker features
  2.2 Speech-based applications
    2.2.1 Using Praat software
    2.2.2 Task 1

3 Laboratory III: Grammar of Language
  3.1 Theoretical treatment of language
    3.1.1 Regular grammar
    3.1.2 Context-neutral grammar
  3.2 Grammatical formulation of written human language
  3.3 Phrase structure grammar modelling
    3.3.1 The VT symbol
    3.3.2 The VN symbol
    3.3.3 The S symbol
    3.3.4 The P symbol
  3.4 Simple Grammar Checker
    3.4.1 Task 1
    3.4.2 Task 2
    3.4.3 Task 3
    3.4.4 Task 4
  3.5 Bibliography

CHAPTER 1

INTRODUCTION

The practical classes in this course will, among other things, allow you to further explore the ideas and concepts discussed during the lectures. Specifically, the laboratory assignments in this course have three goals:

1. They help you to better understand the general concepts and principles discussed in CPE-510
course lectures. They also help to guide your private studies, by allowing you to explore specific
concepts in the subject matter of Human (Natural) Language Processing (HLP) techniques and
their applications.

2. They help to improve your proficiency in HLP systems development and computational problem solving in general.

3. They allow your instructor(s) to evaluate and assess your understanding of the subject matter
in this course.

Accordingly, you are strongly advised to make sure that all the work which you submit for assessment arises out of your own efforts. You are permitted and encouraged to discuss general solution design (i.e., heuristics/algorithms/grammars) with the course lecturers as well as the course and laboratory instructors. You may also receive help with specific debugging problems during the practical sessions from the laboratory instructors. However, you are expected to work independently or only within your own team (where applicable) when in the laboratory. Remember that the motto of this university is For Learning and Culture.

1.1 HLP System Development Process


In this course, we will adopt the term Human Language Processing (HLP), rather than the more popular title, Natural Language Processing (NLP), for the reasons already discussed during our classes. Specifically, the question as to whether nature has a language or not is outside the purview of this course. This course is concerned with the phenomenon of human language: the native instrument that humans use to communicate within the self and with others. For example, this course will not investigate the call-and-response systems used by non-human natural entities such as whales, birds, monkeys and so on.

The development process of an HLP system is similar to that of conventional computational systems. During lectures, we have discussed the possibility of designing and implementing abstract machines that model the language of a system. We have identified six steps in the design process:

1.1.1 Understand the Problem


The first action in this step is to give the target system or solution a name. The main activity in this step is to get a clear understanding of the language of the target system. Case examples of the possible expressions and the terminology usage characteristics of the system can be very helpful in this task. Note that, unlike conventional computing systems, some of the characteristics of HLP systems are not directly observable or definitively describable. A careful analysis of case expressions generated or recognised by the system, and the corresponding responses to them, can provide useful insights into the system's language.

1.1.2 State Assumptions


State all assumptions you are making regarding your model. The assumptions must, first of all, take cognisance of the nature and capability of the system. Make sure that you are able to justify your assumptions using reasonable arguments. Appropriate assumptions include those justified using evidence based on the environment in which the system will work, the resources available for solving the problem, the target users of the system, etc.

1.1.3 Behaviour Analysis


This comprises two tasks: (i) system state-space analysis and (ii) system dynamics analysis. The system state-space (Input-Output space) analysis task involves the identification of each and every relevant question (input) that can be posed to the system and the expected corresponding response (output). The elements and structure of the Input/Output pattern are named and labelled using appropriate variable types. Note that an HLP system's Input/Output may be coded in many different forms such as text, sign and/or streams of waveforms corresponding to speech. In the system dynamics analysis task, the observable states of the HLP system are identified and labelled. The states are the set of possible configurations the system can assume during its operation. By configuration we mean how the various components of the system are arranged and connected. The dynamics of the system specify how the system transits from one state to another during operation.
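As an illustration of the two tasks, the states and dynamics of a very small HLP system can be written down as a transition table before any implementation work. The Python sketch below is a minimal, hypothetical example; the state names and input labels are invented for illustration and are not part of the assignment.

```python
# A minimal sketch of a state-space and dynamics description for a toy HLP
# system. The state names and input labels are hypothetical placeholders.
transitions = {
    ("IDLE", "greeting"): "GREETED",      # e.g. "Hello" moves the system to GREETED
    ("GREETED", "question"): "ANSWERING", # a question triggers the answering state
    ("ANSWERING", "farewell"): "IDLE",    # a farewell returns the system to IDLE
}

def step(state: str, input_label: str) -> str:
    """Return the next state, or stay in the current state if the input is not recognised."""
    return transitions.get((state, input_label), state)

state = "IDLE"
for label in ["greeting", "question", "farewell"]:
    state = step(state, label)
print(state)  # IDLE
```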

1.1.4 System Design


This involves studying possible solution strategies and methods. After this, the most appropriate computing tools for representing the behaviour of the HLP system analysed in Step 3 are selected. Possible design tools include: semantic networks, heuristics, formal languages, state transition diagrams and graphs, plans, etc. At the end of this process, a document containing the design of the HLP system is generated. This document contains a representation of the behaviour of the HLP system and the desired solution. This task corresponds to the design task in engineering.

1.1.5 Implementation
Reduce the system behaviour designed in Step 4 into a computing artefact using appropriate software or programming language code on appropriate hardware. The programming language and tools selected will be informed by Step 4. They must have features appropriate for implementing your HLP system. We will be adopting Python as the programming language in this course. Tools such as the Natural Language Toolkit (NLTK) (Python based), JFLAP (system design) and Praat (speech data collection, analysis and simulation) will also be used.

1.1.6 Evaluation
This involves testing the system implemented in Step 5 to ensure compliance with the problem specification. The test of an HLP system is in two stages: (i) the inside test and (ii) the outside test. In the inside test, the performance of the system is assessed using data from the case examples used for the system design. When the system has done well on the inside test, the outside test is applied. In the outside test, the performance of the system is assessed using data that are NOT from the case examples used for the system design. Based on the outcomes of the evaluation, a decision to correct, modify or deploy the system is taken as appropriate. The goal of the evaluation process is to determine how closely the system mimics the target human language behaviour. Speed of operation, memory used and other parameters for evaluating conventional systems are desirable but not sufficient criteria.
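A minimal Python sketch of the two-stage test is given below. The checker function and the two labelled data sets are hypothetical placeholders; in practice the inside set comes from your design case examples and the outside set from data not used in the design.

```python
# A minimal sketch of the inside/outside evaluation described above.
def is_grammatical(sentence: str) -> bool:
    # Placeholder for the HLP system under test.
    return sentence.endswith(".")

def accuracy(system, labelled_sentences):
    """Fraction of sentences for which the system agrees with the expected label."""
    correct = sum(1 for s, expected in labelled_sentences if system(s) == expected)
    return correct / len(labelled_sentences)

# Hypothetical labelled examples: (sentence, expected verdict).
inside_test = [("Father carried the child.", True), ("Carried father child", False)]
outside_test = [("Mother greeted the uncle.", True), ("Uncle the greeted", False)]

print("Inside test accuracy:", accuracy(is_grammatical, inside_test))
print("Outside test accuracy:", accuracy(is_grammatical, outside_test))
```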

1.2 Laboratory Reports


The documentation of your experiments is very important and will be given much attention during the grading of your laboratory work. You should, therefore, ensure that your laboratory reports follow the format stated in Table 1.1. They should include information about your observations as much as possible. Also note that the items listed as 1, 2, 3, 6, 7, 8 and 13 must be included in your reports. Items listed as 5, 9, 10 and 11 are important, while items listed as 4 and 12 should be included when used. The presentation of your laboratory report should be clean, clear and legible.

1.3 Assignment Submission Dates


The dates when you will be required to submit the reports of your assignments will be posted on the course web-page. They may also be announced during lectures. You should endeavour to submit your reports on, or before, such dates. Late submission will attract mark deductions. If you are having issues with any of the packages you are using in the laboratory, please seek assistance from the laboratory coordinators well enough in advance to allow you time to complete your assignment.

Table 1.1: Experiment Document Contents

1. Name and Title of Report
   NAME: Laboratory One. TITLE: Language and Grammar Modelling.
2. Author and date
   Your full name(s), identification numbers, and department of major (this must be listed for all group members if it is group work).
3. Contents Description
   Table of Contents; List of Figures (if any); List of Tables (if any).
4. Glossary of Notation and Terms
   List and definition of all uncommon symbols and terms as used in this report.
5. Introduction
   Brief discussion stating the background of the present work.
6. Problem statement
   Statement of the problem; case inputs and corresponding outputs; mathematical models of the problem; computational models of the problem; supporting theory and physical laws for the solution (if any).
7. Objectives of Experiment
   General objective(s) of the laboratory work; specific objective to be accomplished in this experiment.
8. Experiment procedure
   Initial setting/materials and tools; experiment processes; machine design; machine implementation; final setting or configuration (if any).
9. Discussion of results
   Observations, analysis of the method used, significance of the results obtained.
10. Technological interpretation and application of results
   Illustrate, with examples, the real-life interpretation and applications of the results of your experiment.
11. Conclusion and Suggestions for further work
   Summary and general observations, next course of action.
12. Appendixes
   Tables of values and results; program listings.
13. References
   List all the literature cited in your work.

CHAPTER 2

LABORATORY II: SPEECH SIGNAL ANALYSIS

2.1 Preamble
The process ascribed to the activities of the human speech organs is the object of study in speech production. Figure 2.1 depicts the anatomy of the principal organs in the human speech production process. These organs are responsible for the articulation of speech. The scientific study of how these organs work together during the speech production process, as well as the characterisation of the acoustic manifestation of the speech produced, is the subject matter of Phonology and Phonetics.

The data and narratives from the fields of Phonology and Phonetics are essential to the development of some HLP applications such as speech synthesis and speech recognition systems. For example, an essential aspect in the development of a text-to-speech synthesis system is to figure out how to convert digital texts into speech sound waveforms. This development process is informed by data and procedures grounded in phonology and phonetics. This is also the case in the design of a speech recognition system.
In this part of the laboratory, we are concerned with the waveforms produced by the human speech organs. We want to interrogate how the data of these waveforms can be used to characterise spoken language expression. This is with the aim of developing appropriate computing applications for selected human languages to the extent that technology permits. The waveforms in reference are those that are digitally captured, represented, manipulated and stored through a microphone. Such waveforms will normally be represented as annotated streams of binary signals.

In order to investigate spoken human language we need a simplified model with which we can represent the important features of the human speech organs. Such a model is depicted in Figure 2.2. Speech is the physical, sensorily accessible waveform and signal. How the activities of the faculty of language influence the organs of language, and how the activities of the organs of language culminate to manifest as speech, remains a subject of debate and mystery. These are outside the purview of human language processing. Note that there is a difference between the sound of speech and the symbol of sound. The sound of speech is what the human sensory organs perceive or capture in spoken language. The symbols of sound are ascribed by consensus and used to represent, for example, letters in an orthography (writing system). The digital or binary signal of the symbols of sound is what is represented and manipulated by electronic instruments and computing machines.

Figure 2.1: The human speech organ

Figure 2.2: A model of the human speech organ
The aim of the set of experiments in this laboratory is to explain our observations about the correlation between speech, based on sounds recorded as waveforms, and the features ascribed to them. Most indigenous West African languages are tonal. This implies that tone and phone together constitute their perception. Therefore, tone and phone are mutually inclusive and inextricably intertwined in the sound of a spoken tone language.

Consider the phone and tone in the Standard Yorùbá two (2) syllable (bi-syllabic) word Koko depicted in Table 2.1. You will observe that all the words comprise the string Koko, which itself contains two phones, Ko and ko. However, the tone accompanying each phone determines the word. The possible word and non-word strings are listed in Table 2.1. The English language gloss of each word is in the last column of Table 2.1. In summary, it will be observed that the information encoded in each word is a function of the phone and tone in its constituent syllables.

Table 2.1: Forms of Yorùbá term Koko


Ser. No. Terms Phones Tones English(Gloss)

1. Koko Koko MM Difficulty/ hardness

2. Kókò Koko HL Coco yam

3. Kókó Koko HH Substance/Lump

4. Kòkó Koko LH Cocoa

5. kòkò Koko LL Pottery

6. Kokò Koko ML Join or juxtapose

7. Kóko Koko HL a bird

8. Kokó Koko MH XX (a morpheme)

9. Kòko Koko LM XX (a morpheme)

The phones in the word are represented with the symbols ko and ko. The concatenation of these
symbols, that is koko, represents the phone of the word. There are three (3) discrete tones in Yorùbá:
Neutral, Low and High. These are represented with the symbols M, L, and H, respectively. The
neutral tone is often called Mid-tone in conventional discourse on Yorùbá. The Low (L) and High
(H) tones emerge from the Neutral tone (M) in the spoken Yorùbá language. The structures of the two syllables are drawn from the same symbols. However, each of the syllables in a word carries a tone. In the word in Item No. 1 in Table 2.1, the two syllables carry the Neutral (or Mid) tone. The two syllables in the word in Item No. 4 carry the Low (L) and High (H) tones, respectively.

Each of the spoken words is a continuous stream of sound in which the “tone” and “phone” are intertwined. In the written form, however, each syllable and word is composed through a discrete combination of symbols representing tones and phones. In the written Yorùbá language, the symbol for each syllable is composed by stacking a tone on its corresponding phone.

As discussed during lectures, spoken language is a happening while written language is a process. In fluent Yorùbá speech, tones and phones emerge as mutually inclusive, complementary and intertwined language phenomena. In written Yorùbá language, tones and phones are constructed through mutually exclusive and independent sets of rules. It is important to note that the sensory experience of human speech is a continuum, whereas a quasi-continuum is ascribed to the sound of spoken words. The discreteness and/or letters ascribed to spoken words are for the purpose of orthography (written text). In this regard, the written text is a circumscription (an approximation) of its spoken expression. As discussed during the lecture, there is no letter in spoken human language. More fundamentally, there is no quantity or number in spoken human language.

Speech signals are recorded through various instruments. Modern electronic devices make it possible to ascribe numbers to recorded spoken language. Modern computing machines make it possible to represent and store recorded speech digitally. The following are the numerical cues for various features of spoken languages. A sample of such a recording is the spoken sentence “Bàbá àgbẹ̀ ti ta kòkó” depicted in Figure 2.3.

2.1.1 Quantification of tone, pitch and emotion in spoken language


In a speech signal, the cue for the acoustic correlate of tone is the fundamental frequency of the speech waveform. The fundamental frequency is represented with the symbol F0. The unit Hertz (Hz) is used to scale frequencies. The F0, in Hz, is used to cue the pitch of spoken sounds. Generally, women's speech has a higher F0 than men's. Some emotional characteristics of speakers, such as excitement, sadness, happiness, and so on, are known to manifest in the F0 patterns of speech. F0 cues the tone of spoken words. For example, the Low (L) tone in the syllable Kò is cued by the fundamental frequency (F0) in its recorded spoken expression.

2.1.2 Quantification of phone in spoken language


Formant frequencies are the resonances of the vocal tract. The first two formants of the speech waveform, that is F1 and F2, cue the phone of the speech sound. For example, the phone (ko) in the syllable Kò is cued by the F1 and F2 in its recorded spoken expression.

2.1.3 Quantification of speaker features


Higher formants, i.e. F3 and F4, are speaker dependent. What this means is that each human speaker of a language has unique F3 and F4 patterns.

2.2 Speech-based applications


The data in respect of these features of the recorded speech signal are used to characterise acoustic cues. The features can then be used in the development of HLP applications:

1. To develop a tone recognition/synthesis system, the F0 data in the speech waveform are required.

2. To develop a phone recognition/synthesis system, the F1 and F2 data in the speech signal are required.

3. To develop a syllable or speech recognition/synthesis system, the F0 together with the F1 and F2 data in the speech signal are required.

4. To develop a speaker recognition system, the F3 and F4 data are required.

5. To develop an application to ascribe emotion to spoken human expression, the F0, F3 and F4 data are required.

2.2.1 Using Praat software


The image in Figure 2.3 is a screen shot of the phrase “Bàbá àgbẹ̀ ti ta kòkó” (The farmer (old man) has sold cocoa) spoken by a male adult native speaker of Yorùbá. There are five (5) distinct panels in the image. The topmost panel is the speech signal waveform. Below this is a panel that contains the spectrogram of the signal waveform. The formant frequencies are indicated with the RED speckled lines on top of the spectrogram. The fundamental frequency F0 is indicated as the blue line (the horizontal line just below the last speckled RED line) on the spectrogram. The formant line closest to the F0 is the first formant (F1). The one immediately above the first formant is the second formant (F2), and so on.

The next three (3) panels indicate the “Phrase”, “Word” and “Tone” tiers of the recorded speech. When the speech is annotated, the boundaries of each of these tiers are indicated by a thick blue vertical line. Only the phrase tier has been annotated in Figure 2.3.

As depicted in the menu list at the top of the image, the Praat software allows you to carry out a number of tasks and manipulations on recorded speech signals. The Praat software and help documents can be downloaded from
http://www.fon.hum.uva.nl/praat/
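Beyond the Praat graphical interface, the same measurements can be scripted from Python. The sketch below is a minimal illustration, assuming the praat-parselmouth package (a Python interface to the Praat engine, installed with pip install praat-parselmouth) is available; the file name used is a hypothetical placeholder for one of your own recordings.

```python
# A minimal sketch of extracting F0 and the first two formants from a recording,
# assuming praat-parselmouth is installed. "baba_male.wav" is a placeholder.
import parselmouth

snd = parselmouth.Sound("baba_male.wav")

# Fundamental frequency (F0) track: unvoiced frames are reported as 0 Hz.
pitch = snd.to_pitch()
f0_values = pitch.selected_array["frequency"]
voiced = f0_values[f0_values > 0]
print("Mean F0 (Hz):", voiced.mean() if len(voiced) else "no voiced frames")

# First and second formants (F1, F2) at the mid-point of the recording.
formants = snd.to_formant_burg()
mid = snd.duration / 2
print("F1 (Hz):", formants.get_value_at_time(1, mid))
print("F2 (Hz):", formants.get_value_at_time(2, mid))
```

Values extracted this way can be tabulated directly into your report for Task 1.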

2.2.2 Task 1
1. Download the Praat software, install and explore its features, particularly those relating to
speech signal analysis.

2. Select any six words, with not more than three syllables, from the vocabulary of kinship terms generated in Laboratory I (cf. Table 2.2). Record the English words and their equivalents in the African indigenous language of your choice, for example “Father” and “Baba” for English and Yorùbá speech. You are expected to create two sets of data samples: one with a FEMALE voice and the other with a MALE voice. Each word must be recorded such that the speech signal is clear and clean (no background noise).
3. Explore and study the waveforms corresponding to syllables in each of the words you recorded.

10
Figure 2.3: A Praat screen shot of the phrase “Bàbá àgbẹ̀ ti ta kòkó”

4. Observe and extract the fundamental frequency F0 in the speech signals of the first and last syllables in each word. Discuss the average F0 for the male and female voices. Also observe the pattern of the first two formants, i.e. F1 and F2.

5. Observe and extract the third and fourth formants, i.e. F3 and F4. Discuss the average F0 for individual members of your Laboratory group.

6. Record at least two (2) isolated syllables that comprise any of the words in item 2 above and discuss the features of the F0 vis-à-vis the one in the word sample. HINT: study the beginning, middle and end of the F0 waveform.

7. Document experiments 1 to 6 as well as your reflections on your observations.

Table 2.2: Vocabulary of English kinship


1. Person 7. Mother 13. Daughter
2. Brother 8. Father 14. Niece
3. Sister 9. Inlaw 15. Parent
4. Wife 10. Grand parent 16. Child
5. Husband 11. Ancestor 17. Uncle
6. Son 12. Family 18. Nephew

CHAPTER 3

LABORATORY III: GRAMMAR OF LANGUAGE

As discussed in the lecture, the faculty of human language is outside the ambit of meaningful discussion. The activities of the faculty of language and the expressions generated through the organs of language are a happening. The object of study in human language processing is the habitual instrument of human communication. This is called the instrument of language in this course.

A theoretically based categorisation of these languages has been presented in the conventional literature.

3.1 Theoretical treatment of language


The Noam Avram Chomsky Hierarchy reflects a certain order of complexity in the categorisation of the instrument of language. There are four (4) levels (see Table 3.1) in the Chomsky hierarchy of language. These are: Type 0, Type 1, Type 2 and Type 3. Each level corresponds to a type of language. The hierarchy is numbered in descending order of complexity. Therefore, the Type 0 language is the most complex while Type 3 is the simplest. This hierarchy is definitive because a language cannot belong to two levels of the hierarchy. The language at each level is prescribed by a unique set of criteria comprising Primitive-terms, axioms (Prime and Auxiliary) as well as a set of rules to which the structure of a valid expression must conform. The Chomsky hierarchy is used in formal language theory as well as in computing. Locating a language in the hierarchy will assist in selecting the most appropriate computational tools for its implementation.
The written language is a (crude) approximation of spoken language. Written language is a discrete, consensual and/or formal tool for precise message encoding and communication. The object of this experiment in this laboratory assignment is, therefore, different from the continuous spoken language, which was the subject of Laboratory II. In this experiment, we will explore the formal structure of language in terms of its elements and how they are combined in the formation of the structure of valid expressions. We have discussed during lectures that the Grammar of a language defines its structure.

Table 3.1: Chomsky hierarchy

Type 0 (Infinitely recursively enumerable). Description: rules are used to replace infinite instances; states are outside the ambit of material. Rule form: A → a∞. Agency and process: Turing Machine; countable and infinite recursive.

Type 1 (Context-sensitive). Description: rules are applied contextually on variables; states are outside the ambit of symbols. Rule forms: A → a; B C and A → b; C D. Agency and process: Linearly Bounded Automata; countable and selectively finite (non-monotonic logic).

Type 2 (Context-Neutral). Description: rules are applied universally on constants; states are variables. Rule form: A → BC. Agency and process: Push-Down Automata (algorithmic and memory) (monotonic logic).

Type 3 (Regular). Description: rules are applied on Primitive-terms; states are constants. Rule forms: A → aB; A → a; A → Ba. Agency and process: Finite Automata (parsing and register).


DEFINITION: Grammar is the symbolic formulation of the structure of a language in terms of its elements and how they are connected in the representation of valid sentences.

Formally, we can define a Grammar as a four (4)-tuple:

G ::= ⟨Σ, V, P, S⟩

where:

Σ is the alphabet of the grammar. It is also called the set of Terminal symbols.

V is the set of non-terminal symbols or variables.

P is the set of rules called Productions. A rule is written in the form A → b, meaning that A can be re-written as b. P is also called the set of rewrite rules.

S is the Start symbol. It is used to indicate the beginning of an expression (or sentence).

The type of production determines the grammar, which in turn determines the language type. Most of the grammars we shall be working with in this course will be in two categories: (i) linear and (ii) non-linear.

3.1.1 Regular grammar


A regular language finds expression through a one-dimensional (linear) structure. A regular language is formulated with a Type 3 grammar in the Chomsky Hierarchy. The state of an agency of a regular grammar is realised through a register, and instances of its transitions are realised using a parser. Implementation of an agency of a regular language is realised as a Finite State Automaton, such as: (i) a Finite state acceptor, (ii) a Finite state generator and (iii) a Finite state transducer. Rules for formulating the grammar of a regular language include the following (a minimal acceptor sketch in Python is given after the list):

(i.) A → Ba, (Right linear/ Generator rule)

(ii.) A → Aa, (Recursive generator rule)

(iii.) A → a, (Terminal rule)

(iv.) A → aB. (Left linear/ Acceptor rule)

(v.) A → aA. (Recursive acceptor rule)
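The sketch below is a minimal Python illustration of a finite-state acceptor for a toy regular grammar; the grammar used (S → aS | bA, A → b) and the state names are hypothetical, chosen only to show how acceptor rules map onto states and transitions.

```python
# A minimal finite-state acceptor for the hypothetical regular grammar
#   S -> aS | bA,  A -> b
# Each rule becomes one entry in the transition table; F is the accepting state.
transitions = {
    ("S", "a"): "S",   # S -> aS  (recursive acceptor rule)
    ("S", "b"): "A",   # S -> bA
    ("A", "b"): "F",   # A -> b   (terminal rule)
}
accepting = {"F"}

def accepts(string: str) -> bool:
    state = "S"
    for symbol in string:
        state = transitions.get((state, symbol))
        if state is None:          # no rule applies: reject
            return False
    return state in accepting

print(accepts("aabb"))  # True:  a a b b is generated by the grammar
print(accepts("ab"))    # False: the derivation stops in a non-accepting state
```

The same machine can be drawn as a state transition diagram in JFLAP for inclusion in your report.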

3.1.2 Context-neutral grammar


A context-neutral language finds expression through a two-dimensional (non-linear) structure. A context-neutral language is formulated with a Type 2 grammar in the Chomsky Hierarchy. The instances of state of an agency of a context-neutral language are realised through memory, and instances of its transitions are realised using an algorithm. A simple memory implementation of a context-neutral agency is realised using a Push-down Automaton. Typical rules for formulating this class of grammar include:

(i.) A → BC (Context-neutral)

(ii.) A → AB (Recursive context-neutral)

In the Chomsky hierarchy, the language formulated with a context-neutral grammar is more powerful than that formulated with a regular grammar. This implies that the computing processes that can be expressed (i.e. strings that can be generated) by an agency of a regular grammar can also be processed by an agency of a context-neutral language. In the formulation of the instrument of language presented during this course, regular language is subsumed in context-neutral language.

3.2 Grammatical formulation of written human language


As discussed in the lecture, there is no universal set of criteria and/or rules to which human spoken language subscribes. Therefore, there is no universal grammar with which spoken human language can be formulated and expressed. A fundamental criterion for the formulation of a grammar is an alphabet. There is no alphabet in spoken human language. The set of letters ascribed to the sounds in spoken human language is for the purpose of creating an orthography (written language).
Therefore, the idea of a grammar, as used in this course, applies to the written human language only. An expression in a written language comprises (i) Assertion and (ii) Relation. Assertions are ascribed to instances in the universe of discourse of the expression. A relation establishes the logical connection between two (2) or more instances to which an assertion has been ascribed. A relation can also give expression to the transitions reckoned in the universe of discourse.

A sentence is the Prime-axiom of language expression. A sentence comprises any, or all, of the following three (3) items:

(i.) The agency of an action (subject) (Oníṣe);

(ii.) The action performed (verb) (Íṣe);

(iii.) The object or agency that suffered the action (object) (Eléṣe);

For example, in the sentence:

Ade carried the chair.

Ade is the agency of the action. The action performed is Carry and the Chair is the object that suffered the action. In this expression, Ade is also called the Subject and the Chair the Object, while Carry is the Verb. This is the basis of the SVO formulation ascribed to the structure of valid expressions in the English language:

Subject-Verb-Object

Computationally, the verb is the relation, which serves the role of an operator or function. A relation is an operation (verb) that an agency gives expression to by manipulating operands (nouns). In this case the subject and object are the operands.

Based on the above analysis, the structure of a valid sentence can be formulated as comprising a Subject followed by a Verb followed by an Object:

Sentence = Subject + Verb + Object

The + operator in the above formulation indicates the string concatenation operation. Considering that the agency is the major operand, the formulation above can be codified as follows:

Verb(Subject, Object)

Carry(Ade, chair)
NOTE that the SVO structure is NOT universal to all instruments of human language. Some languages, such as Bambara, use the SOV structure. Indeed, the SVO structure is not strictly observed in most human language expressions. Most sentences in the English and Yorùbá languages conform with the SVO structure. However, there are differences in the treatment of assertion placement in the structure of valid expressions.
The above English language sentence can be translated into Yorùbá as:

Adé gbé àga náà.

It can be computationally represented as:

Gbé(Adé, àga)

Note that, whereas the object, that is “Chair”, is located at the end of the structure of the English sentence, the Yorùbá equivalent “Àga” is not.

3.3 Phrase structure grammar modelling


The Phrase Structure Grammar (PSG) provides a useful computational tool for modelling simplified instances of the written human language. As stated above, the PSG corresponds to Type 2 in the Chomsky hierarchy.

A phrase structure grammar G is defined by a four (4)-tuple as:

G = ⟨VT, VN, S, P⟩

The symbols in this formulation are further explained in the context of the definition of the instrument of human language as discussed during our lectures.

3.3.1 The VT symbol


VT is the set of symbols representing instances of Primitive-terms in the universe of discourse to which the instrument of language gives expression. VT comprises a finite set of terminal symbols. Each terminal symbol is unique, atomic and immutable. What this implies is that a terminal symbol CANNOT be further expanded or reduced to smaller forms. VT constitutes the alphabet of a written language and the words of a spoken language.

Once VT has been defined for a written language, no new symbol can be added to the set. Also, no symbol can be removed from the set. Every instance of a Primitive-term remains as-is in the grammar of a language. Instances of Primitive-terms are the leaves of a tree representation of an expression. This corresponds to the alphabet of the language formulated by the grammar. The simplest representation of an expression is obtained with symbols drawn from the Primitive-terms. As discussed during lectures, Primitive-terms are self-evident by virtue of sensory experience. Sensorily accessible self-evidence is structural and definite.

3.3.2 The VN symbol
VN is the finite set of symbols representing instances of Auxiliary-axioms. Each symbol corresponds to a sub-expression (sub-sentence) in the language formulated with the grammar. Each Auxiliary-axiom symbol can be further expanded or reduced into a simpler composition of Auxiliary-axioms and/or Primitive-terms. The simplest expansion of an Auxiliary-axiom is obtained with symbols drawn from the Primitive-terms. Auxiliary-axioms are the string, sub-string, pattern, sub-pattern and constant in the grammatical formulation of a regular language. Auxiliary-axioms are the variables and parts of speech, such as verb, noun, phrase, adjective, adverb and determiner, in the grammatical formulation of context-neutral languages. Auxiliary-axioms form the stem of a tree representation of an expression.

3.3.3 The S symbol


S is the symbol representing the distinguished Prime-axiom of a language expression. A Prime-axiom is an axiom that has no axiom. The symbol representing the Prime-axiom corresponds to a whole expression in the language formulated with the grammar. The Prime-axiom can be further simplified, expanded or reduced using Auxiliary-axioms. The sentence is the Prime-axiom in language expression. The Prime-axiom forms the root in a tree representation of an expression of the language modelled by the grammar. As discussed during lectures, Prime-axioms are self-evident by virtue of self-reflection. Note that self-reflection is logical and polar, not sensory and structural.

3.3.4 The P symbol


P is the symbol that represents the finite set of rewrite rules, also called the productions of the grammar. It comprises the rules that define how Primitive-term symbols can be used to replace or expand (re-write) Auxiliary-axioms. Some rules also define how instances of Auxiliary-axiom symbols can be used to replace or expand (re-write) the distinguished Prime-axiom. Rules are written as α → β, where: (i) α is S, in a rule that expands the Prime-axiom, or a symbol drawn from VN; (ii) β is a string of symbols drawn from VN ∪ VT. Note that α = S in a Prime-rule and β is drawn from VT in Terminal-rules.

If G is a phrase structure grammar, then L(G) is the language generated by the phrase structure grammar G. L(G) is the set of all the possible valid structures that can be generated with G. An expression is admissible in a language if its structure is valid. A valid structure is one that conforms to the grammar G of the language. A context-neutral expression is admissible when its sentence encodes correct logic.
The grammar in Table 3.2 is the Backus Naur Form (BNF) representation of a simple grammar for modelling English expressions. We adopt a modified BNF format in this course. In that format:

1. An instance of the Prime-axiom is represented with all capital letters. For example: ⟨DATA⟩.

2. An instance of an Auxiliary-axiom is represented with the first letter capitalised. For example: ⟨Data⟩.

3. An instance of a Primitive-term is represented with small letters, except for digits, which are represented as-is. For example: a.

4. An instance of a rule is represented as ⟨B⟩ ::⇒ b. The symbol ::⇒ means “can be rewritten as”.

5. The option or choice notation is represented with a vertical line, that is |. For example, ⟨B⟩ ::⇒ b | q implies that B can be replaced by b or q in an expression.

6. The concatenation operation is implied in the consecutive arrangement of symbols. For example, ⟨B⟩ ::⇒ ⟨C⟩⟨D⟩ implies that the axiom B can be rewritten as a C followed by a D.

3.4 Simple Grammar Checker


Table 3.2 contains the grammar of simple English. Rule 1 states that a sentence is composed of a noun phrase followed by a verb phrase. Rule 2 states that a noun phrase is either a noun or a determiner followed by a noun. Rule 3 states that a verb phrase is either a verb or a verb followed by a noun phrase. The other rules list the instances in the sets of objects that constitute a noun, a verb and a determiner. A minimal NLTK encoding of this grammar is sketched after Table 3.3.

A corresponding simple Yorùbá grammar is shown in Table 3.3. Look carefully at Rule 2.

Table 3.2: Simple English Grammar

No. Production
1   ⟨SENTENCE⟩ ::= ⟨Noun phrase⟩⟨Verb phrase⟩
2   ⟨Noun phrase⟩ ::= ⟨Noun⟩ | ⟨Determiner⟩⟨Noun⟩
3   ⟨Verb phrase⟩ ::= ⟨Verb⟩ | ⟨Verb⟩⟨Noun phrase⟩
4   ⟨Noun⟩ ::= {List of all nouns}
5   ⟨Verb⟩ ::= {List of all verbs}
6   ⟨Determiner⟩ ::= {List of all determiners}

Table 3.3: Simple Yorùbá Grammar

No. Production
1   ⟨SENTENCE⟩ ::= ⟨Noun phrase⟩⟨Verb phrase⟩
2   ⟨Noun phrase⟩ ::= ⟨Noun⟩ | ⟨Noun⟩⟨Determiner⟩
3   ⟨Verb phrase⟩ ::= ⟨Verb⟩ | ⟨Verb⟩⟨Noun phrase⟩
4   ⟨Noun⟩ ::= {List of all nouns}
5   ⟨Verb⟩ ::= {List of all verbs}
6   ⟨Determiner⟩ ::= {List of all determiners}
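The following is a minimal sketch of how the grammar of Table 3.2 can be encoded with the NLTK toolkit and used as a simple checker, in support of Tasks 1 and 3. The small word lists are hypothetical placeholders; replace them with the kinship vocabulary from Laboratory II.

```python
# A minimal NLTK encoding of the simple English grammar of Table 3.2.
# The vocabulary rules are illustrative placeholders.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N | Det N
VP -> V | V NP
N -> 'father' | 'mother' | 'child'
V -> 'carried' | 'greeted'
Det -> 'the' | 'a'
""")

parser = nltk.ChartParser(grammar)

def check(sentence: str) -> bool:
    """Return True if at least one parse tree exists for the sentence."""
    tokens = sentence.lower().split()
    return any(True for _ in parser.parse(tokens))

print(check("the father carried a child"))   # True: conforms to the grammar
print(check("carried the father a child"))   # False: no parse under these rules
```

Note that NLTK's ChartParser raises an error for words that the grammar does not cover, so the vocabulary rules must list every word you intend to test.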

3.4.1 Task 1
Compose any six (6) sentences from the kinship vocabulary you generated in Laboratory II (cf. Table 2.2). Each member of the group should generate a sentence. Your sentences should have the following features:

1. Each sentence should have not more than six words.

2. Use declarative sentences only.

3. Using the above grammar, analyse and discuss the six (6) English sentences using parse trees.

4. Using the NLTK grammar tool in Python, explore the correctness of the English language
sentences.

3.4.2 Task 2
1. Based on the indigenous African language selected in Laboratory II, discuss the grammar for
sentence formulation in Table 3.2.

2. Repeat the tasks you executed for the English language using the indigenous language selected.

3. Illustrate cases of imprecision and inconsistency based on your data.

4. Discuss your observations and reflections on the grammars and processes of the two languages.

3.4.3 Task 3
Using the Python programming language (you could use NLTK toolkit):

1. Develop a program for checking the correctness of English sentences, based on the grammar defined above.

2. Develop a program for checking the correctness of the indigenous language you selected, based on the grammar you defined above.

3. Test your system with at least six examples of correct and incorrect sentences. Your evaluation should be limited to the database generated in Laboratory II. Observe and document the kinds of sentences whose grammar your system fails to check correctly.

4. Discuss your observations and reflections on how the laboratory activity will inform the development of a translation system between English and the indigenous language you chose.

3.4.4 Task 4
Review the ELIZA chatbot in Laboratory II.

1. Design and implement a chatbot based on the English data you collected in Laboratory II (a minimal rule-based sketch is given after this list).

2. Your chatbot should be able to answer questions such as “Who is a father?”. If the machine responds “A male parent”, you should be able to follow up with the question “Who is a parent?”, and so forth.

3. Repeat the above experiment for your indigenous language of choice.

4. Reflect on and explain your observations in respect of the systems you developed and ELIZA.
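The sketch below is a minimal, rule-based question-answering loop in the spirit of ELIZA. The definitions dictionary is a hypothetical stand-in for the kinship data you collected in Laboratory II.

```python
# A minimal rule-based kinship chatbot sketch. The definitions are placeholders.
import re

definitions = {
    "father": "A male parent.",
    "mother": "A female parent.",
    "parent": "A father or a mother of a child.",
}

def reply(question: str) -> str:
    """Answer questions of the form 'Who is a <kinship term>?'."""
    match = re.match(r"who is an? (\w+)\??", question.strip().lower())
    if match:
        term = match.group(1)
        return definitions.get(term, f"I do not know the term '{term}'.")
    return "Please ask a question of the form 'Who is a <kinship term>?'."

print(reply("Who is a father?"))  # A male parent.
print(reply("Who is a parent?"))  # A father or a mother of a child.
```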

3.5 Bibliography
1. Searle, J. (1999) Mind, Language, and Society: Doing Philosophy in the Real World, Weidenfeld & Nicolson, London.

2. Campbell, J. (1982) Grammatical Man: Information, Entropy, Language, and Life, Simon and Schuster, New York.

3. Dreyfus, H. L. and Dreyfus, S. E. (2004) From Socrates to Expert Systems: The Limits and Dangers of Calculative Rationality, http://socrates.berkeley.edu/~hdreyfus/html/paper_socrates.html (last visited July, 2011).

4. Turing, A. M. (1995) Computing Machinery and Intelligence, in E. A. Feigenbaum and J. Feldman (Eds.), Computers and Thought, AAAI Press / The MIT Press, London.

5. Munakata, T. (2007) Beyond Silicon: New Computing Paradigms, Communications of the ACM, Vol. 50, No. 9, pp. 30–34.

6. Cambria, E. and White, B. (2014) Jumping NLP Curves: A Review of Natural Language Processing Research, IEEE Computational Intelligence Magazine, May 2014, pp. 48–57.

7. Boroditsky, L. (2011) How Language Shapes Thought: The languages we speak affect our perceptions of the world, Scientific American, pp. 63–65.

8. Chomsky, N. (1956) Three Models for the Description of Language, IRE Transactions on Information Theory, Vol. 2, No. 3, pp. 113–124.

9. Jäger, G. and Rogers, J. (2012) Formal language theory: refining the Chomsky hierarchy, Phil. Trans. R. Soc. B, Vol. 367, pp. 1956–1970.

10. Crystal, D. (2000) Language Death, Cambridge University Press, ISBN: 978-0-521-01271-3.

11. Joshi, A. K. (1991) Natural Language Processing, Science, New Series, Vol. 253, No. 5025, pp. 1242–1249.
