Morphological Parsing
UNIT 2
Syntax: Formal Grammars of English - Word Level Analysis: Regular Expressions - Finite-State Automata - Syntactic Analysis / Parsing: Context-free Grammar - Types of Parsing: Morphological Parsing, Syntactic Parsing, Statistical Parsing, Probabilistic Parsing, Constituency Parsing - Spelling Error Detection and Correction - Words and Word Classes - Part-of-Speech Tagging.
2.3 PARSING
The word ‘parsing’ originates from the Latin word ‘pars’ (meaning ‘part’); parsing is used to draw the exact meaning, or dictionary meaning, from the text. It is also called syntactic analysis or syntax analysis.
Syntax analysis determines the syntactic structure of a text and checks the text for meaningfulness by comparing it against the rules of the formal grammar of the language.
Grammar is essential for describing the syntactic structure of well-formed programs. A mathematical model of grammar was given by Noam Chomsky in 1956, and it is effective for describing computer languages.
Mathematically, a grammar G can be formally written as a 4-tuple (N, T, S, P), where −
N or VN = the set of non-terminal symbols, i.e., variables.
T or Σ = the set of terminal symbols.
S = the start symbol, where S ∈ N.
P = the set of production rules for terminals as well as non-terminals. Each rule has the form α → β, where α and β are strings over VN ∪ Σ and at least one symbol of α belongs to VN.
Context-free grammar, also called CFG, is a notation for describing languages and a superset of regular grammar.
A CFG consists of a finite set of grammar rules with the following four components −
Set of Non-terminals: Denoted by V. The non-terminals are syntactic variables that denote sets of strings, which further help to define the language generated by the grammar.
Set of Terminals: Also called tokens, and denoted by Σ. Strings are formed from these basic symbols.
Set of Productions: Denoted by P. The set defines how the terminals and non-terminals can be combined. Every production consists of a non-terminal (the left side of the production), an arrow, and a string of terminals and/or non-terminals (the right side of the production).
Start Symbol: The derivation begins from the start symbol, denoted by S. One of the non-terminal symbols is always designated as the start symbol.
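As an illustration, here is a minimal sketch of these four components using NLTK's CFG class in Python (the toy grammar itself is an assumption chosen for this example):

import nltk

# Toy grammar: S is the start symbol; S, NP, VP, Det, N, V are the
# non-terminals; the quoted words are the terminals.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> 'I' | Det N
    VP -> V NP
    Det -> 'a'
    N -> 'fox'
    V -> 'saw'
""")

print(grammar.start())          # the start symbol S
for prod in grammar.productions():
    print(prod)                 # the production rules in P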
Regular Grammar
Regular grammars (RGs) are CFGs that generate regular languages. A regular grammar is a CFG whose productions are restricted to two forms, either A → a or A → aB, where A, B ∈ N and a ∈ T. Regular grammars are equivalent to regular expressions; they encode precisely those languages that can be recognized by a DFA.
Notice that productions in a regular grammar have at most one non-terminal on the right-hand side, and that this non-terminal always occurs at the end of a production.
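For example, the regular grammar with productions S → aS, S → bA and A → a generates exactly the strings of the form a…aba; the same language is described by the regular expression a*ba.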
Constituency Grammar
Phrase structure grammar, introduced by Noam Chomsky, is based on the constituency relation; that is why it is also called constituency grammar. It is the opposite of dependency grammar.
Dependency Grammar
Dependency grammar, in contrast, describes the syntax of a sentence in terms of dependency relations between individual words rather than phrasal constituents. A parse tree that uses constituency grammar is called a constituency-based parse tree, and a parse tree that uses dependency grammar is called a dependency-based parse tree.
Parsing techniques fall into two broad categories −
Top-down Parsing
Bottom-up Parsing
Top-down Parsing: In this kind of parsing, the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol into the input. The most common form of top-down parsing uses recursive procedures to process the input. The main disadvantage of recursive-descent parsing is backtracking.
Bottom-up Parsing: In this kind of parsing, the parser starts with the input symbols and tries to construct the parse tree up to the start symbol.
Deep Parsing vs. Shallow Parsing
Deep parsing: the search strategy gives a complete syntactic structure for a sentence. It is suitable for complex NLP applications; dialogue systems and summarization are examples of NLP applications where deep parsing is used.
Shallow parsing: the task of parsing only a limited part of the syntactic information from the given text. It can be used for less complex NLP applications; information extraction and text mining are examples of NLP applications where shallow parsing is used.
The process of deriving a string is called derivation. A parse tree (or derivation tree) may be defined as the graphical depiction of a derivation. The start symbol of the derivation serves as the root of the parse tree. In every parse tree, the leaf nodes are terminals and the interior nodes are non-terminals. A property of the parse tree is that reading its leaves from left to right produces the original input string.
1. Leftmost Derivation
The process of deriving a string by expanding the leftmost non-terminal at each step is called leftmost derivation, i.e., the input is scanned and replaced from left to right. The sentential form in this case is called the left-sentential form.
2. Rightmost Derivation
The process of deriving a string by expanding the rightmost non-terminal at each step is called rightmost derivation, i.e., the sentential form of the input is scanned and replaced from right to left. The sentential form in this case is called the right-sentential form.
Notes
For unambiguous grammars, the leftmost derivation and the rightmost derivation represent the same parse tree.
For ambiguous grammars, the leftmost derivation and the rightmost derivation may represent different parse trees.
In the example below, the given grammar is unambiguous; that is why its leftmost derivation and rightmost derivation represent the same parse tree.
Leftmost Derivation Tree = Rightmost Derivation Tree
Properties Of Parse Tree
The root node of a parse tree is the start symbol of the grammar.
Each leaf node of a parse tree represents a terminal symbol.
Each interior node of a parse tree represents a non-terminal symbol.
A parse tree is independent of the order in which the productions are used during derivation.
Problem
Consider the grammar −
S → bB / aA
A → b / bS / aAA
B → a / aS / bBB
Derive the string bbaababa using leftmost derivation and rightmost derivation.
Solution
1. Leftmost Derivation
S → bB
→ bbBB (Using B → bBB)
→ bbaB (Using B → a)
→ bbaaS (Using B → aS)
→ bbaabB (Using S → bB)
→ bbaabaS (Using B → aS)
→ bbaababB (Using S → bB)
→ bbaababa (Using B → a)
2. Rightmost Derivation
S → bB
→ bbBB (Using B → bBB)
→ bbBaS (Using B → aS)
→ bbBabB (Using S → bB)
→ bbBabaS (Using B → aS)
→ bbBababB (Using S → bB)
→ bbBababa (Using B → a)
→ bbaababa (Using B → a)
3. Parse Tree
[Figure: the parse tree for bbaababa, common to both derivations]
Whether we consider the leftmost derivation or the rightmost derivation, we get the same parse tree. The reason is that the given grammar is unambiguous.
2.3.5 PARSER
The main roles of the parser include the following:
To report any syntax error.
To recover from commonly occurring errors so that the processing of the remainder of the program can continue.
To create the parse tree.
To create the symbol table.
To produce intermediate representations (IR).
A parser is basically a procedural interpretation of a grammar. It finds an optimal tree for the given sentence after searching through the space of a variety of trees. Commonly used types of parsers include the following:
i. Recursive descent Parser
ii. Shift-reduce Parser
iii. Chart Parser
iv. Regexp parser
v. Dependency Parser
vi. Morphological Parser
vii. Constituency Parser
iv. Regexp Parser
Regexp parsing is one of the most widely used parsing techniques. Following are some important points about the regexp parser:
As the name implies, it uses a regular expression, defined in the form of a grammar, on top of a POS-tagged string.
It basically uses these regular expressions to parse the input sentences and generate a parse tree, as in the sketch below.
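A minimal sketch with NLTK's RegexpParser (the chunk pattern and the tagged sentence are assumptions chosen for illustration):

import nltk

# Chunk grammar: an NP is an optional determiner, any number of
# adjectives, and then a noun. The pattern matches POS tags, not words.
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")

# A POS-tagged input sentence (Penn Treebank tag set).
tagged = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
          ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("dog", "NN")]

# Produces a tree with two NP chunks: (the quick fox) and (the dog).
print(chunker.parse(tagged))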
v. Dependency Parser
Dependency Parsing (DP) refers to examining the dependencies between the words of a
sentence to analyze its grammatical structure. Dependency parsing doesn’t make use of
phrasal constituents or sub-phrases. Instead, the syntax of the sentence is expressed in terms
of dependencies between words — that is, directed, typed edges between words in a graph.
More formally, a dependency parse tree is a graph G = (V, E) where the set of vertices V contains the words in the sentence, and each edge in E connects two words. The graph must satisfy three conditions:
o There is a single root node with no incoming edges.
o Every vertex except the root has exactly one incoming edge.
o There is a unique path from the root to every vertex in V.
Additionally, each edge in E has a type, which defines the grammatical relation that occurs between the two words.
Let us see what the example sentence “I saw a fox” looks like if we perform dependency parsing:
[Figure: dependency parse tree of “I saw a fox”]
As we can see, the result is completely different from a constituency parse. With this approach, the root of the tree is the verb of the sentence, and the edges between words describe their relationships.
For example, the word “saw” has an outgoing edge of type nsubj to the word “I”, meaning
that “I” is the nominal subject of the verb “saw”. In this case, we say that “I” depends
on “saw”.
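As a sketch, a dependency parse like the one described above can be produced with spaCy (this assumes the en_core_web_sm model has been downloaded; the exact labels may vary by model version):

import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline
doc = nlp("I saw a fox")

# For each word, print its dependency relation and its head word.
for token in doc:
    print(f"{token.text:<4} --{token.dep_}--> {token.head.text}")

# Expected output (approximately):
#   I    --nsubj--> saw
#   saw  --ROOT--> saw
#   a    --det--> fox
#   fox  --dobj--> saw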
Morphemes are the smallest meaning-bearing units of a language. Example: we can break the word foxes into two parts, fox and -es; that is, the word foxes is made up of two morphemes, one being fox and the other -es.
Morphemes can be divided into two types −
i. Stems
The stem is the core meaningful unit of a word; we can also say that it is the root of the word. Example: in the word foxes, the stem is fox.
ii. Affixes
As the name suggests, affixes add some additional meaning and grammatical function to words. For example, in the word foxes, the affix is -es.
Affixes can further be divided into the following four types −
o Prefixes − As the name suggests, prefixes precede the stem. For example, in the word unbuckle, un- is the prefix.
o Suffixes − As the name suggests, suffixes follow the stem. For example, in the word cats, -s is the suffix.
o Infixes − As the name suggests, infixes are inserted inside the stem. For example, the word cupful can be pluralized as cupsful by using -s as an infix.
o Circumfixes − Circumfixes precede and follow the stem. There are very few examples of circumfixes in the English language. A common example is the pattern a-…-ing, as in a-hunting, where a- precedes and -ing follows the stem.
Word Order
The ordering of the component morphemes within a word is also determined during morphological parsing.
Morphology
Morphology is the study of the following:
The formation of words.
The origin of the words.
Grammatical forms of the words.
Use of prefixes and suffixes in the formation of words.
How parts-of-speech (PoS) of a language are formed.
Morphological parsing is the problem of recognizing that a word breaks down into smaller meaningful units called morphemes, and producing some sort of linguistic structure for it.
Requirements for building a Morphological parser
Let us now see the requirements for building a morphological parser
Lexicon: This includes the list of stems and affixes, along with basic information about them; for example, information such as whether a stem is a noun stem or a verb stem.
Morphotactics: This is basically the model of morpheme ordering; in other words, the model explains which classes of morphemes can follow other classes of morphemes inside a word. For example, one morphotactic fact is that the English plural morpheme always follows the noun rather than preceding it.
Orthographic rules: These spelling rules are used to model the changes that occur in a word. For example, the rule converting y to ie in words like city + s = cities, not citys.
The goal of morphological parsing is to find out what morphemes a given word is built from.
For example, a morphological parser should be able to tell us that the word cats is the plural
form of the noun stem cat, and that the word mice is the plural form of the noun stem mouse.
So, given the string cats as input, a morphological parser should produce an output that looks
similar to cat N PL. Here are some more examples:
mouse → mouse N SG
mice → mouse N PL
foxes → fox N PL
To get from the surface form of a word to its morphological analysis, we proceed in two steps.
Step 1: Split the word up into its possible components. For example, cats is split into cat + s, and foxes may be split into fox + s (or, incorrectly, into foxe + s).
Step 2: Use a lexicon of stems and affixes to look up the categories of the stems and the meaning of the affixes.
cat + s will get mapped to cat N PL, and
fox + s to fox N PL.
We will also find out now that foxe is not a legal stem. This tells us that
splitting foxes into foxe + s was actually an incorrect way of splitting foxes, which
should be discarded.
Note: For the word houses splitting it into house + s is correct.
[Figure: the two steps of the morphological parser, illustrated with examples such as cats → cat + s → cat N PL]
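The following is a minimal sketch of the two steps in Python; the tiny lexicon, the suffix-splitting rules and the irregular-plural table are all assumptions chosen for illustration (a real morphological parser would typically be built from finite-state transducers):

# Step 2 resources: a lexicon of stems with their categories, plus a
# table of irregular forms.
LEXICON = {"cat": "N", "fox": "N", "house": "N", "mouse": "N"}
IRREGULAR = {"mice": "mouse"}

def morph_parse(word):
    if word in IRREGULAR:                 # irregular forms are looked up directly
        return f"{IRREGULAR[word]} N PL"
    # Step 1: propose splits of the surface form into stem + affix,
    # undoing orthographic rules (e.g. foxes -> fox + s, not foxe + s).
    candidates = []
    if word.endswith("es"):
        candidates.append(word[:-2])      # foxes -> fox
    if word.endswith("s"):
        candidates.append(word[:-1])      # cats -> cat, foxes -> foxe
    # Step 2: keep only the splits whose stem is in the lexicon.
    for stem in candidates:
        if stem in LEXICON:
            return f"{stem} {LEXICON[stem]} PL"
    if word in LEXICON:
        return f"{word} {LEXICON[word]} SG"
    return None                           # not a recognized word

print(morph_parse("cats"))    # cat N PL
print(morph_parse("foxes"))   # fox N PL (the split foxe + s is discarded)
print(morph_parse("mice"))    # mouse N PL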
The constituency parse tree is based on the formalism of context-free grammars. In this type
of tree, the sentence is divided into constituents, that is, sub-phrases that belong to a specific
category in the grammar.
In English, for example, the phrases “a dog”, “a computer on the table” and “the nice sunset”
are all noun phrases, while “eat a pizza” and “go to the beach” are verb phrases.
The grammar provides a specification of how to build valid sentences, using a set of rules.
As an example, the rule VP → V NP means that we can form a verb phrase (VP) using a verb (V) followed by a noun phrase (NP).
While we can use these rules to generate valid sentences, we can also apply them the other
way around, in order to extract the syntactical structure of a given sentence according to the
grammar.
Example of a constituency parse tree for the simple sentence, “I saw a fox”:
[Figure: constituency parse tree of “I saw a fox”]
A constituency parse tree always contains the words of the sentence as its terminal nodes. Usually, each word has a parent node containing its part-of-speech tag (noun, adjective, verb, etc.), although this may be omitted in other graphical representations.
All the other non-terminal nodes represent the constituents of the sentence and are usually one of verb phrase (VP), noun phrase (NP), or prepositional phrase (PP).
To sum up, constituency parsing creates trees containing a syntactical representation of a sentence, according to a context-free grammar. This representation is highly hierarchical and divides the sentence into its phrasal constituents.
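As a sketch, the constituency parse of “I saw a fox” can be computed with NLTK's chart parser over a toy grammar (the grammar is an assumption chosen to cover this one sentence; real parsers use far larger grammars):

import nltk

# A toy CFG that covers the example sentence.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> 'I' | Det N
    VP -> V NP
    Det -> 'a'
    N -> 'fox'
    V -> 'saw'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw a fox".split()):
    tree.pretty_print()   # draws the constituency tree as ASCII art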
Probabilistic parsing uses dynamic programming algorithms to compute the most likely
parse(s) of a given sentence, given a statistical model of the syntactic structure of a language.
CFG: A context-free grammar consists of:
1. a set of non-terminal symbols N
2. a set of terminal symbols Σ (disjoint from N)
3. a set of productions P, each of the form A → α, where A is a non-terminal and α is a string of symbols from the infinite set of strings (Σ ∪ N)*
4. a designated start symbol S
Probabilistic CFGs / Stochastic Grammars (PCFGs)
A probabilistic CFG augments each rule in P with a conditional probability: A → β [p], where p is the probability that the non-terminal A will be expanded to the sequence β. This is often written as P(A → β) or P(A → β | A).
Why are PCFGs useful?
• They assign a probability to each parse tree T.
• They are useful in disambiguation: choose the most likely parse. If we make independence assumptions, the probability of a parse is P(T) = ∏_{n∈T} p(r(n)), where r(n) is the rule used to expand node n of the tree.
• They are useful in language modeling tasks.
Example:
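A minimal sketch using NLTK's PCFG class and Viterbi parser (the toy grammar and its probabilities are assumptions chosen for illustration):

import nltk

# Toy PCFG: each rule carries a conditional probability, and the
# probabilities of all rules sharing a left-hand side sum to 1.
grammar = nltk.PCFG.fromstring("""
    S -> NP VP    [1.0]
    NP -> 'I'     [0.5]
    NP -> Det N   [0.5]
    VP -> V NP    [1.0]
    Det -> 'a'    [1.0]
    N -> 'fox'    [1.0]
    V -> 'saw'    [1.0]
""")

# The Viterbi parser uses dynamic programming to find the most probable
# parse; its probability is the product of the probabilities of the
# rules used in the tree: P(T) = prod_{n in T} p(r(n)).
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("I saw a fox".split()):
    print(tree)
    print(tree.prob())   # 1.0 * 0.5 * 1.0 * 1.0 * 0.5 * 1.0 * 1.0 = 0.25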
Statistical parsing is the task of computing the most probable parse of a sentence given a
probabilistic (or weighted) context-free grammar (CFG). The weights of the probabilistic or
weighted CFG are typically learned on a corpus of texts.
Parsing natural language presents several challenges that don’t occur when parsing
programming languages. The reason for this is that natural language is often ambiguous,
meaning there can be multiple valid parse trees for the same sentence.
Let’s consider for a moment the sentence, “I shot an elephant in my pajamas”. It has two possible interpretations: one where the speaker was wearing pajamas while shooting the elephant, and the other where the elephant was inside the speaker’s pajamas.
These are both valid from a syntactical perspective, but humans are able to solve these
ambiguities very quickly – and often unconsciously – since many of the possible
interpretations are unreasonable for their semantics or for the context in which the sentence
occurs. However, it’s not as easy for a parsing algorithm to select the most likely parse tree
with great accuracy.
To do this, most modern parsers use supervised machine learning models that are trained on
manually annotated data. Since the data is annotated with the correct parse trees, the model
will learn a bias towards more likely interpretations.
2.3.8 CONCLUSION
• Basic parsing approaches without constraints are not practical in real applications.
• Whatever approach is taken, bear in mind that the lexicon is the real bottleneck.
• There is a real trade-off between coverage and efficiency, so it is a good idea to sacrifice broad coverage (e.g., domain-specific parsers, controlled language), or to use a scheme that minimizes the disadvantages (e.g., probabilistic parsing).
– From a computational perspective, a parser provides a formalism for writing linguistic rules and an implementation which can apply the rules to an input text.
– An interface to allow grammar development and testing (e.g., tracing rules, showing trees), and an interface with the application of which the parser is a part (which may be hidden from the end-user), are both necessary.
• All of the above should be tailored to meet the needs of the application.