Unit - 3


A context-free grammar (CFG) is a list of rules that define the set of all well-formed sentences

in a language. Each rule has a left-hand side, which identifies a syntactic category, and a right-
hand side, which defines its alternative component parts, reading from left to right.
E.g., the rule s --> np vp means that "a sentence is defined as a noun phrase followed
by a verb phrase." Figure 1 shows a simple CFG that describes the sentences from a
small subset of English.

A sentence in the language defined by a CFG is a series of words that can be derived by
systematically applying the rules, beginning with a rule that has S on its left-hand side. A
parse of the sentence is a series of rule applications in which a syntactic category is replaced
by the right-hand side of a rule that has that category on its left-hand side, and the final rule
application yields the sentence itself. E.g., a parse of the sentence "the giraffe dreams" is:
S => np vp
=> det n vp
=> the n vp
=> the giraffe vp
=> the giraffe iv
=> the giraffe dreams
A convenient way to describe a parse is to show its parse tree, which is simply a graphical
display of the parse. The figure shows a parse tree for the sentence "the giraffe dreams". Note that
the root of every subtree has a grammatical category that appears on the left-hand side of a rule,
and the children of that root are identical to the elements on the right-hand side of that rule.
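As a brief illustration, the toy grammar and the parse tree above can be reproduced with NLTK. The rule set below is an assumed reconstruction of the Figure 1 grammar (only the rules actually used in the derivation are included):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> IV
Det -> 'the'
N -> 'giraffe'
IV -> 'dreams'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the giraffe dreams".split()):
    tree.pretty_print()   # draws the parse tree with S at the root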
Dependency Grammar
Traditionally, a dependency grammar belongs to the class of grammars that
emphasize words rather than constituents. Grammars that are built primarily
on constituents are known as phrase structure grammars. Phrase structure grammars are
thus constituent-based, while dependency grammars are word-based.
While phrase structure grammars see sentences and clauses structured in terms of constituents,
dependency grammars assume that sentence and clause structure derives from dependency
relationships between words. The difference is illustrated in the trees below:
Tree (or phrase marker) (a) shows what a traditional phrase structure grammar would view
as the structure of the sentence They killed the man with a gun. One can see, for instance, that
the preposition with forms a prepositional phrase with the noun phrase a gun. Compare this to
the dependency tree (b): here the preposition dominates the noun gun, which in turn dominates
the article a.
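For readers who want to see such word-to-word (head-dependent) links produced automatically, here is a minimal sketch using spaCy; it assumes the small English model has been installed (python -m spacy download en_core_web_sm), and the exact dependency labels depend on that model rather than on the discussion above.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("They killed the man with a gun")
for token in doc:
    # each word is linked to the single word that dominates it (its head)
    print(f"{token.text:>6}  <--{token.dep_}--  {token.head.text}")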
When one compares the number of nodes across the two trees, one finds that (a) contains 12
nodes, while (b) contains only 7 nodes. On the assumption that the two representations convey
the same information about the utterance, the one that does so with less structure, and without
explanatory deficiencies, is called the more minimal of the two. In general, dependency grammars are
more minimal than phrase structure grammars because they assume less structure.
A second point to acknowledge is that the term grammar can have two meanings; one
meaning is rather general and refers to how linguistic units are structured. In this broader
sense, grammar is a hyperonym of syntax, morphology, and phonology. The narrow
meaning refers only to syntax. The word grammar in dependency grammar is traditionally
understood in the narrow sense, i.e. dependency grammars are theories of syntax, but not
theories of morphology or phonology. The motivation for understanding dependency
grammars as dependency syntaxes comes from their being word-based. Historically,
dependency grammars have struggled to establish themselves in the broader sense
of grammar.
The grammar explained here differs from traditional dependency grammars insofar as it
is catena-based. Since catenae operate in syntax as well as in morphology, a catena-based
dependency grammar can hope to become a grammar in the broader sense. Even though the
ultimate aim is to see the distinction between syntax and morphology as a continuum, the
discussion here will branch out into a syntax section and a morphology section.
The section on core notions has established the concept of the catena, and how it relates
to strings, components, and constituents. The purpose of this section is to show
that catenae play an important role in many phenomena seen as central in syntax research.
These areas include displacement, idiom formation, ellipsis, predicate structure,
and constructions. These phenomena will be briefly addressed in turn. Prior to this, several
other important terms will be introduced.
The terms string, catena, component, and constituent have already been defined. In addition
to these, the following terms are also necessary: root, head, dependent, governor,
and governee.
string: a word or combination of words that is continuous with respect to precedence
catena: a word or combination of words that is continuous with respect to dominance
component: a word or combination of words that is a string and a catena
constituent: a component that is complete
root: the one word in a given catena that is not dominated by any other word in that catena
head: the one word that immediately dominates a given catena
dependent: a constituent that is immediately dominated by a given word
governor: the one word that licenses the appearance of a given catena
governee: a catena the appearance of which is licensed by a given word

A context-free grammar (CFG) is in Chomsky Normal Form (CNF) if all production rules
satisfy one of the following conditions:
 A non-terminal generating a terminal (e.g., X -> x)
 A non-terminal generating two non-terminals (e.g., X -> YZ)
 The start symbol generating ε (e.g., S -> ε)
Consider the following grammars,
G1 = {S->a, S->AZ, A->a, Z->z}
G2 = {S->a, S->aZ, Z->a}
The grammar G1 is in CNF, as its production rules satisfy the conditions specified for CNF. However,
the grammar G2 is not in CNF, as the production rule S->aZ contains a terminal followed by a
non-terminal, which does not satisfy the conditions specified for CNF.
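As an illustrative sketch (the rule encoding and function name are assumptions, not part of the text), the CNF conditions can be checked programmatically for G1 and G2:

# Rules are encoded as a dict: non-terminal -> list of right-hand sides,
# where each RHS is a list of symbols (lower case = terminal, upper case = non-terminal).
def is_cnf(rules, start="S"):
    for lhs, bodies in rules.items():
        for rhs in bodies:
            if lhs == start and rhs == []:                       # S -> epsilon
                continue
            if len(rhs) == 1 and rhs[0].islower():               # X -> x
                continue
            if len(rhs) == 2 and all(s.isupper() for s in rhs):  # X -> YZ
                continue
            return False
    return True

G1 = {"S": [["a"], ["A", "Z"]], "A": [["a"]], "Z": [["z"]]}
G2 = {"S": [["a"], ["a", "Z"]], "Z": [["a"]]}
print(is_cnf(G1), is_cnf(G2))   # True False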
Note –
 For a given grammar, there can be more than one CNF.
 A grammar in CNF generates the same language as the original CFG.
 CNF is used as a preprocessing step for many CFG algorithms, such as CYK (the membership
algorithm), bottom-up parsers, etc.
 Generating a string w of length ‘n’ requires ‘2n-1’ productions (steps) in CNF.
 Any context-free grammar that does not have ε in its language has an equivalent CNF.
How to convert CFG to CNF?
Step 1. Eliminate start symbol from RHS.
If start symbol S is at the RHS of any production in the grammar, create a new production
as:
S0->S
where S0 is the new start symbol.
Step 2. Eliminate null, unit and useless productions.
If the CFG contains null, unit or useless production rules, eliminate them.
Step 3. Eliminate terminals from the RHS if they exist with other terminals or non-terminals.
e.g., the production rule X->xY can be decomposed as:
X->ZY
Z->x
Step 4. Eliminate RHS with more than two non-terminals.
e.g,; production rule X->XYZ can be decomposed as:
X->PZ
P->XY
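Steps 3 and 4 can be sketched in code. This is only a rough outline under simplifying assumptions (single upper-case letters as non-terminals, single lower-case letters as terminals, and fresh names that are assumed not to clash with existing symbols); it is not a full CFG-to-CNF converter.

def fix_rhs(rules):
    new_rules = {}
    fresh = iter("PQRTUVW")      # pool of fresh non-terminal names (assumed unused)
    term_map = {}                # terminal -> non-terminal introduced for it

    def nonterm_for(t):
        if t not in term_map:
            term_map[t] = next(fresh)
            new_rules[term_map[t]] = [[t]]
        return term_map[t]

    for lhs, bodies in rules.items():
        for rhs in bodies:
            # Step 3: a terminal mixed with other symbols gets its own rule
            if len(rhs) > 1:
                rhs = [nonterm_for(s) if s.islower() else s for s in rhs]
            # Step 4: binarize X -> X1 X2 ... Xn into a chain of two-symbol rules
            while len(rhs) > 2:
                p = next(fresh)
                new_rules[p] = [rhs[:2]]
                rhs = [p] + rhs[2:]
            new_rules.setdefault(lhs, []).append(rhs)
    return new_rules

# X -> xYZ becomes X -> QZ, Q -> PY, P -> x (fresh names P, Q introduced)
print(fix_rhs({"X": [["x", "Y", "Z"]], "Y": [["y"]], "Z": [["z"]]}))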
Example – Let us take an example to convert CFG to CNF. Consider the given grammar G1:
S → ASB
A → aAS|a|ε
B → SbS|A|bb
Step 1. As start symbol S appears on the RHS, we will create a new production rule S0->S.
Therefore, the grammar will become:
S0->S
S → ASB
A → aAS|a|ε
B → SbS|A|bb
Step 2. As grammar contains null production A-> ε, its removal from the grammar yields:
S0->S
S → ASB|SB
A → aAS|aS|a
B → SbS| A|ε|bb
Now, it creates null production B→ ε, its removal from the grammar yields:
S0->S
S → AS|ASB| SB| S
A → aAS|aS|a
B → SbS| A|bb
Now, it creates unit production B->A, its removal from the grammar yields:
S0->S
S → AS|ASB| SB| S
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
Also, removal of unit production S0->S from grammar yields:
S0-> AS|ASB| SB| S
S → AS|ASB| SB| S
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
Also, removal of unit production S->S and S0->S from grammar yields:
S0-> AS|ASB| SB
S → AS|ASB| SB
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
Step 3. In the production rules A->aAS|aS and B->SbS|aAS|aS, the terminals a and b appear on the RHS
together with non-terminals. Removing them from the RHS:
S0-> AS|ASB| SB
S → AS|ASB| SB
A → XAS|XS|a
B → SYS|bb|XAS|XS|a
X →a
Y→b
Also, B->bb (two terminals on the RHS) cannot be part of CNF; removing it from the grammar yields:
S0-> AS|ASB| SB
S → AS|ASB| SB
A → XAS|XS|a
B → SYS|VV|XAS|XS|a
X→a
Y→b
V→b
Step 4: In production rule S0->ASB, RHS has more than two symbols, removing it from
grammar yields:
S0-> AS|PB| SB
S → AS|ASB| SB
A → XAS|XS|a
B → SYS|VV|XAS|XS|a
X→a
Y→b
V→b
P → AS
Similarly, S->ASB has more than two symbols, removing it from grammar yields:
S0-> AS|PB| SB
S → AS|QB| SB
A → XAS|XS|a
B → SYS|VV|XAS|XS|a
X→a
Y→b
V→b
P → AS
Q → AS
Similarly, A->XAS has more than two symbols, removing it from grammar yields:
S0-> AS|PB| SB
S → AS|QB| SB
A → RS|XS|a
B → SYS|VV|XAS|XS|a
X→a
Y→b
V→b
P → AS
Q → AS
R → XA
Similarly, B->SYS has more than two symbols, removing it from grammar yields:
S0 -> AS|PB| SB
S → AS|QB| SB
A → RS|XS|a
B → TS|VV|XAS|XS|a
X→a
Y→b
V→b
P → AS
Q → AS
R → XA
T → SY
Similarly, B->XAS has more than two symbols, removing it from grammar yields:
S0-> AS|PB| SB
S → AS|QB| SB
A → RS|XS|a
B → TS|VV|US|XS|a
X→a
Y→b
V→b
P → AS
Q → AS
R → XA
T → SY
U → XA
So this is the required CNF for the given grammar.

Since natural language can be open to multiple interpretations, this ambiguity is passed on to
the computers that try to understand the natural language input given to them. Often, it
can be difficult to fully understand a sentence when we are not given enough context or when the
grammar is poor.

In this section we will go over the many different types of ambiguity that are found in NLP.

Part Of Speech (POS) Tagging Ambiguity

POS tagging refers to the process of classifying the words in a text according to their part of speech - whether
a word is a verb, a noun, etc. Often, you will find that the same word can take on multiple
classifications for its part of speech depending on how the sentence is constructed. For
example, it is quite common to see words that can be used either as a verb or as a noun −

 I need to mail my friend the files. (Verb)
 I need to find the mail that was sent to me. (Noun)
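To see this in practice, a quick check with NLTK's off-the-shelf tagger (assuming the punkt tokenizer and averaged_perceptron_tagger models have been downloaded) typically tags the two occurrences of "mail" differently:

import nltk
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

print(nltk.pos_tag(nltk.word_tokenize("I need to mail my friend the files.")))
print(nltk.pos_tag(nltk.word_tokenize("I need to find the mail that was sent to me.")))
# "mail" is expected to come out with a verb tag (VB) in the first sentence
# and a noun tag (NN) in the second.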
Structural Ambiguity

This ambiguity arises because the same exact sentence can be interpreted differently based on
how the sentence is parsed. Take the following sentence −

The boy kicked the ball in his jeans.

This sentence can be construed as the boy either kicking the ball while wearing his jeans, or
kicking the ball while the ball was in the jeans. This depends on how the sentence is parsed.

Scope Ambiguity

Here we look at ambiguities that occur due to quantifiers. Looking back at mathematical
logic terminology, or just basic grammar, words like ‘every’ and ‘any’ come to mind.

Take the following sentence −

All students learn a programming language.

This sentence, due to the scope created by the sequential use of the quantifiers ‘all’ followed by
‘a’, can have two different meanings (sketched in logical notation below) −

 The first is that all students learn the same programming language.
 The second is that they all learn a language that doesn’t have to be the same one.
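The two readings can be written out in predicate logic (an illustrative sketch; the predicate names are assumptions):

∃y (programming_language(y) ∧ ∀x (student(x) → learns(x, y)))   (one language that every student learns)
∀x (student(x) → ∃y (programming_language(y) ∧ learns(x, y)))   (each student learns some language, not necessarily the same one)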
Lexical Ambiguity

Certain words have the property that they can have multiple different meanings. There are two
forms of lexical ambiguity: polysemy and homonymy.
Polysemy − This is when the same word has different but related meanings depending on
its usage, e.g. the word foot. Foot can describe the body part, or the foot of a building.
Essentially, you are describing the base of something with the word foot.

Homonymy − This occurs when words share the same spelling or pronunciation but have different,
unrelated meanings. While superficially the same, they are completely different in meaning. The
word bass, for example, can refer to the musical instrument or to a type of fish. Another
example, given here to clarify that not just spelling but pronunciation matters too,
is horse and hoarse. These two have similar pronunciations, but horse refers to the animal and
hoarse refers to a sore throat.

Semantic Ambiguity

Now, instead of a word having multiple meanings, sentences can have multiple meanings
depending on the context. For example, the sentence “He ate the burnt lasagna and pie” could
mean one of two things −

 That the lasagna was burnt and the pie wasn’t.
 That both were burnt.

Lexical ambiguity can be deemed a subtype of semantic ambiguity.

Referential Ambiguity

Referential ambiguity occurs when a phrase can have multiple interpretations due to the use of
multiple objects and the referencing not being clear. For example, take the sentence −

I looked at Michelle with the telescope.

This can mean two things depending on who has the telescope.

 Michelle herself was carrying the telescope.
 The person saying the sentence was using a telescope to see Michelle.
Anaphoric Ambiguity

Here we have an ambiguity loosely similar to referential ambiguity, but one focused on
pronouns. The use of pronouns can cause confusion when multiple people are
mentioned in a sentence. Take the following sentence −

Michelle told Romany that she ate the cake.

Now, from the sentence alone it is not exactly clear whether ‘she’ is referring to Michelle or
Romany.

Parsing and its relevance in NLP

The word ‘parsing’, whose origin is the Latin word ‘pars’ (which means ‘part’), is used to
draw the exact meaning, or dictionary meaning, from the text. It is also called syntactic analysis or
syntax analysis. By comparing the text against the rules of a formal grammar, syntax analysis checks it for
meaningfulness. A sentence like “Give me hot ice-cream”, for example, would be rejected
by the parser or syntactic analyzer.

In this sense, we can define parsing or syntactic analysis or syntax analysis as follows −

It may be defined as the process of analyzing strings of symbols in natural language
for conformance with the rules of a formal grammar.

We can understand the relevance of parsing in NLP with the help of following points −

 A parser is used to report any syntax error.
 It helps to recover from commonly occurring errors so that the processing of the
remainder of the program can be continued.
 A parse tree is created with the help of a parser.
 A parser is used to create a symbol table, which plays an important role in NLP.
 A parser is also used to produce intermediate representations (IR).
Deep Vs Shallow Parsing
Deep Parsing
 In deep parsing, the search strategy gives a complete syntactic structure to a sentence.
 It is suitable for complex NLP applications.
 Dialogue systems and summarization are examples of NLP applications where deep parsing is used.
 It is also called full parsing.

Shallow Parsing
 It is the task of parsing only a limited part of the syntactic information from the given text.
 It can be used for less complex NLP applications.
 Information extraction and text mining are examples of NLP applications where shallow parsing is used.
 It is also called chunking.

Various types of parsers

As discussed, a parser is basically a procedural interpretation of a grammar. It finds an optimal
tree for the given sentence after searching through the space of a variety of trees. Let us see
some of the available parsers below −

Recursive descent parser

Recursive descent parsing is one of the most straightforward forms of parsing. Following are
some important points about the recursive descent parser −

 It follows a top-down process.
 It attempts to verify whether the syntax of the input stream is correct.
 It reads the input sentence from left to right.
 One necessary operation for a recursive descent parser is to read characters from the input
stream and match them against the terminals of the grammar.
Shift-reduce parser

Following are some important points about the shift-reduce parser −

 It follows a simple bottom-up process.
 It tries to find a sequence of words and phrases that correspond to the right-hand side
of a grammar production and replaces them with the left-hand side of the production.
 This attempt to find a sequence of words continues until the whole sentence is
reduced.
 In simple words, a shift-reduce parser starts with the input symbols and tries to
construct the parse tree up to the start symbol (see the sketch below).
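For concreteness, here is a small sketch with NLTK showing a top-down (recursive descent) parse and a bottom-up (shift-reduce) parse of the same sentence; the toy grammar is an assumption, not from the text.

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")
sentence = "the dog chased the cat".split()

# top-down (recursive descent)
for tree in nltk.RecursiveDescentParser(grammar).parse(sentence):
    print(tree)

# bottom-up (shift-reduce)
for tree in nltk.ShiftReduceParser(grammar).parse(sentence):
    print(tree)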
Chart parser

Following are some important points about the chart parser −

 It is mainly useful or suitable for ambiguous grammars, including grammars of natural
languages.
 It applies dynamic programming to the parsing problem.
 Because of dynamic programming, partial hypothesized results are stored in a structure
called a ‘chart’.
 The ‘chart’ can also be re-used.
Regexp parser

Regexp parsing is one of the most commonly used parsing techniques. Following are some important
points about the Regexp parser −

 As the name implies, it uses a regular expression, defined in the form of a grammar, on
top of a POS-tagged string.
 It basically uses these regular expressions to parse the input sentences and generate a
parse tree out of them (a brief sketch follows).
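Here is a brief sketch of such a chunker using NLTK's RegexpParser; the chunk pattern and the tagged sentence are illustrative assumptions.

import nltk

chunk_grammar = "NP: {<DT>?<JJ>*<NN>}"   # noun phrase: optional determiner, adjectives, a noun
chunker = nltk.RegexpParser(chunk_grammar)
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD")]
print(chunker.parse(tagged))             # shallow parse tree with an NP chunk over "the little dog"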
Cocke–Younger–Kasami (CYK) Algorithm



Grammar denotes the syntactical rules for conversation in natural language. But in the theory
of formal language, grammar is defined as a set of rules that can generate strings. The set of all
strings that can be generated from a grammar is called the language of the grammar.
Context Free Grammar:
We are given a Context Free Grammar G = (V, X, R, S) and a string w, where:
 V is a finite set of variables or non-terminal symbols,
 X is a finite set of terminal symbols,
 R is a finite set of rules,
 S is the start symbol, a distinct element of V, and
 V and X are assumed to be disjoint sets.
The Membership problem is defined as: Grammar G generates a language L(G). Is the given
string a member of L(G)?
Chomsky Normal Form:
A Context Free Grammar G is in Chomsky Normal Form (CNF) if each rule of G is
of the form:
 A –> BC, [ exactly two non-terminal symbols on the RHS ]
 A –> a, or [ a single terminal symbol on the RHS ]
 S –> ε [ the null string ]
Cocke-Younger-Kasami Algorithm
It solves the membership problem using a dynamic programming approach. The
algorithm is based on the principle that the solution to problem [i, j] can be constructed from the
solution to subproblem [i, k] and the solution to subproblem [k + 1, j]. The algorithm requires the
grammar G to be in Chomsky Normal Form (CNF). Note that any context-free grammar can
be systematically converted to CNF. This restriction is employed so that each problem can only
be divided into two subproblems and not more – to bound the time complexity.
How does the CYK Algorithm work?
For a string of length N, construct a table T of size N x N. Each cell in the table T[i, j] is the
set of all constituents that can produce the substring spanning from position i to j. The process
involves filling the table with the solutions to the subproblems encountered in the bottom-up
parsing process. Therefore, cells will be filled from left to right and bottom to top.
        1        2        3        4        5
  1   [1, 1]   [1, 2]   [1, 3]   [1, 4]   [1, 5]
  2            [2, 2]   [2, 3]   [2, 4]   [2, 5]
  3                     [3, 3]   [3, 4]   [3, 5]
  4                              [4, 4]   [4, 5]
  5                                       [5, 5]

In T[i, j], the row number i denotes the start index and the column number j denotes the end
index.

The algorithm considers every possible subsequence of letters and adds K to T[i, j] if the
sequence of letters starting from i to j can be generated from the non-terminal K. For
subsequences of length 2 and greater, it considers every possible partition of the subsequence
into two parts, and checks if there is a rule of the form A –> BC in the grammar where B and C
can generate the two parts respectively, based on already existing entries in T. The sentence
can be produced by the grammar only if the entire string is matched by the start symbol, i.e.,
if S is a member of T[1, n].
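To make the table-filling concrete, here is a compact sketch of the algorithm. The grammar encoding, function name and indexing convention are illustrative assumptions; indices are 1-based and inclusive so that they line up with the table above.

from collections import defaultdict

def cyk(words, unary, binary, start):
    # unary: list of (A, terminal) rules; binary: list of (A, (B, C)) rules, all in CNF
    n = len(words)
    T = defaultdict(set)                    # T[(i, j)]: non-terminals generating words i..j
    for i, w in enumerate(words, start=1):
        T[(i, i)] = {A for A, t in unary if t == w}
    for length in range(2, n + 1):          # span length
        for i in range(1, n - length + 2):  # span start
            j = i + length - 1              # span end
            for k in range(i, j):           # split point between the two parts
                for A, (B, C) in binary:
                    if B in T[(i, k)] and C in T[(k + 1, j)]:
                        T[(i, j)].add(A)
    return start in T[(1, n)]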

Consider a sample grammar in Chomsky Normal Form:

NP --> Det Nom
Nom --> AP Nom
AP --> Adv A
Det --> a | an
Adv --> very | extremely
AP --> heavy | orange | tall
A --> heavy | orange | tall | muscular
Nom --> book | orange | man

Now consider the phrase, “a very heavy orange book“:

a(1) very(2) heavy (3) orange(4) book(5)

Let us start filling up the table from left to right and bottom to top, according to the rules
described above:
               1: a      2: very   3: heavy   4: orange      5: book
  1  a         Det       –         –          NP             NP
  2  very                Adv       AP         Nom            Nom
  3  heavy                         A, AP      Nom            Nom
  4  orange                                   Nom, A, AP     Nom
  5  book                                                    Nom

The table is filled in the following manner:

1. T[1, 1] = {Det} as Det –> a is one of the rules of the grammar.


2. T[2, 2] = {Adv} as Adv –> very is one of the rules of the grammar.
3. T[1, 2] = {} as no matching rule is observed.
4. T[3, 3] = {A, AP} as A –> heavy and AP –> heavy are rules of the grammar.
5. T[2, 3] = {AP} as AP –> Adv (T[2, 2]) A (T[3, 3]) is a rule of the grammar.
6. T[1, 3] = {} as no matching rule is observed.
7. T[4, 4] = {Nom, A, AP} as Nom –> orange and A –> orange and AP –> orange are rules
of the grammar.
8. T[3, 4] = {Nom} as Nom –> AP (T[3, 3]) Nom (T[4, 4]) is a rule of the grammar.
9. T[2, 4] = {Nom} as Nom –> AP (T[2, 3]) Nom (T[4, 4]) is a rule of the grammar.
10. T[1, 4] = {NP} as NP –> Det (T[1, 1]) Nom (T[2, 4]) is a rule of the grammar.
11. T[5, 5] = {Nom} as Nom –> book is a rule of the grammar.
12. T[4, 5] = {Nom} as Nom –> AP (T[4, 4]) Nom (T[5, 5]) is a rule of the grammar.
13. T[3, 5] = {Nom} as Nom –> AP (T[3, 3]) Nom (T[4, 5]) is a rule of the grammar.
14. T[2, 5] = {Nom} as Nom –> AP (T[2, 3]) Nom (T[4, 5]) is a rule of the grammar.
15. T[1, 5] = {NP} as NP –> Det (T[1, 1]) Nom (T[2, 5]) is a rule of the grammar.
We see that T[1, 5] contains NP, the start symbol, which means that this phrase is a member of the
language of the grammar G.
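Using the cyk sketch from earlier, the same result can be checked programmatically; the encoding of the sample grammar below is an assumption about how its rules are represented.

unary = [("Det", "a"), ("Det", "an"), ("Adv", "very"), ("Adv", "extremely"),
         ("AP", "heavy"), ("AP", "orange"), ("AP", "tall"),
         ("A", "heavy"), ("A", "orange"), ("A", "tall"), ("A", "muscular"),
         ("Nom", "book"), ("Nom", "orange"), ("Nom", "man")]
binary = [("NP", ("Det", "Nom")), ("Nom", ("AP", "Nom")), ("AP", ("Adv", "A"))]

print(cyk("a very heavy orange book".split(), unary, binary, start="NP"))   # True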

The parse tree of this phrase, written in bracketed form, would be:
(NP (Det a) (Nom (AP (Adv very) (A heavy)) (Nom (AP orange) (Nom book))))

In phrase structure grammars, such as generalised phrase structure grammar, head-driven
phrase structure grammar and lexical functional grammar, a feature structure is essentially a
set of attribute–value pairs. For example, the attribute named number might have the
value singular. The value of an attribute may be either atomic, e.g. the symbol singular, or
complex (most commonly a feature structure, but also a list or a set).
A feature structure can be represented as a directed acyclic graph (DAG), with the nodes
corresponding to the variable values and the paths to the variable names. Operations defined
on feature structures, e.g. unification, are used extensively in phrase structure grammars. In
most theories (e.g. HPSG), operations are strictly speaking defined over equations describing
feature structures and not over feature structures themselves, though feature structures are
usually used in informal exposition.
Often, feature structures are written like this:

  [ CATEGORY    noun phrase              ]
  [ AGREEMENT   [ NUMBER   singular ]    ]
  [             [ PERSON   third    ]    ]

Here there are two features, category and agreement. Category has the value noun
phrase, whereas the value of agreement is indicated by another feature structure with the
features number and person being singular and third.
This particular notation is called an attribute value matrix (AVM).
The matrix has two columns, one for the feature names and the other for the values. In this
sense a feature structure is a list of key-value pairs. The value might be atomic or another
feature structure. This leads to another notation for feature structures: the use of trees. In fact,
some systems (such as PATR-II) use S-expressions to represent feature structures.
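As a small illustration, NLTK's FeatStruct can represent such structures and unify them; the feature names below follow the example above, and the exact printed form depends on NLTK.

import nltk

fs = nltk.FeatStruct(CATEGORY='noun phrase',
                     AGREEMENT=nltk.FeatStruct(NUMBER='singular', PERSON='third'))
print(fs)   # prints the structure in NLTK's bracketed AVM-like notation

# Unification merges compatible feature structures (and fails on conflicting values).
a = nltk.FeatStruct(AGREEMENT=nltk.FeatStruct(NUMBER='singular'))
b = nltk.FeatStruct(AGREEMENT=nltk.FeatStruct(PERSON='third'))
print(a.unify(b))   # [AGREEMENT=[NUMBER='singular', PERSON='third']]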
