Lecture 2 Hierarchy of NLP & TF-IDF
MODULE III
Understanding Natural Languages
TF-IDF
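The introductory slides on TF-IDF are figures. As a minimal sketch, the weighting named in the lecture title can be computed with the common tf(t, d) × log(N / df(t)) variant (one of several TF and IDF weighting schemes), here applied to the three sentences from Q1 below:

```python
import math
from collections import Counter

def tf_idf(documents):
    """TF-IDF per document: tf(t,d) * log(N / df(t)), one common variant."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter()                     # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        scores.append({t: (tf[t] / total) * math.log(n_docs / df[t])
                       for t in tf})
    return scores

scores = tf_idf([
    "This movie is very scary and long",
    "This movie is not scary and is slow",
    "This movie is spooky and good",
])
```

Note that a word appearing in every document (such as "movie") gets an IDF of log(3/3) = 0, so TF-IDF down-weights terms that carry no discriminating information.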
Q1. Apply the bag-of-words (BOW) method to the following sentences and
convert them to vector form:
Sentence 1: This movie is very scary and long
Sentence 2: This movie is not scary and is slow
Sentence 3: This movie is spooky and good
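A minimal sketch of Q1 in pure Python. The vocabulary is built in order of first occurrence (one common convention; any fixed ordering works), and each sentence becomes a vector of word counts over that vocabulary:

```python
from collections import Counter

def bag_of_words(sentences):
    """Build a shared vocabulary and count-vectorize each sentence."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = []
    for tokens in tokenized:           # vocabulary in order of first occurrence
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)
    counts = [Counter(tokens) for tokens in tokenized]
    vectors = [[c[w] for w in vocab] for c in counts]
    return vocab, vectors

vocab, vectors = bag_of_words([
    "This movie is very scary and long",
    "This movie is not scary and is slow",
    "This movie is spooky and good",
])
```

For Sentence 2, "is" occurs twice, so its vector has a 2 in the "is" position while every other entry is 0 or 1.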
Topic Modeling
• Uncovering hidden structures in sets of
texts or documents.
• Groups texts to discover latent topics.
• Assumes each document consists of a
mixture of topics and that each topic
consists of a set of words.
Topic Modeling
(Example)
Parsing
• Breaking down a given sentence into its
grammatical constituents.
• Example:
• “Who won the cricket worldcup in 2019?”
• “The swift black cat jumps over the wall”
Part-of-speech (POS) tagging
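The POS-tagging slide is a figure. As a minimal illustration, a toy lexicon-lookup tagger for the example sentence from the previous slide (real taggers such as those in NLTK or spaCy use statistical or neural models trained on annotated corpora; this lexicon is hand-written for the example):

```python
# Hand-written lexicon for illustration only; tags follow the
# Universal POS tag set (an assumption, not stated in the slides).
LEXICON = {
    "the": "DET", "swift": "ADJ", "black": "ADJ",
    "cat": "NOUN", "jumps": "VERB", "over": "ADP", "wall": "NOUN",
}

def pos_tag(sentence):
    """Tag each word by dictionary lookup; unknown words get UNK."""
    return [(w, LEXICON.get(w.lower(), "UNK")) for w in sentence.split()]

tags = pos_tag("The swift black cat jumps over the wall")
```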
Constituency parsing
• Need to identify and define commonly
seen grammatical patterns.
• Divide words into groups, called
constituents, based on their grammatical
role in the sentence.
• Example:
• ‘[Amitian] [read] [an article on Syntactic Analysis]’
Dependency Parsing
• Dependencies are established between
words themselves.
• Example:
• ‘Amitians attend classes’
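The dependencies for ‘Amitians attend classes’ can be written as head-dependent triples: the verb "attend" is the root, with "Amitians" and "classes" depending on it. The relation labels below (nsubj, obj) follow the Universal Dependencies convention, an assumption not stated in the slides:

```python
# Dependency structure of "Amitians attend classes" as
# (head, dependent, relation) triples.
ROOT = "attend"
DEPENDENCIES = [
    ("attend", "Amitians", "nsubj"),  # subject depends on the verb
    ("attend", "classes", "obj"),     # object depends on the verb
]

def dependents_of(head):
    """All words that depend directly on the given head."""
    return [dep for h, dep, _ in DEPENDENCIES if h == head]
```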
Co-reference resolution
• Coreference resolution is the task of
finding all expressions that refer to the
same entity in a text.
Example: a text mentioning ‘Michael Cohen’ and ‘Mr. Trump’ contains
two entities; later mentions such as ‘he’ must be resolved to the
correct one.
Word sense
disambiguation
• NLP involves resolving different kinds of
ambiguity.
• A word can take different meanings
making it ambiguous to understand.
• Word sense disambiguation (WSD) means
selecting the correct word sense for a
particular word.
Word sense
disambiguation
• Example:
• The word “bank”. It can refer to a financial
institution or the land alongside a river.
• These different meanings are called word
senses.
• Context can be used effectively to perform
WSD.
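One classic way to use context for WSD is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence. The glosses below are hand-written for the "bank" example, not taken from a real lexicon:

```python
# Hand-written sense inventory for illustration only.
SENSES = {
    "bank": {
        "finance": "a financial institution that accepts deposits and lends money",
        "river": "the sloping land alongside a river or lake",
    }
}

def lesk(word, sentence):
    """Return the sense whose gloss overlaps most with the context."""
    context = set(sentence.lower().split())
    glosses = SENSES[word]
    return max(glosses, key=lambda s: len(context & set(glosses[s].split())))

sense = lesk("bank", "he sat on the bank of the river and watched the water")
```

Here the context words "the" and "river" overlap with the river gloss, so that sense wins; a sentence about loans and deposits would select the financial sense instead.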
Named entity
recognition
• Identification of named entities such as
persons, locations, organisations which
are denoted by proper nouns.
• Example:
• “Michael Jordan is a professor at
Berkeley.”
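A toy gazetteer-based recognizer for the example sentence. Real NER systems use sequence models; this dictionary scan, and the entity labels chosen here, are illustrative assumptions:

```python
# Illustrative gazetteer; labels are assumptions for this example.
GAZETTEER = {
    "Michael Jordan": "PERSON",
    "Berkeley": "ORGANIZATION",
}

def find_entities(text):
    """Return every gazetteer entry that occurs in the text."""
    return [(name, label) for name, label in GAZETTEER.items()
            if name in text]

entities = find_entities("Michael Jordan is a professor at Berkeley.")
```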
Context free grammars
• A context-free grammar (CFG) consists of rewrite rules with a
single non-terminal symbol on the left-hand side. Let us create a
grammar to parse the sentence
• “The bird pecks the grains”
Context free grammars
Context free grammars
• The parse tree breaks down the sentence
into structured parts so that the computer
can easily understand and process it.
• In order for the parsing algorithm to
construct this parse tree, a set of rewrite
rules, which describe what tree structures
are legal, need to be constructed.
Context free grammars
• These rules say that a certain symbol may
be expanded in the tree by a sequence of
other symbols.
• For example, one such rule states that if there are two strings, a
noun phrase (NP) and a verb phrase (VP), then the string formed by
NP followed by VP is a sentence (S → NP VP).
Context free grammars
• The rewrite rules for the sentence are as follows:
S → NP VP
NP → DET N
VP → V NP
DET → the
N → bird | grains
V → pecks | peck
Context free grammars
• The parse tree can be created as shown −
Context free grammars
• Now consider the above rewrite rules. Since V can be rewritten as
either "pecks" or "peck", a sentence such as "The bird peck the
grains" is wrongly permitted, i.e. the subject-verb agreement error
is accepted as correct.
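This demerit can be sketched in code. For a grammar of the form S → NP VP, NP → DET N, VP → V NP (an assumed toy grammar consistent with the example), every sentence flattens to the token-category pattern DET N V DET N, so a recognizer over that pattern accepts the agreement error exactly as described:

```python
# Lexicon for the example sentences; the grammar
#   S -> NP VP, NP -> DET N, VP -> V NP
# flattens to the category pattern DET N V DET N.
LEXICON = {"the": "DET", "bird": "N", "grains": "N",
           "pecks": "V", "peck": "V"}

def accepts(sentence):
    """Recognize sentences generated by the toy grammar."""
    categories = [LEXICON.get(word) for word in sentence.lower().split()]
    return categories == ["DET", "N", "V", "DET", "N"]

# The grammar carries no agreement or semantic features, so:
valid = accepts("The bird pecks the grains")           # grammatical
agreement_error = accepts("The bird peck the grains")  # still accepted
nonsense = accepts("The grains peck the bird")         # accepted too
```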
Context free grammars
• Merit: the simplest style of grammar, and therefore the most
widely used.
• Demerits:
They are not highly precise. For example, "The grains peck the
bird" is syntactically correct according to the parser, so even
though it makes no sense, the parser accepts it as a valid sentence.
Context free grammars
• Demerits
To achieve high precision, multiple grammars need to be prepared.
Completely different sets of rules may be required for singular and
plural variations, passive sentences, etc., which can lead to a
huge, unmanageable set of rules.
Transformational
Grammar
• These are grammars in which a sentence is represented structurally
in two stages.
• Obtaining different structures from sentences having the same
meaning is undesirable in language-understanding systems.
• Sentences with the same meaning should always correspond to the
same internal knowledge structures.
Transformational
Grammar
• In the first stage, the basic structure of the sentence is
analyzed to determine its grammatical constituent parts; the second
stage is just the reverse of the first.
• This second stage reveals the surface structure of the sentence:
the way the sentence is used in speech or in writing.
Transformational Grammar
• The two example sentences above are different sentences, but they
have the same meaning.
• Thus they are an example of transformational grammar.
• These grammars were never widely used in computational models of
natural language.
• Applications of this grammar include changing voice (active to
passive and passive to active), changing a question to declarative
form, etc.
TRANSITION NETWORK
• The transition from N1 to N2 will be made if
an article is the first input symbol.
• If successful, state N2 is entered.
• The transition from N2 to N3 can be made if
a noun is found next.
• If successful, state N3 is entered.
• The transition from N3 to N4 can be made if
an auxiliary is found and so on.
• Consider the sentence "A boy is eating a banana".
• Parsing it with the above transition network: ‘A’ is an article,
so the network moves from N1 to N2; "boy" is a noun (N2 to N3);
"is" is an auxiliary (N3 to N4); "eating" is a verb (N4 to N5);
"a" is an article (N5 to N6); and finally "banana" is a noun
(N6 to N7).
• Since the final state is reached at the end of the input, the
sentence is successfully parsed.
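The walkthrough above can be simulated with a small table of arcs. The state names N1..N7 and the arc categories below are taken from the description (the original network figure is not shown here, so this layout is an assumption consistent with the text):

```python
# (state, word category) -> next state
ARCS = {
    ("N1", "ART"):  "N2",
    ("N2", "NOUN"): "N3",
    ("N3", "AUX"):  "N4",
    ("N4", "VERB"): "N5",
    ("N5", "ART"):  "N6",
    ("N6", "NOUN"): "N7",
}
LEXICON = {"a": "ART", "boy": "NOUN", "is": "AUX",
           "eating": "VERB", "banana": "NOUN"}

def parse(sentence, start="N1", final="N7"):
    """Traverse the network word by word; succeed if the final state
    is reached exactly when the input is exhausted."""
    state = start
    for word in sentence.lower().split():
        state = ARCS.get((state, LEXICON.get(word)))
        if state is None:          # no arc for this word: parse fails
            return False
    return state == final

ok = parse("A boy is eating a banana")
```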
TYPES OF TRANSITION
NETWORK
• There are two main types of transition networks:
1. Recursive Transition Networks (RTN)
2. Augmented Transition Networks (ATN)
Recursive Transition Networks (RTN)
• An RTN is a modified version of the transition network.
• It permits arc labels to refer to other networks (which may in
turn refer back to the referring network), rather than only to
word categories.
Augmented Transition Network
(ATN)
• An ATN is a modified transition network.
• It is an extension of RTN.
• The ATN uses a top-down parsing procedure to gather various types
of information to be used later by the understanding system.
• It produces a data structure suitable for further processing and
capable of storing semantic details.
• An augmented transition network (ATN) is
a recursive transition network that can
perform tests and take actions during arc
transitions.
• An ATN uses a set of registers to store
information.
• A set of actions is defined for each arc and
the actions can look at and modify the
registers.
• An arc may have a test associated with it.
• The arc is traversed (and its action taken) only if the test
succeeds.
• When a lexical arc is traversed, the current word is placed in a
special variable (*).
• The ATN was first used in the LUNAR system.
• In an ATN, an arc can thus have an arbitrary test and an arbitrary
action attached to it.
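A hedged sketch of these ideas in code: registers hold partial results, a '*' variable holds the current word, and an arc test gates the transition. The slides describe tests and actions only abstractly, so the subject-verb agreement test below is an illustrative choice:

```python
# Minimal ATN-style pass over a two-word sentence.
LEXICON = {"birds": ("NOUN", "plural"), "bird": ("NOUN", "singular"),
           "peck": ("VERB", "plural"), "pecks": ("VERB", "singular")}

def atn_parse(words):
    """Fill registers word by word; a failed arc test aborts the parse."""
    registers = {}                      # ATN registers store partial results
    for star in words:                  # '*' holds the current word
        cat, number = LEXICON[star.lower()]
        if cat == "NOUN":
            # arc action: record the subject and its number
            registers["SUBJ"], registers["NUMBER"] = star, number
        elif cat == "VERB":
            # arc test: traverse only if the verb agrees with SUBJ
            if registers.get("NUMBER") != number:
                return None             # test fails, arc not traversed
            registers["VERB"] = star    # arc action: record the verb
    return registers

result = atn_parse(["birds", "peck"])
```

Unlike the plain CFG earlier, the register test rejects "bird peck", showing how ATN actions add precision without multiplying grammar rules.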
The structure of ATN