NLP Study Material
Natural Language Processing
CSA4006
Syllabus
Module 1:
Introduction: Knowledge in Speech and Language Processing - Ambiguity - Models and Algorithms - Language, Thought, and Understanding - The State of the Art and the Near-Term Future
Topic: Introduction
NLP
Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction
between computers and humans through natural language. NLP enables computers to understand,
interpret, and generate human language in a way that is both meaningful and valuable.
Tokenization: The sentence "The quick brown fox jumps over the lazy dog." is first tokenized into individual words and punctuation marks:
["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]
Part-of-speech (POS) tagging then labels each token with its word class (noun, verb, adjective, and so on). Here is the sentence with POS tags:
[("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"), ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"), (".", ".")]
(S
  (NP (DT The) (JJ quick) (JJ brown) (NN fox))
  (VP (VBZ jumps) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))))
  (. .))
In this parse tree, "S" represents the sentence, "NP" represents a noun phrase, "VP" represents a verb
phrase, "DT" represents a determiner, "JJ" represents an adjective, "NN" represents a noun, "VBZ"
represents a verb, and "IN" represents a preposition. The tree structure captures the hierarchical
relationships between the words in the sentence.
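For concreteness, here is a minimal sketch of the tokenization and tagging steps in Python using NLTK (assuming the nltk package plus its punkt and averaged_perceptron_tagger resources are installed; producing the parse tree itself would additionally require a grammar or an external parser):

import nltk

sentence = "The quick brown fox jumps over the lazy dog."
tokens = nltk.word_tokenize(sentence)   # tokenization step
tagged = nltk.pos_tag(tokens)           # POS-tagging step
print(tagged)                           # [('The', 'DT'), ('quick', 'JJ'), ...]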
Comparison of three model families (the column headings were lost in extraction):
Training:    large corpus | parallel sequences of input and target | massive corpora, self-supervised pretraining then fine-tuning
Parallelism: inherently parallelizable | less parallelizable | highly parallelizable
Topic: Introduction
Applications of NLP
Communication With Machines
2. Morphology
Morphology deals with the internal structure of words and how they are formed from smaller units called morphemes. Morphological analysis helps in tasks like stemming (reducing words to their base form) and lemmatization (reducing words to their dictionary form).
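As a small illustration, here is a hedged sketch contrasting the two with NLTK (assuming nltk and its wordnet resource are installed):

from nltk.stem import PorterStemmer, WordNetLemmatizer

print(PorterStemmer().stem("studies"))            # studi  (crude base form)
print(WordNetLemmatizer().lemmatize("studies"))   # study  (dictionary form)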
3. Syntax
Syntax involves the rules governing the structure of sentences. It includes understanding how
words combine to form phrases and sentences, and the relationships between different parts of
speech. Parsing techniques are used to analyze sentence structure.
6. Discourse
Discourse refers to the structure and organization of connected text or speech. NLP systems at this level
consider how sentences relate to each other and form coherent paragraphs or dialogues. Coreference
resolution (identifying which words refer to the same entity) is an important task in discourse analysis.
1. Ambiguity
2. Scale
3. Sparsity
4. Variation
5. Expressivity
6. Unmodeled variables
7. Unknown representations
Consider the classic example sentence "I made her duck." Its different meanings are caused by a number of ambiguities. First, the words duck and her are morphologically or syntactically ambiguous in their part of speech: duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun. Second, the word make is semantically ambiguous; it can mean create or cook. Third, the verb make is syntactically ambiguous in a different way: it can be transitive, taking a single direct object, or ditransitive, taking two objects, meaning that the first object (her) was made into the second object (duck). Make can also take a direct object and a verb, meaning that the object (her) was caused to perform the verbal action (duck). Finally, in a spoken sentence there is an even deeper kind of ambiguity: the first word could have been eye, or the second word maid.
Ambiguity
We often introduce the models and algorithms we present throughout the book as ways to resolve or disambiguate these ambiguities. For example, deciding whether duck is a verb or a noun can be solved by part-of-speech tagging. Deciding whether make means "create" or "cook" can be solved by word sense disambiguation. Resolution of part-of-speech and word sense ambiguities are two important kinds of lexical disambiguation.
Challenges of Scale
Data Collection: Gathering and annotating large-scale linguistic data is resource-intensive and time-consuming.
Model Complexity: More data often leads to larger and more complex models, which may require
specialized hardware and efficient training techniques.
Noise and Quality: As datasets grow, ensuring data quality becomes crucial, as noise can negatively
impact model performance.
Long Tail Distribution: The frequency distribution of words follows a "long tail" pattern, where a few
common words appear frequently, while the majority of words occur rarely.
Named Entities: Entities like names, locations, dates, and specialized terms are sparse in most text data.
Word Combinations: The number of possible word combinations is astronomically large, but most of these
combinations are never observed in real-world text
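To make the long-tail point concrete, here is a tiny sketch (with made-up text) showing that even in a short sample most word types occur only once:

from collections import Counter

words = "the cat sat on the mat and the dog sat near a log".split()
counts = Counter(words)
print(counts.most_common(2))    # a few common words dominate
print(sum(1 for c in counts.values() if c == 1), "word types occur only once")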
What will happen if we try to use this tagger/parser for social media?
World knowledge
I dropped the glass on the floor and it broke.
I dropped the hammer on the glass and it broke.
(Resolving what "it" refers to in each sentence requires world knowledge about glasses and hammers.)
Example: "She's as busy as a bee." In this metaphor, the phrase "busy as a bee" implies that she is very industrious, a meaning only indirectly related to the literal fact that bees are busy insects.
Example: "He's the Einstein of our group." This expression assumes knowledge about who Einstein was and what he symbolizes. A model lacking this cultural context might miss the intended comparison.
Example: "Oh great, another flat tire!" This statement might be used in a situation where someone is frustrated about a recurring problem; the intended meaning is sarcastic even though the literal words express enthusiasm.
Imagine you needed to search for a phone number such as "91-98765-43210"; a regular expression can do this:
r'\d{2}-\d{5}-\d{5}'
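A minimal sketch of this pattern with Python's re module:

import re

text = "Call me at 91-98765-43210 tomorrow."
print(re.findall(r"\d{2}-\d{5}-\d{5}", text))   # ['91-98765-43210']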
Ranges [A-Z]
A range of numbers can also be specified: /{n,m}/ specifies from n to m occurrences of the previous char or expression, while /{n,}/ means at least n occurrences of the previous expression.
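For example, in Python:

import re

print(re.findall(r"a{2,4}", "a aa aaa aaaaa"))   # ['aa', 'aaa', 'aaaa']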
Strings
'abc', "abc", '''abc''', and r'abc' all denote the same three characters: a b c
Strings Again
'abc\n' and "abc\n" (or a triple-quoted string containing a literal line break) denote four characters: a b c newline
r'abc\n' denotes five characters: a b c \ n
Why so many?
' vs " lets you put the other kind inside
''' lets you run across many lines
all 3 let you show "invisible" characters (via \n, \t, etc.)
r'...' (raw strings) can't do invisible stuff, but avoids problems with backslashes:
open('C:\new\text.dat') vs open('C:\\new\\text.dat') vs open(r'C:\new\text.dat')
(in the first form, \n and \t are silently interpreted as newline and tab, so the path is wrong)
RegExprs are Widespread
• shell file name patterns (limited)
• unix utility "grep" and relatives (try "man grep" in a terminal window)
• perl
• TextWrangler
• Python
Patterns in Text
• Pattern-matching is frequently useful
• Identifier: a letter followed by zero or more letters or digits
Example
re.findall(r"[AG]{3,3}CATG[TC]{4,4}[AG]{2,2}C[AT]TG[CT][CG][TC]", myDNA)
RegExprs in Python
http://docs.python.org/library/re.html
Simple RegExpr Testing
>>> import re
>>> str1 = 'what foot or hand fell fastest'
>>> re.findall(r'f[a-z]*', str1)
['foot', 'fell', 'fastest']
>>> str2 = "I lack e's successor"
>>> re.findall(r'f[a-z]*', str2)
[]
Definitely recommend trying this with examples to follow, and more.
They're strings
Most punctuation is special and needs to be escaped by backslash (e.g., "\." instead of ".") to get non-special behavior. So "raw" string literals (r'C:\new\.txt') are generally recommended for regexprs, unless you double your backslashes judiciously.
Patterns "Match" Text
r'T[AG]T[^GC].T' matched against 'ACGTTGTAATGGTATnCT'
Matching one of several alternatives
RegExpr Semantics, 3: Concatenation, Or, Grouping
You can group subexpressions with parens.
If R, S are RegExprs, then
RS matches the concatenation of strings matched by R, S individually
R | S matches the union: either R or S
Exercise: where does r'TAT(A.|.A)T' match in 'TATCATGTATACTCCTATCCT'?
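One way to check the exercise (the position and matched text in the comment are what this search actually finds):

import re

m = re.search(r"TAT(A.|.A)T", "TATCATGTATACTCCTATCCT")
print(m.start(), m.group())   # 0 TATCAT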
RegExpr Semantics, 4: Repetition
If R is a RegExpr, then
R* matches 0 or more consecutive strings (independently) matching R
R+ 1 or more
R{n} exactly n
R{m,n} any number between m and n, inclusive
R? 0 or 1
Beware precedence (* > concat > |).
Exercise: where does r'TAT(A.|.A)*T' match in 'TATCATGTATACTATCACTATT'?
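And a similar check for the repetition exercise:

import re

m = re.search(r"TAT(A.|.A)*T", "TATCATGTATACTATCACTATT")
print(m.start(), m.group())   # 0 TATCAT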
RegExprs in Python
By default
Case sensitive, line-oriented (\n treated specially)
Matching is generally “greedy”
Finds longest version of earliest starting match
Next “findall()” match will not overlap
import sys
import re
filename = sys.argv[1]
filehandle = open(filename, "r")
filecontents = filehandle.read()
myrule = re.compile(r"([a-zA-Z][a-zA-Z0-9]*)\.[a-zA-Z0-9]{3}")
# Finds skidoo.bar amidst 23skidoo.barber; ok?
match = myrule.findall(filecontents)
print(match)
Basics of regexp construction
Wild cards
• WARNING:
– backslash is special in Python strings
– It's special again in regexps
– This means you need too many backslashes
– We will use "raw strings" instead
– Raw strings look like r"ATCGGC"
Using . and backslash
hw.\.... (i.e., "hw", then any one character, then a literal dot, then any three characters; matches names like hw3.txt)
Zero or more copies
Repeats
simple testing
>>> import re
>>> string = 'what foot or hand fell fastest'
>>> re.findall(r'f[a-z]*', string)
['foot', 'fell', 'fastest']
Practice problem 1
• Write a regexp that will match any string that starts with "hum" and ends with "001" with any number of characters, including none, in between
Practice problem 2
• Write a regexp that matches file names ending in ".py"
Using the regexp
import re
myrule = re.compile(r".+\.py")
print(myrule)
# <_sre.SRE_Pattern object at 0xb7e3e5c0>  (newer Pythons print re.compile('.+\\.py'))
Using the regexp
mymatch = myrule.search(myDNA)
print(mymatch)
# None
mymatch = myrule.search(someotherDNA)
print(mymatch)
# <_sre.SRE_Match object at 0xb7df9170>
All of these objects! What can they do?
A practical example
import re
myrule = re.compile(r".+\.py")
mystring = "This contains two files, hw3.py and uppercase.py."
mymatch = myrule.search(mystring)
print(mymatch.group())
# This contains two files, hw3.py and uppercase.py
# not what I expected! Why?
Matching is greedy
• And it even matches "This contains two files, hw3.py and uppercase.py"
A practical example
import re
myrule = re.compile(r"[^ ]+\.py")
mystring = "This contains two files, hw3.py and uppercase.py."
mymatch = myrule.search(mystring)
print(mymatch.group())
# hw3.py
allmymatches = myrule.findall(mystring)
print(allmymatches)
# ['hw3.py', 'uppercase.py']
Practice problem 3
• Create a regexp which detects legal Microsoft Word file names; print out a list of all the legal file names you find
Practice problem 4
• Create a regexp which detects legal Microsoft Word file names that do not contain any numerals (0 through 9)
• Print out the start location of the first such filename you encounter
• Test it on testre.txt
Practice problem 5
• Create a regexp which detects legal Microsoft Word file names that do not contain any numerals (0 through 9)
• Print out the "base name", i.e., the file name after stripping off the .doc extension, of each such filename you encounter. Hint: use parenthesized sub-patterns.
• Test it on testre.txt
Practice problem 1 solution
Write a regexp that will match any string that starts with "hum" and ends with "001" with any number of characters, including none, in between
myrule = re.compile(r"hum.*001")
Practice problem 2 solution
myrule = re.compile(r".+\.py")
Practice problem 3 solution
Create a regexp which detects legal Microsoft Word file names, and use it to make a list of them
import sys
import re
filename = sys.argv[1]
filehandle = open(filename, "r")
filecontents = filehandle.read()
myrule = re.compile(r"[^ ]+\.[dD][oO][cC]")
matchlist = myrule.findall(filecontents)
print(matchlist)
Practice problem 4 solution
Create a regexp which detects legal Microsoft Word file names which do not contain any numerals, and print the location of the first such filename you encounter
import sys
import re
filename = sys.argv[1]
filehandle = open(filename, "r")
filecontents = filehandle.read()
myrule = re.compile(r"[^ 0-9]+\.[dD][oO][cC]")
match = myrule.search(filecontents)
print(match.start())
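The slides do not include a solution for practice problem 5; here is one possible sketch, using a parenthesized group to capture the base name (with a group present, findall returns just the captured text):

import sys
import re
filename = sys.argv[1]
filecontents = open(filename, "r").read()
myrule = re.compile(r"([^ 0-9]+)\.[dD][oO][cC]")   # group 1 = base name
for basename in myrule.findall(filecontents):
    print(basename)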
Regular expressions summary
• They are not essential to using Python, but are very useful
Natural Language Processing CSA4006
# Completed with an assumed helper and sample data (the original definitions were not in the slides).
import re
def validate_email_addresses(emails):
    return [e for e in emails if re.match(r"[\w.+-]+@[\w-]+\.\w+$", e)]
email_list = ["[email protected]", "not-an-email"]
valid_emails = validate_email_addresses(email_list)
print("Valid Email Addresses:")
for email in valid_emails:
    print(email)
Finite-state Automata
The regular expression is more than just a convenient metalanguage for text searching. First, a regular expression is one way of describing a finite-state automaton (FSA). Finite-state automata are the theoretical foundation of a good deal of computational work, and any regular expression can be implemented as a finite-state automaton. Symmetrically, any finite-state automaton can be described with a regular expression. Second, a regular expression is one way of characterizing a particular kind of formal language called a regular language. Both regular expressions and finite-state automata can be used to describe regular languages. A third equivalent method of characterizing the regular languages is the regular grammar.
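To make the regexp-to-FSA correspondence concrete, here is a minimal recognizer sketch; the "sheep talk" language baa+! is an assumed illustration, not from this slide:

# States and transitions for an FSA equivalent to /baa+!/
TRANSITIONS = {
    (0, "b"): 1, (1, "a"): 2, (2, "a"): 3,
    (3, "a"): 3,   # self-loop: any number of further a's
    (3, "!"): 4,
}
ACCEPT = {4}

def accepts(s):
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPT

print(accepts("baaa!"), accepts("ba!"))   # True False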
Potential solutions:
• Save backup states at each choice point
• Look ahead in the input before making a choice
• Pursue alternatives in parallel
• Determinize our NFSAs (and then minimize)
The usefulness of an automaton for defining a language is that it can express an infinite
set (such as this one above) in a closed form. Formal languages are not the same as
natural languages, which are the kind of languages that real people speak. In fact, a
formal language may bear no resemblance at all to a real language (e.g., a formal
language can be used to model the different states of a soda machine). But we often use
a formal language to model part of a natural language, such as parts of the phonology,
morphology, or syntax. The term generative grammar is sometimes used in linguistics to
mean a grammar of a formal language; the origin of the term is this use of an automaton
to define a language by generating all possible strings
Another Example
We can also have a higher-level alphabet consisting of words. In this way we can write finite-state automata that model facts about word combinations. For example, suppose we wanted to build an FSA that modeled the subpart of English dealing with amounts of money. Such a formal language would model the subset of English consisting of phrases like ten cents, three dollars, one dollar thirty-five cents, and so on.
Orthographic rules are general rules used when breaking a word into its stem and modifiers. An example would be: singular English words ending with -y, when pluralized, end with -ies. Contrast this with morphological rules, which contain corner cases to these general rules. Both types of rules are used to construct systems that can do morphological parsing.
Morphological rules tell us, for example, that the plural of goose is formed by changing the vowel.
The word fox consists of a single morpheme (the morpheme fox), while the word cats consists of two: the morpheme cat and the morpheme -s.
Types of Morpheme:
Free Morphemes (stem): These are complete words that can stand alone and carry meaning on their own (e.g., "book," "run").
Bound Morphemes (affixes): These are meaningful units that cannot stand alone and must be attached to a free morpheme to convey meaning. Bound morphemes include prefixes (e.g., "un-" in "undo"), suffixes (e.g., "-ed" in "walked"), and infixes (inserted inside a word, like in some Tagalog verb forms).
Bound morphemes give meaning when added to a stem: -s in walks, re- in replay, -er in cheaper, im- in impossible, en- in enlighten, un- in unable.
Free morphemes stand alone as words: girl, cat, dog, little, book, bag.
A number of languages have extensive non-concatenative morphology, in which morphemes are combined in more complex ways.
Another kind of non-concatenative morphology is called templatic morphology or root-and-pattern morphology. This is very common in Arabic, Hebrew, and other Semitic languages.
The Hebrew tri-consonantal root lmd, meaning 'learn' or 'study', can be combined with the active voice CaCaC template to produce the word lamad, 'he studied';
the intensive CiCeC template to produce the word limed, 'he taught';
and the intensive passive template CuCaC to produce the word lumad, 'he was taught'.
Syllabus
Module 2:
Morphology And Finite-State Transducers: Inflectional Morphology - Derivational Morphology - Finite-State Morphological Parsing - The Lexicon and Morphotactics - Morphological
For example, English has the inflectional morpheme -s for marking the plural on nouns, and the inflectional morpheme -ed for marking the past tense on verbs.
Inflection: the combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class; the meaning of the resulting word is easily predictable.
Derivation: the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict exactly. For example, the verb computerize can take the derivational suffix -ation to produce the noun computerization.
English nouns have only two kinds of inflection: an affix that marks plural and an affix that marks possessive. For example, many (but not all) English nouns can either appear in the bare stem or singular form, or take a plural suffix. Here are examples of the regular plural suffix -s (also spelled -es), and irregular plurals.
The -s form is used in the 'habitual present' to distinguish the 3rd-person singular ending (She jogs every Tuesday) from the other choices of person and number (I/you/we/they jog every Tuesday).
The stem form is used in the infinitive, and also after certain other verbs (I'd rather walk home, I want to walk home).
The -ing participle is used when the verb is treated as a noun, called a gerund.
E.g.: Fishing is fine if you live near water.
Derivation in English is quite complex. It is the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict exactly.
A very common kind of derivation in English is the formation of new nouns, often from verbs or adjectives. This process is called nominalization.
For example, the suffix -ation produces nouns from verbs, often verbs ending in the suffix -ize (computerize/computerization).
Importance:
• Information retrieval: normalize verb tenses, plurals, grammar cases
• Machine translation: translation based on the stem
Given the input, for example, cats, we would like to produce cat +N +PL.
• Two-level morphology, by Koskenniemi (1983):
– representing a word as a correspondence between a lexical level, representing a simple concatenation of the morphemes making up the word, and
– the surface level, representing the actual spelling of the final word.
• Morphological parsing is implemented by building mapping rules that map letter sequences like cats on the surface level into morpheme and feature sequences like cat +N +PL on the lexical level.
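As a toy illustration of this surface-to-lexical mapping (a plain function standing in for a real finite-state transducer; the single rule below is an assumption covering only regular plurals):

def parse_noun(surface):
    # Map a surface form to a lexical form: cats -> cat +N +PL
    if surface.endswith("s"):
        return surface[:-1] + " +N +PL"
    return surface + " +N +SG"

print(parse_noun("cats"))   # cat +N +PL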
Finite-state transducers (FST)
The automaton we use for performing the mapping between these two levels is the finite-state transducer or FST.
– A transducer maps between one set of symbols and another;
– An FST does this via a finite automaton.
• Thus an FST can be seen as a two-tape automaton which recognizes or generates pairs of
strings.
• The FST has a more general function than an FSA:
– An FSA defines a formal language
– An FST defines a relation between sets of strings.
• Another view of an FST:
– A machine reads one string and generates another.
CSA4006-Dr. Anirban Bhowmick
FST
FST as recognizer: a transducer that takes a pair of strings as input and outputs accept if the string-pair is in the string-pair language, and reject if it is not.
FST as generator: a machine that outputs pairs of strings of the language. Thus the output is a yes or no, and a pair of output strings.
FST as transducer: a machine that reads a string and outputs another string.
FST as set relater: a machine that computes relations between sets.
These spelling changes can be thought of as taking as input a simple concatenation of morphemes and producing as output a slightly-modified concatenation of morphemes.
The rule above is written in Chomsky and Halle notation. A rule of the form a → b / c __ d means "rewrite a as b when it occurs between c and d." Since the symbol ε denotes the empty string, replacing ε means inserting something. The symbol ^ indicates a morpheme boundary. These boundaries are deleted by including the pair ^:ε in the default pairs for the transducer.
How about the word happy and its derived forms happily and happiness?
The full listing hypothesis proposes that all words of a language are listed in the mental lexicon without any internal morphological structure.
• Morphological structure is simply an epiphenomenon, and walk, walks, walked, happy, and happily are all separately listed in the lexicon.
The minimum redundancy hypothesis suggests that only the constituent morphemes are represented in the lexicon, and when processing walks (whether for reading, listening, or talking) we must access both morphemes (walk and -s) and combine them.
More recent experimental evidence suggests that neither the full listing nor the minimum redundancy hypothesis may be completely true. Instead, it's possible that some, but not all, morphological relationships are mentally represented: for example, it has been found that derived forms (happily) are stored separately from their stem (happy), but that regularly inflected forms are not distinct in the lexicon from their stems.
Marslen-Wilson et al. (1994) found that spoken derived words can prime their stems, but only if the meaning of the derived form is closely related to the stem.
• For example, government primes govern, but department does not prime depart.
A speech recognition system needs to have a pronunciation for every word it can recognize, and a text-to-speech system needs to have a pronunciation for every word it can say.
– Articulatory phonetics focuses on how the vocal tract produces the sounds of language.
The larynx contains two small folds of muscle, the vocal folds (often referred to nontechnically as the vocal cords), which can be moved together or apart.
Plosive (or Stop): These consonants are produced by a complete closure of the vocal tract, causing a momentary halt in the airflow before releasing it.
Example: /p/ in "pat," /b/ in "bat," /t/ in "top," /d/ in "dog," /k/ in "cat," /g/ in "go."
Fricative: Fricatives are produced by narrowing the vocal tract, creating turbulent airflow and a continuous, hissing sound.
Example: /f/ in "fan," /v/ in "van," /s/ in "sock," /z/ in "zebra," /ʃ/ in "shoe," /ʒ/ in "measure."
Affricate: Affricates begin with a stop-like closure and then transition into a fricative sound.
Syllabus
Module 3:
Syntax Parsing: Tagsets for English - Part-of-Speech Tagging - Rule-based Part-of-Speech Tagging - Stochastic Part-of-Speech Tagging - Transformation-Based Tagging - Context-Free Grammars for English - Context-Free Rules and Trees - The Noun Phrase - The Verb Phrase and Subcategorization - Grammar Equivalence & Normal Form - Finite State & Context-Free Grammars.
Text Books:
Daniel Jurafsky and James H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2nd edition, 2008.
Reference Books:
1. Roland R. Hausser, "Foundations of Computational Linguistics: Human-Computer Communication in Natural Language", Paperback, MIT Press, 2011.
2. Christopher D. Manning and Hinrich Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press.
Module 3: Syntax Parsing
Topic: Introduction
Tagsets for English
There are a small number of popular tagsets for English.
Iterate through the words in the text and apply the rules to each word in turn. For example:
"Nation" would be tagged as "noun" based on the first rule.
"Investment" would be tagged as "noun" based on the second rule.
"UNITED" would be tagged as "proper noun" based on the third rule.
"Running" would be tagged as "verb" based on the fourth rule.
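A toy version of this idea in Python (the rules below are assumptions reconstructed from the four examples; the slide's actual rule list was not preserved):

def rule_tag(word):
    if word.isupper():
        return "proper noun"              # e.g. "UNITED"
    if word.endswith("ing"):
        return "verb"                     # e.g. "running"
    if word.endswith(("tion", "ment")):
        return "noun"                     # e.g. "nation", "investment"
    return "unknown"

for w in ["nation", "investment", "UNITED", "running"]:
    print(w, "->", rule_tag(w))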
Process:
Training Data (emission counts, fragment):
         N    M    V
  see    0    0    2
  pat    0    0    1
Transition probabilities:
         N    M    V    <E>
  <S>    3/4  1/4  0    0
  N      1/9  3/9  1/9  4/9
  M      1/4  0    3/4  0
  V      4/4  0    0    0
Consider the tagging: Will as a modal, Can as a verb, Spot as a noun, Mary as a noun.
1/4 * 3/4 * 3/4 * 0 * 1 * 2/9 * 1/9 * 4/9 * 4/9 = 0
HMM-POS Tagging
When these words are correctly tagged, we get a probability greater than zero, as shown below:
3/4 * 1/9 * 3/9 * 1/4 * 3/4 * 1/4 * 1 * 4/9 * 4/9 = 0.00025720164
Now there are only two paths that lead to the end; let us calculate the probability associated with each path.
<S>→N→M→N→N→<E> = 3/4 * 1/9 * 3/9 * 1/4 * 1/4 * 2/9 * 1/9 * 4/9 * 4/9 = 0.00000846754
<S>→N→M→V→N→<E> = 3/4 * 1/9 * 3/9 * 1/4 * 3/4 * 1/4 * 1 * 4/9 * 4/9 = 0.00025720164
Clearly, the probability of the second sequence is much higher, and hence the HMM is going to tag each word in the sentence according to this sequence.
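A quick sketch verifying the two path probabilities:

from math import prod

path1 = [3/4, 1/9, 3/9, 1/4, 1/4, 2/9, 1/9, 4/9, 4/9]   # <S>->N->M->N->N-><E>
path2 = [3/4, 1/9, 3/9, 1/4, 3/4, 1/4, 1, 4/9, 4/9]     # <S>->N->M->V->N-><E>
print(prod(path1))   # ~8.46754e-06
print(prod(path2))   # ~0.00025720164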
Module 3: Syntax Parsing
Topic: Introduction
Optimizing HMM with Viterbi Algorithm
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path, that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMMs).
In the previous section, we optimized the HMM and brought our calculations down from 81 to just two. Now we are going to further optimize the HMM by using the Viterbi algorithm. Let us use the same example we used before and apply the Viterbi algorithm to it.
At each state we keep only the incoming mini-path with the highest probability and discard the lower-probability alternatives. The same procedure is done for all the states in the graph, as shown in the figure below.
Basic idea: do a quick and dirty job first, and then use learned rules to patch things up.
This overcomes the pure rule-based approach's problems of being too expensive, too slow, too tedious, etc.
An instance of Transformation-Based Learning: start with a dumb statistical system and patch up the typical mistakes it makes.
How dumb? Assign the most frequent tag (unigram) to each word in the input.
7. Repeat:
Continue applying transformation rules and evaluating the tagging accuracy until a stopping criterion is
met, such as reaching a maximum number of iterations or achieving a desired level of accuracy.
8. Finalize Tags:
Once the iterative process is complete, the final POS tags are used as the output for the sentence.
Module 3: CFG
Topic: Introduction
Syntax
By syntax, we mean various aspects of how words are strung together to form components of sentences and how those components are strung together to form sentences. Syntax comes from the Greek sýntaxis, meaning "setting out together or arrangement."
It is the kind of implicit knowledge of your native language that you had mastered by the time you were 3 or 4 years old without explicit instruction, not necessarily the type of rules you were later taught in school.
Sentences have parts, some of which appear to have subparts. These groupings of words that go together we will call constituents.
These units form coherent classes that behave in similar ways. For example, we can say that noun phrases can come before verbs.
the man from Amherst is a Noun Phrase (NP) because the head man is a noun
extremely clever is an Adjective Phrase (AP) because the head clever is an adjective
down the river is a Prepositional Phrase (PP) because the head down is a preposition
killed the rabbit is a Verb Phrase (VP) because the head killed is a verb
Note that a word is a constituent (a little one), and sometimes words also act as phrases: in Joe grew potatoes, Joe and potatoes are single-word noun phrases.
The idea of basing a grammar on constituent structure dates back to Wilhelm Wundt (1890), but was not formalized until Chomsky (1956) and, independently, Backus (1959).
A derivation consists of a sequence of rule expansions, for example:
S → NP VP
  → Det NOM VP
  → The NOM VP
  → The Noun VP
  → The man VP
PP → Prep NP
NP → Noun PP
These two rules are mutually recursive, allowing arbitrarily deep nesting:
[S The mailman ate his [NP lunch [PP with his friend [PP from the cleaning staff [PP of the building [PP at the intersection [PP on the north end [PP of town]]]]]]].
Top-down parsing and bottom-up parsing are two ways of searching for a parse tree. The most basic difference between the two is that top-down parsing starts from the top of the parse tree (the start symbol), while bottom-up parsing starts from the lowest level of the parse tree (the words).
If a goal can be rewritten in several ways, then there is a choice of which rule to apply (a search problem). We can use depth-first or breadth-first search, and goal ordering.
Top-down parsing will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with a V, and the sentence starts with a V.
Useless work: it expands things that are possible top-down but not actually there (no bottom-up evidence for them).
Top-down parsers do well if there is useful grammar-driven control: search is directed by the grammar.
Top-down is hopeless for rewriting parts of speech (pre-terminals) with words (terminals). In practice that is always done bottom-up, as lexical lookup.
Repeated work: anywhere there is common substructure.
• A "shift" action corresponds to pushing the next input symbol from the buffer onto the stack.
• A "reduce" action occurs when we have a rule's RHS on top of the stack. To perform the reduction, we pop the rule's RHS off the stack and replace it with the non-terminal on the LHS of the corresponding rule.
If you end up with only the start symbol on the stack, then success!
If you don't, and no "shift" or "reduce" actions are possible, backtrack.
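A toy shift-reduce recognizer sketch (an assumed illustration, not from the slides; it is greedy and does no backtracking, which a real parser would need):

RULES = {("Det", "Noun"): "NP", ("Verb", "NP"): "VP", ("NP", "VP"): "S"}

def shift_reduce(tags):
    stack, buffer = [], list(tags)
    while buffer or len(stack) > 1:
        for rhs, lhs in RULES.items():          # try to reduce first
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:
            if not buffer:
                return False                    # stuck: no shift or reduce
            stack.append(buffer.pop(0))         # shift
    return stack == ["S"]

# Pre-terminal tags for "the man killed the rabbit"
print(shift_reduce(["Det", "Noun", "Verb", "Det", "Noun"]))   # True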
Problem:
• Unable to deal with empty categories: termination problem, unless rewriting empties as
constituents is somehow restricted (but then it’s generally incomplete)
• Useless work: locally possible, but globally impossible
• Inefficient when there is great lexical ambiguity (grammar-driven control might help here).
Conversely, it is data-directed: it attempts to parse the words that are there
• Repeated work: anywhere there is common substructure.
Prenominal (pre-head) modifiers are words or phrases that appear before the noun and modify it. These modifiers provide additional information about the noun. Here's an example:
The big, red apple: in this noun phrase, "big" and "red" are prenominal modifiers that provide more details about the noun "apple."
Postnominal (post-head) modifiers are words or phrases that appear after the noun and modify it. These modifiers also offer additional information about the noun. Here's an example:
The car with the broken windshield: in this noun phrase, "with the broken windshield" is a postnominal modifier that provides more information about the noun "car."
a stop, the flights, that fare, this flight, those flights, any flights, some flight
Word classes that appear in the NP before the determiner are called predeterminers.
A number of different word classes can appear in the NP between the determiner and the head noun:
• Cardinal numbers. E.g. two friends, one stop
• Ordinal numbers include first, second, third, etc., but also words like next, last, past, other, and another. E.g. the first one, the next day, the second leg, the last flight, the other American flight, any other fares
• Quantifiers such as many, few, several occur only with plural count nouns. E.g. many fares
• The quantifiers much and a little occur only with noncount nouns
A car
Or simple possessives: John's car
All the options for prenominal modifiers are combined in one rule as follows:
NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
Note the use of parentheses () to mark optional constituents.
In the following examples, the verb phrases happen to all have only prepositional phrases after the verb:
• any of those (leaving on Thursday)
• any flights (arriving after eleven a.m.)
• flights (arriving within thirty minutes of each other)
A postnominal relative clause (more correctly, a restrictive relative clause) is a clause that often begins with a relative pronoun (that and who are the most common).
NP → Det Nominal: but do the determiner and nominal agree in number?
(O) This flight    (X) This flights
(O) Those flights  (X) Those flight
But even though there are many valid VP rules in English, not all verbs are allowed to participate in all those VP rules.
We can subcategorize the verbs in a language according to the sets of VP rules that they participate in.
This is a modern take on the traditional notion of transitive/intransitive; modern grammars may have 100 or more such classes.
Syllabus
Module 4:
Semantics: Computational Desiderata for Representations - Meaning Structure of Language - First Order Predicate Calculus - Elements of FOPC - The Semantics of FOPC
Topic: Introduction
Semantic Analysis
Semantic analysis in natural language processing (NLP) refers to the process of understanding the
meaning of words, phrases, sentences, or even entire documents. It goes beyond syntactic analysis,
which focuses on the grammatical structure of language, to extract the underlying meaning and
context.
Here are some key aspects of semantic analysis in NLP:
Word Sense Disambiguation (WSD): Words often have multiple meanings depending on the context in which they are used. WSD is the task of determining the correct sense of a word in a given context. For example, the word "bank" could refer to a financial institution or the side of a river.
Named Entity Recognition (NER): NER involves identifying and classifying entities such as names of
people, organizations, locations, dates, and other specific terms in a text. This helps in
understanding the key entities and their relationships within a document.
Semantic Role Labeling (SRL): SRL aims to identify the roles of different components of a sentence,
such as the subject, object, and predicate. It helps in understanding the relationships between
entities and their actions in a given context.
Sentiment Analysis: While often associated more with the emotional aspect of language, sentiment analysis also involves understanding the underlying meaning of text. It helps determine whether a piece of text expresses a positive, negative, or neutral sentiment.
Semantic Similarity: This involves measuring the degree of similarity between words, phrases, or
sentences in terms of meaning. It is useful in tasks like information retrieval, document clustering, and
question answering.
Word Embeddings and Vector Representations: Techniques like word embeddings (e.g., Word2Vec,
GloVe, and BERT) represent words in a continuous vector space where semantically similar words are
closer in the vector space. This allows algorithms to capture semantic relationships between words.
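To illustrate "closer in vector space" concretely, here is a minimal cosine-similarity sketch over toy vectors (the numbers are made up for illustration):

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

king, queen, banana = [0.9, 0.8, 0.1], [0.85, 0.75, 0.2], [0.1, 0.2, 0.9]
print(cosine(king, queen))    # high: semantically similar
print(cosine(king, banana))   # low: semantically distant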
Frame Semantics and Ontologies: Understanding the frames or scenarios in which words and phrases
are used can contribute to a deeper understanding of meaning.
Meaning Representation Language
In natural language processing (NLP), meaning representation languages are formal languages or frameworks used to represent the meaning of linguistic expressions in a structured and interpretable way. These representations are essential for tasks such as semantic analysis, machine translation, question answering, and other applications where understanding the meaning of natural language is crucial.
But unlike parse trees, these representations aren't primarily descriptions of the structure of the inputs.
Consider the everyday language tasks that require some form of semantic processing. To focus this discussion, we will consider in more detail the task of giving advice about restaurants to tourists. We will assume that we have a computer system that accepts spoken language queries from tourists and constructs appropriate responses by using a knowledge base of relevant domain knowledge.
Verifiability
Unambiguous Representations
Canonical Form
Inference and Variables
Expressiveness
Verifiability: the most straightforward way to implement this notion is to make it possible for a system to compare, or match, the representation of the meaning of an input against the representations in its knowledge base, its store of information about its world.
Unambiguous representations are crucial for NLP tasks to enhance the accuracy and reliability of natural language understanding systems.
We use the term inference to refer generically to a system's ability to draw valid conclusions based on the meaning representation of inputs and its store of background knowledge. It must be possible for the system to draw conclusions about the truth of propositions that are not explicitly represented in the knowledge base, but are nevertheless logically derivable from the propositions that are present.
I'd like to find a restaurant where I can get vegetarian food.
In this example, the request does not make reference to any particular restaurant; the user is stating that they would like information about an unknown and unnamed entity that is a restaurant serving vegetarian food. Answering this request requires a more complex kind of matching that involves the use of variables, with a representation containing such variables as follows.
This representation captures the expressiveness of the sentence by not only representing the basic actions and entities but also incorporating additional details about the manner of applause and the specific location of the event. It goes beyond a simple surface-level representation and delves into the nuanced aspects of the sentence's meaning.
Examples:
1. In "The cat is on the mat," "The cat" is the argument of the predicate "is on the mat."
2. In "She likes to read books," "She" is the argument of the predicate "likes to read books."
3. In "The sun sets in the west," "The sun" is the argument of the predicate "sets in the west."
Make a reservation for this evening for a table for two persons at 8.
The predicate-argument structure is based on the concept underlying the noun reservation, rather than make, the main verb in the phrase. This example gives rise to a predicate structure like the following:
Reservation(Today, 8PM, 2)
Any useful meaning representation language must be organized in a way that supports the specification of semantic predicate-argument structures. This support must include the kinds of semantic information that languages present:
• variable-arity predicate-argument structures,
• the semantic labeling of arguments to predicates, and
• the statement of semantic constraints on the fillers of argument roles.
Examples that are not propositions (they have no truth value): "1+2", "Where is John?"
Propositions are of two kinds: atomic propositions and compound propositions.
Compound Propositions: constructed by combining simpler or atomic propositions, using parentheses and logical connectives.
Example:
"It is raining today, and the street is wet."
"Ankit is a doctor, and his clinic is in Mumbai."
Implication: in propositional logic, we have a connective that combines two propositions into a new proposition called the conditional.
Definition: If p and q are arbitrary propositions, then the biconditional of p and q is written p ⇔ q and will be true iff either:
1. p and q are both true; or
2. p and q are both false.
((p ∧ (q ⇒ r)) ∨ s) ∧ t
EXAMPLE. Suppose we have a valuation 𝜐 such that:
𝜐(p) = F, 𝜐(q) = T, 𝜐(r) = F
Then the truth value of (p ∨ q) ⇒ r is evaluated by:
(p ∨ q) ⇒ r = (F ∨ T) ⇒ F = T ⇒ F = F
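The same evaluation as a quick check in Python (treating ⇒ as "not a, or b"):

p, q, r = False, True, False
implies = lambda a, b: (not a) or b
print(implies(p or q, r))   # False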
First-order logic allows us to break sentences into predicates, subjects, and objects, while also allowing us to use quantifiers like "all", "each", "some", etc.
Blackburn & Bos make a strong argument for using first-order logic as the meaning representation.
FOL formulae:
Likes(x, y), In(x, y)
Gives(x, y, z)
Example
Every kid likes football: ∀x kid(x) → likes(x, football)
∃, which looks like an inverted E, represents existential quantification; with ∃ we always use the AND (conjunction) symbol.
Example
Some people like football: ∃x people(x) ∧ likesFootball(x)
Everyone is happy: ∀x [Person(x) → Happy(x)]
The semantic representation for this example is built up in a straightforward way from the semantics of the individual clauses through the use of the ∧ and ¬ operators.
All that John inherited was a book
Similar formulae:
∀x ¬P ≡ ¬∃x P
Example: Nobody likes John: ∀x ¬like(x, John) ≡ ¬∃x like(x, John)
¬∀x P ≡ ∃x ¬P
Example: There is at least one person who does not like John: ¬∀x like(x, John) ≡ ∃x ¬like(x, John)
∀x P ≡ ¬∃x ¬P
Example: Everyone likes John: ∀x like(x, John) ≡ ¬∃x ¬like(x, John)
∃x P ≡ ¬∀x ¬P
Example: There is at least one person who likes John: ∃x like(x, John) ≡ ¬∀x ¬like(x, John)
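These equivalences can be sanity-checked over a finite domain, where ∀ becomes all() and ∃ becomes any() (a small assumed illustration):

domain = [1, 2, 3]
P = lambda x: x > 2
lhs = all(not P(x) for x in domain)     # ∀x ¬P(x)
rhs = not any(P(x) for x in domain)     # ¬∃x P(x)
print(lhs == rhs)                       # True, for any P and domain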
3. Bind the meaning representation of the NPs to the variables in the meaning representation of the verb to get the meaning representation of the whole sentence.
Augment the lexicon and grammar rules with semantic attachments: devise a mapping between rules of the grammar and rules of semantic representation (the rule-to-rule hypothesis).
The text appearing within brackets specifies the meaning representation assigned to A as a function of the semantic attachments of A's constituents.
{President} and {speaker} are meanings associated with the augmented rules.
To combine NP.sem and Verb.sem, y has to be replaced with speaker, which is not specified in Verb.sem; we need to revise the semantic attachment for the verb.
Rule 3: if α is a branching node, {β, γ} is the set of its daughters, and [[β]] is a function whose domain contains [[γ]], then [[α]] = [[β]]([[γ]]).
Lexical Entries:
S = t
N = e
(here t is the type of truth values and e the type of entities)
The meaning representations of these examples all contain propositions concerning the serving of lunch on flights; they differ with respect to the role that these propositions are intended to serve.
• The normal interpretation for a representation headed by the DCL operator would be as a factual statement to be added to the current knowledge base.
• Imperative sentences begin with a verb phrase and lack an overt subject. Because of the missing subject, the meaning representation for the main verb phrase will consist of a λ-expression with an unbound λ-variable representing this missing subject. The IMP operator can then be applied to this representation, as in the following semantic attachment.
Applying this rule:
The following semantic attachment simply ignores the auxiliary and, with the exception of the YNQ operator, is identical to the declarative case.
Yes-or-no questions should be thought of as asking whether the propositional part of the meaning is true or false, given the knowledge currently contained in the knowledge base.
The following attachment produces a representation that consists of the operator WHQ, the variable corresponding to the subject of the sentence, and the body of the proposition.
Here the question is not about the subject of the sentence but rather some other argument, or some…
Syllabus
Module 5:
Machine Translation And Applications: Basic Issues in Machine Translation - Statistical Translation - Word Alignment - Phrase-based Translation - Synchronous Grammars - Applications of Natural Language Processing: Spell Check - Summarization - Language Translation.
Module 5: MT
What is Machine Translation?
● Product manuals
● Customer support
Social:
● Travel (signboards, food)
● Entertainment (books, movies, videos)
Any multilingual NLP system will involve some kind of machine translation at some level.
The first general-purpose electronic computers were not far off on the horizon; in the mid-1940s, researchers like Warren Weaver began to theorize about ways they could use computers to automate the translation process.
Early RBMT systems include the Institute Textile de France’s TITUS and Canada’s METEO system,
among others. And while US-based research certainly slowed down after the ALPAC report, it didn’t
come to a complete stop — SYSTRAN, founded in 1968, utilized RBMT as well, working closely with
the US Air Force for Russian-English translation in the 1970s.
NMT engines use larger corpora than SMT and are more reliable when it comes to translating long strings of text with complex sentence structures.
Although large language models (LLMs) perform a lot of other functions besides translation, some thought leaders have presented tools like ChatGPT as the future of localization and, by extension, MT.
Language registers
Formal: आप बैठिये; Informal: तू बैठ ("please sit" vs. "sit")
Standard: मुझे डोसा चाहिए; Dakhini: मेरे को डोसा होना ("I want a dosa")
● Word Order
○ Underlying deeper syntactic structure
● Morphological Richness
○ Identifying basic units of words
Module 5: MT
Review
Classic example: IBM work on French-English translation, using the Canadian Hansards (1.7 million sentences of 30 words or less in length).
The idea goes back to Warren Weaver (1949), who suggested applying statistical and cryptanalytic techniques to translation.
Have a model p(e|f) which estimates the conditional probability of any English sentence e given the French sentence f. Use the training corpus to set the parameters.
p(e): the language model
p(f|e): the translation model
Giving:
p(e|f) = p(e,f) / p(f) = p(e) p(f|e) / p(f)
and
argmax_e p(e|f) = argmax_e p(e) p(f|e)
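A toy sketch of this noisy-channel decision rule (the candidate sentences and probabilities are made up for illustration):

candidates = {
    # e: (language model p(e), translation model p(f|e))
    "the cat": (0.02, 0.30),
    "cat the": (0.001, 0.35),
}
best = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])
print(best)   # 'the cat': the fluent candidate wins despite its lower p(f|e)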
Let's see how to learn the translation model P(f|e).
Syllabus
3
Module 5:
Machine Translation And Applications: Basic
Issues in Machine
Translation- Statistical Translation- Word
4
Alignment- Phrase based Translation-
Synchronous Grammars- Applications of Natural
Language Processing: Spell Check-
Summarization- Language Translation.
Text Books:
Daniel Jurafsky and James H. Martin "Speech and
Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics and
Speech recognition", Prentice Hall, 2nd edition, 2008. 5
Reference Books:
1. Roland R. Hausser "Foundations of Computational
Linguistics: Human- Computer Communication in
Natural Language", Paperback, MIT Press, 2011.
2. Christopher D. Manning and Hinrich Schuetze,
6
"Foundations of Statistical Natural Language
Processing" by MIT Press.
Module 5: MT
Review
Alignments:
Next step: come up with an estimate for …
Module 5: MT
Review
The encoder vector aims to encapsulate as much of the input information as possible to help the decoder get the best results. It's the only information from the input that the decoder will get.
The decoder
Layers of recurrent units (e.g., LSTMs) where each unit produces an output at a time step t. The hidden state of the first unit is the encoder vector, and the rest of the units accept the hidden state from the previous unit. The output is calculated using a softmax function to obtain a probability for every token in the output vocabulary.
Because language consists of tokens and grammar, the problem with this model is that it does not entirely address the complexity of the grammar. Specifically, when translating the nth word in the source language, the RNN was considering only the first n words of the source sentence; but grammatically, the meaning of a word depends on both the sequence of words before and after it in a sentence.
A solution: the bi-directional LSTM model. A bi-directional model allows us to input the context of both past and future words to create an accurate encoder output vector.
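A minimal PyTorch sketch of such a bidirectional encoder (an assumed illustration; the sizes are arbitrary):

import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 1000, 32, 64
embed = nn.Embedding(vocab_size, emb_dim)
encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

tokens = torch.randint(0, vocab_size, (1, 7))   # one sentence of 7 token ids
outputs, _ = encoder(embed(tokens))
print(outputs.shape)   # torch.Size([1, 7, 128]): forward and backward states concatenated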
But then the challenge becomes: which word do we need to focus on in a sequence? (This is the question that attention mechanisms address.)