NLP StudyMaterial

This document provides an overview of natural language processing (NLP) techniques, including rule-based NLP, statistical model-based NLP using the Penn Treebank and Conditional Random Fields (CRFs), and modern approaches such as Word2Vec, sequence-to-sequence models, and Transformers. It discusses tasks such as named entity recognition, part-of-speech tagging, and parsing, and compares different NLP methods.

Natural Language Processing
CSA4006

Dr. Anirban Bhowmick
Assistant Professor, VIT Bhopal

Lecture 1

Syllabus

Module 1:
Introduction: Knowledge in Speech and Language Processing - Ambiguity - Models and Algorithms - Language, Thought, and Understanding - The State of the Art and the Near-Term Future - Regular Expressions - Basic Regular Expression Patterns - Disjunction, Grouping, and Precedence - Using an FSA to Recognize Sheeptalk - Formal Languages.

Text Book:
Daniel Jurafsky and James H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2nd edition, 2008.

Reference Books:
1. Roland R. Hausser, "Foundations of Computational Linguistics: Human-Computer Communication in Natural Language", Paperback, MIT Press, 2011.
2. Christopher D. Manning and Hinrich Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press.

Module 1:
Introduction

Topic: Introduction
NLP
Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction
between computers and humans through natural language. NLP enables computers to understand,
interpret, and generate human language in a way that is both meaningful and valuable.

Why Study NLP?
Ubiquity of Language: Language is a fundamental medium of communication among humans. NLP allows machines to understand and process this communication, enabling a wide range of applications.

Real-World Applications: NLP is used in various real-world applications such as virtual assistants, sentiment analysis, language translation, information retrieval, and more.

Data Explosion: The digital age has led to an explosion of textual data. NLP provides the tools to extract insights and information from this data.

Brief History of NLP
Early Foundations (1950s-1970s)

1950s: The field of AI is born, with early attempts at machine translation (MT) using rule-based systems.

1960s: ELIZA, a computer program capable of simulating human conversation, is developed by Joseph Weizenbaum.

1970s: Rule-based approaches dominate NLP, but they struggle with the complexity and ambiguity of language.

Note: ELIZA simulated conversation using a pattern-matching and substitution methodology that gave users an illusion of understanding on the part of the program.

Brief History of NLP
Statistical NLP (1980s-2000s)

1980s: Introduction of statistical methods, Hidden Markov Models (HMMs), and probabilistic context-free grammars.

1990s: The use of large corpora and the development of the Penn Treebank revolutionize NLP. Introduction of part-of-speech tagging and syntactic parsing.

2000s: More sophisticated statistical models like Conditional Random Fields (CRFs) emerge, and the field shifts decisively towards data-driven approaches; word embeddings (Word2Vec, GloVe) follow in the early 2010s.

Brief History of NLP
Deep Learning and Modern NLP (2010s-Present)

2010s: Deep learning redefines NLP with neural network architectures like Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs).

2013: Introduction of Word2Vec by Mikolov et al., which learns word embeddings from large text corpora.

2014: "Sequence to Sequence" models enable breakthroughs in machine translation.

2018: Transformers, exemplified by the BERT model, revolutionize NLP tasks by learning contextualized word representations.

Present: State-of-the-art models like GPT-3.5 achieve remarkable performance across a wide range of NLP tasks using massive amounts of data and computation.

NLP - Rule Based
Rule-based Natural Language Processing (NLP) is an approach to language processing that relies on a set of predefined rules and patterns to analyze and extract information from text data. It contrasts with machine learning-based NLP, which uses algorithms and models to learn patterns and make predictions from data.

Rule: If a text contains a date in the format "dd/mm/yyyy" or "dd-mm-yyyy", extract it.

Example Text: "The project deadline is 25/09/2023, and the meeting is scheduled for 30-09-2023."

Rule-Based NLP Output:
Extracted Date: "25/09/2023"
Extracted Date: "30-09-2023"

NLP - Statistical Model Based
Statistical model-based Natural Language Processing (NLP) relies on statistical techniques and machine learning algorithms to analyze and understand text data. Unlike rule-based NLP, which relies on predefined rules and patterns, statistical model-based NLP learns patterns and relationships from data.

Task: Text Classification
Statistical Model: Support Vector Machine (SVM)
Example: Sentiment Analysis
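
A minimal scikit-learn sketch of this setup (the toy texts and labels are invented for illustration; a real sentiment analyzer would be trained on a large labeled corpus):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny labeled corpus: positive and negative movie comments
texts = ["I loved this movie", "Great acting and plot",
         "Terrible film", "I hated every minute"]
labels = ["pos", "pos", "neg", "neg"]

# TF-IDF features feeding a linear SVM: a classic statistical text classifier
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["what a great film"]))  # expected: ['pos']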

NLP - Penn Treebank Based
The Penn Treebank is a widely used dataset in Natural Language Processing (NLP) that provides annotated syntactic and structural information for English text. It uses a tree structure to represent the grammatical and syntactic relationships within sentences. One common application of Penn Treebank-based NLP is parsing sentences to analyze their grammatical structure.

Task: Sentence Parsing
"The quick brown fox jumps over the lazy dog."

Tokenization: The sentence is first tokenized into individual words and punctuation marks:
["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]

Part-of-Speech (POS) Tagging: Each token is assigned a POS tag that represents its grammatical category (e.g., noun, verb, adjective). Here is the sentence with POS tags:
[("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"), ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"), (".", ".")]

NLP-Penn Treebank based
Parsing: The Penn Treebank-based NLP system uses syntactic rules and information to parse the
sentence into a tree structure that represents its grammatical and syntactic relationships. The resulting
parse tree for the example sentence might look like this:

(S
  (NP (DT The) (JJ quick) (JJ brown) (NN fox))
  (VP (VBZ jumps) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))))
  (. .))

In this parse tree, "S" represents the sentence, "NP" represents a noun phrase, "VP" represents a verb
phrase, "DT" represents a determiner, "JJ" represents an adjective, "NN" represents a noun, "VBZ"
represents a verb, and "IN" represents a preposition. The tree structure captures the hierarchical
relationships between the words in the sentence.
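
The bracketed notation above can be loaded directly with NLTK's Tree class (a sketch using nltk, not part of the original slide):

from nltk import Tree

parse = Tree.fromstring(
    "(S (NP (DT The) (JJ quick) (JJ brown) (NN fox)) "
    "(VP (VBZ jumps) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog)))) (. .))"
)
parse.pretty_print()                          # draws the tree as ASCII art
print(parse.leaves())                         # the original tokens
print([t.label() for t in parse.subtrees()])  # S, NP, VP, PP, ...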

NLP-CRFs
Conditional Random Fields (CRFs) are a popular machine learning model used in Natural Language
Processing (NLP) for sequence labeling tasks, such as named entity recognition (NER), part-of-speech
tagging (POS), and chunking. CRFs are particularly effective at capturing dependencies between adjacent
labels in a sequence.
Example Sentence:
"Apple Inc. is headquartered in Cupertino, California."

Tokens (punctuation omitted): ["Apple", "Inc.", "is", "headquartered", "in", "Cupertino", "California"]
Label Sequence (NER Tags): ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "I-LOC"]

In this example, the labels indicate the following:
"B-ORG": Beginning of an organization name.
"I-ORG": Inside an organization name.
"B-LOC": Beginning of a location name.
"I-LOC": Inside a location name.
"O": A word that is not part of any named entity.

NLP - State of the Art
Word2Vec, Sequence-to-Sequence (Seq2Seq), and Transformers are all important techniques in Natural Language Processing (NLP), but they serve different purposes and have different characteristics. Let's compare them on several key aspects:

Parameter          | Word2Vec                   | Seq2Seq                           | Transformers
-------------------+----------------------------+-----------------------------------+--------------------------------------
Objective          | Used for word embedding    | Designed for sequence-to-sequence | Initially designed for seq2seq, but
                   |                            | tasks such as machine translation | have become fundamental to NLP
                   |                            | and text summarization            |
Model architecture | Shallow neural network     | Encoder and decoder, built from   | Self-attention mechanism plus
                   | (CBOW or skip-gram)        | RNNs or LSTMs                     | feed-forward neural networks
Training           | Large corpus               | Parallel input/target sequence    | Massive corpora, self-supervised
                   |                            | pairs                             | pretraining, then fine-tuning
Parallelism        | Inherently parallelizable  | Less parallelizable               | Highly parallelizable
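
As a concrete illustration of the first column, here is a minimal Word2Vec sketch using the gensim library (assumes gensim >= 4.0; the toy corpus is invented, and real embeddings require millions of sentences):

from gensim.models import Word2Vec

corpus = [["the", "king", "rules", "the", "kingdom"],
          ["the", "queen", "rules", "the", "kingdom"],
          ["dogs", "chase", "cats"]]

# sg=1 selects the skip-gram variant; sg=0 would select CBOW
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["king"][:5])           # first few dimensions of the embedding
print(model.wv.most_similar("king"))  # nearest neighbours in vector space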



Natural Language Processing
CSA4006

Dr. Anirban Bhowmick
Assistant Professor, VIT Bhopal

Lecture 2

Module 1:
Introduction

Topic: Introduction
Applications of NLP
Communication With Machines

Applications of NLP
Conversational Agents
Building AI systems that can engage in natural-sounding conversations with users. Used in customer support, virtual companions, and mental health apps. Conversational agents contain:
● Speech recognition
● Language analysis
● Dialogue processing
● Information retrieval
● Text to speech

Question Answering
Developing systems that can understand and answer questions posed in natural language. Used in chatbots, virtual assistants, and information retrieval.

Text Generation
Creating human-like text using models like OpenAI's GPT-3. Applications range from creative writing to chatbots.


Applications of NLP
Machine Translation
Automatically translating text from one language to another. Google Translate and other translation services heavily rely on NLP techniques.

Sentiment Analysis
Analyzing text to determine the sentiment (positive, negative, neutral) expressed by the author. Applications include brand monitoring, customer feedback analysis, and social media sentiment tracking.

Information Retrieval
Improving search engines by understanding user queries and retrieving relevant information from a large dataset.

Named Entity Recognition (NER)
Identifying entities like names, dates, locations, and more within a text. Used in information extraction, chatbots, and language translation.


Levels of Linguistic Knowledge
1. Phonetics and Phonology
At this level, NLP systems consider the sounds of speech. It involves understanding the phonemes (distinct speech sounds) and the rules governing their pronunciation, as well as the intonation patterns and stress in spoken language.

2. Morphology
Morphology deals with the internal structure of words and how they are formed from smaller units called morphemes. Morphological analysis helps in tasks like stemming (reducing words to their base form) and lemmatization (reducing words to their dictionary form).

3. Syntax
Syntax involves the rules governing the structure of sentences. It includes understanding how words combine to form phrases and sentences, and the relationships between different parts of speech. Parsing techniques are used to analyze sentence structure.


Levels of Linguistic Knowledge
4. Semantics
Semantics is the study of meaning in language. NLP systems at this level aim to understand the meaning of individual words, phrases, and sentences. This can involve tasks like word sense disambiguation (determining the correct meaning of a word based on context) and semantic role labeling (identifying the roles of words in a sentence, e.g., subject, object).

5. Pragmatics
Pragmatics refers to the use of language in context. It involves understanding implied meaning, indirect speech acts, and the intentions behind statements. This level is crucial for understanding sarcasm, irony, and other forms of figurative language.

6. Discourse
Discourse refers to the structure and organization of connected text or speech. NLP systems at this level consider how sentences relate to each other and form coherent paragraphs or dialogues. Coreference resolution (identifying which words refer to the same entity) is an important task in discourse analysis.


Why is NLP Hard?

1. Ambiguity
2. Scale
3. Sparsity
4. Variation
5. Expressivity
6. Unmodeled Variables
7. Unknown Representations


Ambiguity
Ambiguity exists at multiple levels:

Word senses: bank (finance or river?)
Part of speech: chair (noun or verb?)
Syntactic structure: I can see a man with a telescope
Multiple: I made her duck
Semantic: Time flies like an arrow; fruit flies like a banana
Phonological: "I scream, you scream, we all scream for ice cream." (the phrases "I scream" and "ice cream" sound alike)


Ambiguity
These different meanings are caused by a number of ambiguities. First, the words duck and her are morphologically or syntactically ambiguous in their part-of-speech: duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun. Second, the word make is semantically ambiguous; it can mean create or cook. Third, the verb make is syntactically ambiguous in a different way: make can be transitive, taking a single direct object, or it can be ditransitive, taking two objects, meaning that the first object (her) was made into the second object (duck). Make can also take a direct object and a verb, meaning that the object (her) was caused to perform the verbal action (duck). Finally, in a spoken sentence there is an even deeper kind of ambiguity: the first word could have been eye or the second word maid.
Ambiguity
We often introduce the models and algorithms we present throughout the book as ways to resolve or disambiguate these ambiguities. For example, deciding whether duck is a verb or a noun can be solved by part-of-speech tagging. Deciding whether make means "create" or "cook" can be solved by word sense disambiguation. Resolution of part-of-speech and word sense ambiguities are two important kinds of lexical disambiguation.

Note: Word Sense Disambiguation (WSD) is a natural language processing (NLP) task that focuses on determining the correct meaning or sense of a word in a given context.


Scale
Scale in NLP refers to the challenges and opportunities posed by the vast amounts of linguistic data available for analysis. The scale of data in NLP presents both technical and computational challenges, but it also enables the development of more sophisticated models and applications.

Challenges of Scale
Data Collection: Gathering and annotating large-scale linguistic data is resource-intensive and time-consuming.

Computational Resources: Processing and analyzing massive datasets require significant computational power and memory.

Model Complexity: More data often leads to larger and more complex models, which may require specialized hardware and efficient training techniques.

Noise and Quality: As datasets grow, ensuring data quality becomes crucial, as noise can negatively impact model performance.


Scale
Opportunities of Scale
Improved Models: Large datasets enable the training of more accurate and robust NLP models that can capture subtle linguistic nuances.

Generalization: Models trained on extensive data have the potential to generalize better across various domains and languages.

Transfer Learning: Models pretrained on massive datasets can be fine-tuned for specific tasks, reducing the need for extensive task-specific data.

Multilingualism: Large-scale data allows models to learn from multiple languages, enabling multilingual applications.


Sparsity
Sparsity is a common challenge in Natural Language Processing (NLP) that arises due to the vast and diverse nature of human language. In NLP, sparsity refers to the phenomenon where the data space is extremely large, but the actual data available for any specific point in that space is very limited. This can have significant implications for various NLP tasks and models.

Causes of Sparsity in NLP
Vocabulary Size: Natural languages have extensive vocabularies with numerous words, many of which are rare or domain-specific. The majority of words appear infrequently in any given text corpus.

Long-Tail Distribution: The frequency distribution of words follows a "long tail" pattern, where a few common words appear frequently, while the majority of words occur rarely.

Named Entities: Entities like names, locations, dates, and specialized terms are sparse in most text data.

Word Combinations: The number of possible word combinations is astronomically large, but most of these combinations are never observed in real-world text.


Natural Language Processing
CSA4006

Dr. Anirban Bhowmick
Assistant Professor, VIT Bhopal

Lecture 3

Module 1:
Introduction

Topic: Regular Expression




Variation
Suppose we train a part-of-speech tagger or a parser on the Wall Street Journal.

What will happen if we try to use this tagger/parser for social media?

"ikr smh he asked fir yo last name so he can add u on fb lololol"




Expressivity
Not only can one form have different meanings (ambiguity), but the same meaning can be expressed with different forms. For example, "She gave the book to Tom" and "She gave Tom the book" express the same event in two ways.


Unmodeled Variables

World knowledge:
I dropped the glass on the floor and it broke.
I dropped the hammer on the glass and it broke.


Unmodeled Representations
Unmodeled representations in NLP refer to aspects of language and meaning that are not fully captured by existing language models, resulting in situations where models struggle to understand the nuances and complexities of human communication. Here are some examples:

Example: "She's as busy as a bee."
In this metaphor, the phrase "busy as a bee" implies that she is very industrious, but this meaning is not directly related to bees being busy insects.

Example: "He's the Einstein of our group."
This expression assumes knowledge about who Einstein was and what he symbolizes. A model lacking this cultural context might miss the intended comparison.

Example: "Oh great, another flat tire!"
This statement might be used in a situation where someone is frustrated about a recurring problem; the words imply sarcasm despite the literal words expressing annoyance.


Factors Changing the NLP Landscape

1. Increases in computing power
2. The rise of the web, then the social web
3. Advances in machine learning
4. Advances in understanding of language in social context


Regular Expressions
Regular expressions (regex) are powerful tools used in Natural Language Processing (NLP) to match and manipulate text patterns. They provide a concise and flexible way to search, extract, and manipulate textual data.

Imagine you needed to search a string for a term, such as "phone":

"phone" in "Is the phone here?"
>>> True

Imagine you needed to search for a phone number, "91-98765-43210"; we can do the same:

"91-98765-43210" in "Her phone number is 91-98765-43210"
>>> True


Regular Expression
But what if you don't know the exact number, or you need to find all the phone numbers in a text? We need a regular expression to search through the document for this pattern. Regular expressions allow for pattern searching in a text document:

r'\d{2}-\d{5}-\d{5}'

\d is the placeholder pattern code for a digit.
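
Putting the pattern to work (a short illustrative sketch; the second number is invented to show multiple matches):

import re

text = "Her phone number is 91-98765-43210; the old number 91-12345-67890 is dead."

# \d{2}-\d{5}-\d{5}: two digits, hyphen, five digits, hyphen, five digits
print(re.findall(r"\d{2}-\d{5}-\d{5}", text))
# ['91-98765-43210', '91-12345-67890']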

Regular Expressions: Disjunctions
Letters inside square brackets [] match any one of the listed characters: [wW]oodchuck matches "woodchuck" or "Woodchuck".

Ranges: [A-Z] matches any uppercase letter, [0-9] any digit.


Regular Expressions: Negation in Disjunction
Negations: [^Ss] matches any character except "S" or "s". The caret means negation only when it is first inside [].


Regular Expression

Pattern  | Meaning                             | Matches
---------+-------------------------------------+-------------------------------
colou?r  | optional previous char              | color, colour
oo*h!    | 0 or more of previous char          | oh!, ooh!, oooh!, ooooh!
o+h!     | 1 or more of previous char          | oh!, ooh!, oooh!, ooooh!
baa+     | 1 or more of previous char          | baa, baaa, baaaa, baaaaa
beg.n    | any character between "beg" and "n" | begin, begun, beg3n

Regular Expressions: Anchors ^ $
^ anchors a match to the start of a line and $ to the end; for example, /^The/ matches "The" only at the beginning of a line, and /\.$/ matches a period only at the end of a line.


Advanced Operators
A range of numbers can also be specified: /{n,m}/ specifies from n to m occurrences of the previous char or expression, while /{n,}/ means at least n occurrences of the previous expression.


Errors
Find me all instances of the word "the" in a text.

the                         -- misses capitalized "The"
[tT]he                      -- incorrectly also matches "other" or "theology"
[^a-zA-Z][tT]he[^a-zA-Z]    -- matches only the correct ones

The process we just went through was based on fixing two kinds of errors:
False positives (Type I): matching strings that we should not have matched (there, then, other).
False negatives (Type II): not matching things that we should have matched (The).

In NLP we are always dealing with these kinds of errors. Reducing the error rate for an application often involves two antagonistic efforts:
Increasing accuracy or precision (minimizing false positives).
Increasing coverage or recall (minimizing false negatives).
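
A small sketch that makes the precision/recall trade-off visible (the test sentence is invented for illustration):

import re

text = "The other day the theologian sat there."

for pattern in [r"the", r"[tT]he", r"[^a-zA-Z][tT]he[^a-zA-Z]"]:
    print(pattern, "->", re.findall(pattern, text))

# r"the"                      misses "The" (false negative) and also hits the "the"
#                             inside "other", "theologian", "there" (false positives)
# r"[tT]he"                   fixes the capitalization miss, keeps the false positives
# r"[^a-zA-Z][tT]he[^a-zA-Z]" keeps only standalone "the", but still misses a
#                             sentence-initial "The" with no preceding character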



Regular Expressions
Lecture 11b
Larry Ruzzo
Outline

• Some string tidbits


• Regular expressions and pattern matching
Strings Again

'abc'
"abc"          a b c
'''abc'''
r'abc'

Strings Again

'abc\n'
"abc\n"        a b c newline
'''abc
'''
r'abc\n'       a b c \ n

Why so many?
' vs " lets you put the other kind inside
''' lets you run across many lines
all 3 let you show "invisible" characters (via \n, \t, etc.)
r'...' (raw strings) can't do invisible stuff, but avoid problems with backslash

open('C:\new\text.dat') vs
open('C:\\new\\text.dat') vs
open(r'C:\new\text.dat')
RegExprs are Widespread
• shell file name patterns (limited)
• unix utility "grep" and relatives
  • try "man grep" in terminal window
• perl
• TextWrangler
• Python
Patterns in Text
• Pattern-matching is frequently useful
• Identifier: A letter followed by >= 0 letters or digits.

count1 number2go, not 4runner


• TATA box: TATxyT where x or y is A

TATAAT TATAgT TATcAT, not TATCCT


• Number: >=1 digit, optional decimal point, exponent.
3.14 6.02E+23, not 127.0.0.1
Regular Expressions
• A language for simple patterns, based on 4 simple
primitives
• match single letters
• this OR that
• this FOLLOWED BY that
• this REPEATED 0 or more times
• A specific syntax (fussy, and varies among pgms...)
• A library of utilities to deal with them
• Key features: Search, replace, dissect
Regular Expressions
• Do you absolutely need them in Python?
• No, everything they do, you could do yourself
• BUT pattern-matching is widely needed, tedious and error-prone. RegExprs give you a flexible, systematic, compact, automatic way to do it. A common language for specifications.
• In truth, it's still somewhat error-prone, but in a different way.
Examples
(details later)

• Identifier: letter followed by ≥0 letters or digits.
  [a-z][a-z0-9]*            i  count1  number2go
• TATA box: TATxyT where x or y is A
  TAT(A.|.A)T               TATAAT  TATAgT  TATcAT
• Number: one or more digits with optional decimal point, exponent.
  \d+\.?\d*(E[+-]?\d+)?     3.14  6.02E+23
Another Example
Repressor binding sites in regular Python

# assume we have a genome sequence in string variable myDNA
for index in range(0, len(myDNA) - 20):
    if ((myDNA[index] == "A" or myDNA[index] == "G") and
        (myDNA[index+1] == "A" or myDNA[index+1] == "G") and
        (myDNA[index+2] == "A" or myDNA[index+2] == "G") and
        (myDNA[index+3] == "C") and
        (myDNA[index+4] == "C") and
        # ... and on and on! ...
        (myDNA[index+19] == "C" or myDNA[index+19] == "T")):
        print("Match found at", index)
        break

Example

re.findall(r"[AG]{3,3}CATG[TC]{4,4}[AG]{2,2}C[AT]TG[CT][CG][TC]", myDNA)
RegExprs in Python

http://docs.python.org/library/re.html
Simple RegExpr Testing
>>> import re
>>> str1 = 'what foot or hand fell fastest'
>>> re.findall(r'f[a-z]*', str1)
['foot', 'fell', 'fastest']
>>> str2 = "I lack e's successor"
>>> re.findall(r'f[a-z]*', str2)
[]

Returns list of all matching substrings. (Definitely recommend trying this with the examples to follow, & more.)

Exercise: change it to find strings starting with f and ending with t

Exercise: In honor of the winter Olympics, "-ski-ing"
• download & save war_and_peace.txt
• write a py program to read it line-by-line, use re.findall to see whether the current line contains one or more proper names ending in "...ski"; print each.
• mine begins:
['Bolkonski']
['Bolkonski']
['Bolkonski']
['Bolkonski']
['Bolkonski']
['Razumovski']
['Razumovski']
['Bolkonski']
['Spasski']
...
['Nesvitski', 'Nesvitski']

RegExpr Syntax

They're strings.
Most punctuation is special; it needs to be escaped by backslash (e.g., "\." instead of ".") to get non-special behavior.
So, "raw" string literals (r'C:\new\.txt') are generally recommended for regexprs, unless you double your backslashes judiciously.

Patterns “Match” Text

Pattern: TAT(A.|.A)T [a-z][a-z0-9]*

Text: RATATaAT TAT! count1


RegExpr Semantics, 1: Characters

RegExprs are patterns; they "match" sequences of characters.
Letters, digits (& escaped punctuation like '\.') match only themselves, just once.

r'TATAAT'            'ACGTTATAATGGTATAAT'

RegExpr Semantics, 2: Character Groups
Character groups [abc], [a-zA-Z], [^0-9] also match single characters, any of the characters in the group.
Shortcuts (2 of many):
.  -- (just a dot) matches any character (except newline)
\s ≡ [ \n\t\r\f\v]  ("s" for "space")

r'T[AG]T[^GC].T'     'ACGTTGTAATGGTATnCT'

Matching one of several alternatives

• Square brackets mean that any of the listed characters will do
• [ab] means either "a" or "b"
• You can also give a range:
• [a-d] means "a", "b", "c" or "d"
• Negation: caret means "not"

[^a-d]  # anything but a, b, c or d

RegExpr Semantics, 3: Concatenation, Or, Grouping
You can group subexpressions with parens.
If R, S are RegExprs, then
RS matches the concatenation of strings matched by R, S individually
R | S matches the union -- either R or S

r'TAT(A.|.A)T'       'TATCATGTATACTCCTATCCT'

RegExpr Semantics, 4: Repetition
If R is a RegExpr, then
R*      matches 0 or more consecutive strings (independently) matching R
R+      1 or more
R{n}    exactly n
R{m,n}  any number between m and n, inclusive
R?      0 or 1
Beware precedence (* > concat > |)

r'TAT(A.|.A)*T'      'TATCATGTATACTATCACTATT'

RegExprs in Python

By default:
Case sensitive, line-oriented (\n treated specially)
Matching is generally "greedy":
Finds longest version of earliest starting match
Next findall() match will not overlap

r".+\.py"     "Two files: hw3.py and upper.py."
r"\w+\.py"    "Two files: hw3.py and UPPER.py."

Exercise 3

Suppose “filenames” are upper or lower case


letters or digits, starting with a letter, followed
by a period (“.”) followed by a 3 character
extension (again alphanumeric). Scan a list of
lines or a file, and print all “filenames” in it,
without their extensions. Hint: use paren
groups.
Solution 3

import sys
import re

filename = sys.argv[1]
filehandle = open(filename, "r")
filecontents = filehandle.read()
myrule = re.compile(r"([a-zA-Z][a-zA-Z0-9]*)\.[a-zA-Z0-9]{3}")
# Finds skidoo.bar amidst 23skidoo.barber; ok?
match = myrule.findall(filecontents)
print(match)

Basics of regexp construction

• Letters and numbers match themselves
• Normally case sensitive
• Watch out for punctuation -- most of it has special meanings!

Wild cards

• "." means "any character"
• If you really mean "." you must use a backslash
• WARNING:
  - backslash is special in Python strings
  - It's special again in regexps
  - This means you need too many backslashes
  - We will use "raw strings" instead
  - Raw strings look like r"ATCGGC"

Using . and backslash

• To match file names like "hw3.pdf" and "hw5.txt":

hw.\....

Zero or more copies

• The asterisk repeats the previous character 0 or more times
• "ca*t" matches "ct", "cat", "caat", "caaat" etc.
• The plus sign repeats the previous character 1 or more times
• "ca+t" matches "cat", "caat" etc. but not "ct"

Repeats

• Braces are a more detailed way to indicate repeats
• A{1,3} means at least one and no more than three A's
• A{4,4} means exactly four A's

simple testing

>>> import re
>>> string = 'what foot or hand fell fastest'
>>> re.findall(r'f[a-z]*', string)
['foot', 'fell', 'fastest']
Practice problem 1

• Write a regexp that will match any string that starts with "hum" and ends with "001" with any number of characters, including none, in between
• (Hint: consider both "." and "*")

Practice problem 2

• Write a regexp that will match any Python (.py) file name.
• There must be at least one character before the "."
• ".py" is not a legal Python file name
• (Imagine the problems if you imported it!)

Using the regexp

First, compile it:

import re
myrule = re.compile(r".+\.py")
print(myrule)
<_sre.SRE_Pattern object at 0xb7e3e5c0>

The result of compile is a Pattern object which represents your regexp.

Using the regexp

Next, use it:

mymatch = myrule.search(myDNA)
print(mymatch)
None
mymatch = myrule.search(someotherDNA)
print(mymatch)
<_sre.SRE_Match object at 0xb7df9170>

The result of search is a Match object which represents the result.

All of these objects! What can they do?

Functions offered by a Pattern object:

• match() -- does it match the beginning of my string? Returns None or a Match object
• search() -- does it match anywhere in my string? Returns None or a Match object
• findall() -- does it match anywhere in my string? Returns a list of strings (or an empty list)
• Note that findall() does NOT return a Match object!

All of these objects! What can they do?

Functions offered by a Match object:

• group() -- return the string that matched
  group()    -- the whole string
  group(1)   -- the substring matching the 1st parenthesized sub-pattern
  group(1,3) -- tuple of substrings matching the 1st and 3rd parenthesized sub-patterns
• start() -- return the starting position of the match
• end() -- return the ending position of the match
• span() -- return (start, end) as a tuple

A practical example

Does this string contain a legal Python filename?

import re
myrule = re.compile(r".+\.py")
mystring = "This contains two files, hw3.py and uppercase.py."
mymatch = myrule.search(mystring)
print(mymatch.group())
This contains two files, hw3.py and uppercase.py
# not what I expected! Why?

Matching is greedy

• My regexp matches "hw3.py"
• Unfortunately it also matches "This contains two files, hw3.py"
• And it even matches "This contains two files, hw3.py and uppercase.py"
• Python will choose the longest match
• I could break my file into words first
• Or I could specify that no spaces are allowed in my match

A practical example

Does this string contain a legal Python filename?

import re
myrule = re.compile(r"[^ ]+\.py")
mystring = "This contains two files, hw3.py and uppercase.py."
mymatch = myrule.search(mystring)
print(mymatch.group())
hw3.py
allmymatches = myrule.findall(mystring)
print(allmymatches)
['hw3.py', 'uppercase.py']

Practice problem 3

• Create a regexp which detects legal Microsoft Word file names
• The file name must end with ".doc" or ".DOC"
• There must be at least one character before the dot.
• We will assume there are no spaces in the names
• Print out a list of all the legal file names you find
• Test it on testre.txt (on the web site)

Practice problem 4

• Create a regexp which detects legal Microsoft Word file names that do not contain any numerals (0 through 9)
• Print out the start location of the first such filename you encounter
• Test it on testre.txt

Practice problem

• Create a regexp which detects legal Microsoft Word file names that do not contain any numerals (0 through 9)
• Print out the "base name", i.e., the file name after stripping off the .doc extension, of each such filename you encounter. Hint: use parenthesized sub-patterns.
• Test it on testre.txt

Practice problem 1 solution

Write a regexp that will match any string that starts with "hum" and ends with "001" with any number of characters, including none, in between

myrule = re.compile(r"hum.*001")

Practice problem 2 solution

Write a regexp that will match any Python (.py) file name.

myrule = re.compile(r".+\.py")

# if you want to find filenames embedded in a bigger
# string, better is:
myrule = re.compile(r"[^ ]+\.py")
# this version does not allow whitespace in file names

Practice problem 3 solution

Create a regexp which detects legal Microsoft Word file names, and use it to make a list of them

import sys
import re

filename = sys.argv[1]
filehandle = open(filename, "r")
filecontents = filehandle.read()
myrule = re.compile(r"[^ ]+\.[dD][oO][cC]")
matchlist = myrule.findall(filecontents)
print(matchlist)

Practice problem 4 solution

Create a regexp which detects legal Microsoft Word file names which do not contain any numerals, and print the location of the first such filename you encounter

import sys
import re

filename = sys.argv[1]
filehandle = open(filename, "r")
filecontents = filehandle.read()
myrule = re.compile(r"[^ 0-9]+\.[dD][oO][cC]")
match = myrule.search(filecontents)
print(match.start())

Regular expressions summary

• The re module lets us use regular expressions
• These are fast ways to search for complicated strings
• They are not essential to using Python, but are very useful
• File format conversion uses them a lot
• Compiling a regexp produces a Pattern object which can then be used to search
• Searching produces a Match object which can then be asked for information about the match

Natural Language Processing
CSA4006

Dr. Anirban Bhowmick
Assistant Professor, VIT Bhopal

Lecture 4



Regular Expression

Split the string at every white-space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)
['The', 'rain', 'in', 'Spain']

Split the string at the first white-space character only (maxsplit=1):

x = re.split(r"\s", txt, 1)
print(x)
['The', 'rain in Spain']

Replace all white-space characters with the digit "9":

x = re.sub(r"\s", "9", txt)
print(x)
The9rain9in9Spain


html_text = """
Regular Expression <!DOCTYPE html>
<html>
Write a Python program that removes all HTML tags <head>
from an HTML document. Create a function that takes <title>Sample HTML
an HTML string as input and returns the text content Document</title>
without any HTML tags. Use regular expressions to </head>
accomplish this, taking into account different tag <body>
attributes and formats. <h1>Welcome to my website!</h1>
<p>This is a sample HTML
10
document.</p>
</body>
</html>
"""
html_tag_pattern = r'<[^>]*>'
clean_text =
re.sub(html_tag_pattern, '',
html_text)
print(clean_text)

CSA4006-Dr. Anirban Bhowmick


Regular Expression

Write a Python program that validates a list of email addresses. Create a function that takes a list of email addresses as input and returns a list of valid email addresses. Use regular expressions to validate each email address according to common email address patterns.

import re

def validate_email_addresses(email_list):
    # Regular expression pattern for a valid email address
    email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    # List to store valid email addresses
    valid_emails = []
    for email in email_list:
        if re.match(email_pattern, email):
            valid_emails.append(email)
    return valid_emails

# Example usage (addresses are illustrative):
email_list = [
    "john.doe@example.com",
    "jane_smith@university.edu",
    "invalid-email",
    "another@example",
]

valid_emails = validate_email_addresses(email_list)
print("Valid Email Addresses:")
for email in valid_emails:
    print(email)

Finite-State Automata
The regular expression is more than just a convenient metalanguage for text searching. First, a regular expression is one way of describing a finite-state automaton (FSA). Finite-state automata are the theoretical foundation of a good deal of computational work, and any regular expression can be implemented as a finite-state automaton. Symmetrically, any finite-state automaton can be described with a regular expression. Second, a regular expression is one way of characterizing a particular kind of formal language called a regular language. Both regular expressions and finite-state automata can be used to describe regular languages. A third equivalent method of characterizing the regular languages is the regular grammar.


Finite-State Automata
Finite automata are simple abstract machines used to recognize patterns. Finite automata are also known as finite-state machines. A finite automaton is a mathematical model of a system with discrete inputs, outputs, states, and a set of transitions from state to state that occur on input alphabet symbols. In simple words, it has a set of states and rules for moving from one state to the next, depending on the input symbol. Formally, an FSA consists of:

Q: a finite set of states, represented by vertices.
Σ: a set of input symbols.
q0: the initial state, represented by an empty incoming arc.
F: the set of final states, represented by double circles.
δ: the transition function, represented by arcs.



Natural Language Processing
CSA4006

Dr. Anirban Bhowmick
Assistant Professor, VIT Bhopal

Lecture 5



Determinism and Non-Determinism
Deterministic: A Deterministic Finite Automaton (DFA) is a mathematical model and computational device used to recognize and accept a set of strings over a finite alphabet. It is a type of finite-state machine characterized by its deterministic nature, meaning that for each state and input symbol there is exactly one defined transition to another state.

Non-deterministic: There is a choice of several transitions that can be taken given a current state and input symbol. (The machine doesn't specify how to make the choice.)

Potential solutions:
• Save backup states at each choice point
• Look ahead in the input before making a choice
• Pursue alternatives in parallel
• Determinize our NFSAs (and then minimize)


Using an FSA to Recognize Sheeptalk
Let's begin with the "sheep language": the sheep language is any string from the following (infinite) set:
baa!
baaa!
baaaa!
baaaaa!
baaaaaa!
...

The automaton is a directed graph with labeled nodes and arc transitions: five states (q0 the start state, q4 the final state) and five transitions.


Formally

State transition table for sheeptalk:

State      | b  | a  | !
-----------|----|----|----
q0         | q1 | -  | -
q1         | -  | q2 | -
q2         | -  | q3 | -
q3         | -  | q3 | q4
q4 (final) | -  | -  | -


Recognition and Rejection
The machine starts in the start state (q0) and iterates the following process: check the next letter of the input. If it matches the symbol on an arc leaving the current state, then cross that arc, move to the next state, and advance one symbol in the input. If we are in the accepting state (q4) when we run out of input, the machine has successfully recognized an instance of sheeptalk. If the machine never gets to the final state, either because it runs out of input, or it gets some input that doesn't match an arc, or it just happens to get stuck in some non-final state, we say the machine rejects or fails to accept the input.

(Figure: tape metaphor, showing a rejected input.)


D-Recognize
The algorithm is called D-RECOGNIZE for "deterministic recognizer". D-RECOGNIZE begins by setting the variable index to the beginning of the tape, and current-state to the machine's initial state. D-RECOGNIZE then enters a loop that drives the rest of the algorithm. It first checks whether it has reached the end of its input. If so, it either accepts the input (if the current state is an accept state) or rejects the input (if not).
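
A minimal Python sketch of D-RECOGNIZE for the sheeptalk automaton (the dictionary encoding of the transition table is my own; the textbook's pseudocode is the authoritative version):

# Transition table: states 0-4 over the alphabet {b, a, !}
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of extra a's
    (3, "!"): 4,
}
ACCEPTING = {4}

def d_recognize(tape):
    state = 0
    for symbol in tape:
        if (state, symbol) not in TRANSITIONS:
            return False          # no legal transition: reject
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING     # end of input: accept only in a final state

print(d_recognize("baaa!"))  # True
print(d_recognize("ba!"))    # False: needs at least two a's
print(d_recognize("abc"))    # False: no transition out of q0 on 'a'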

D-Recognize
Before examining the beginning of the tape, the machine is in state q0. Finding a b on the input tape, it changes to state q1 as indicated by the contents of transition-table[q0,b]. It then finds an a and switches to state q2; another a puts it in state q3; a third a leaves it in state q3, where it reads the "!" and switches to state q4. Since there is no more input, the end-of-input condition at the beginning of the loop is satisfied for the first time and the machine halts in q4. State q4 is an accepting state, so the machine has accepted the string baaa! as a sentence in the sheep language. The algorithm will fail whenever there is no legal transition for a given combination of state and input. The input abc will fail to be recognized, since there is no legal transition out of state q0 on the input a. Even if the automaton had allowed an initial a, it would certainly have failed on c, since c isn't even in the sheeptalk alphabet! We can think of these "empty" elements in the table as if they all pointed at one "empty" state, which we might call the fail state or sink state.


Formal Language
A formal language is a set of strings, each string composed of symbols from a finite symbol set called an alphabet (the same alphabet used above for defining an automaton!). The alphabet for the sheep language is the set Σ = {a, b, !}. Given a model m (such as a particular FSA), we can use L(m) to mean "the formal language characterized by m". So the formal language defined by our sheeptalk automaton m is the infinite set:

L(m) = {baa!, baaa!, baaaa!, baaaaa!, ...}

The usefulness of an automaton for defining a language is that it can express an infinite set (such as the one above) in a closed form. Formal languages are not the same as natural languages, which are the kind of languages that real people speak. In fact, a formal language may bear no resemblance at all to a real language (e.g., a formal language can be used to model the different states of a soda machine). But we often use a formal language to model part of a natural language, such as parts of the phonology, morphology, or syntax. The term generative grammar is sometimes used in linguistics to mean a grammar of a formal language; the origin of the term is this use of an automaton to define a language by generating all possible strings.

Another Example
We can also have a higher-level alphabet consisting of words. In this way we can write finite-state automata that model facts about word combinations. For example, suppose we wanted to build an FSA that modeled the subpart of English dealing with amounts of money. Such a formal language would model the subset of English consisting of phrases like ten cents, three dollars, one dollar thirty-five cents, and so on.


Example

Fifty one dollars twenty two cents

q0 → q1 → q2 → q4 → q5 → q6 → q7


Module 2:
Morphology and Finite-State Transducers:
Inflectional Morphology - Derivational Morphology - Finite-State Morphological Parsing - The Lexicon and Morphotactics - Morphological Parsing with Finite-State Transducers - Combining FST Lexicon and Rules - Lexicon-free FSTs: The Porter Stemmer - Human Morphological Processing - Speech Sounds and Phonetic Transcription - The Phoneme and Phonological Rules

Introduction
Morphological parsing is a linguistic process that involves breaking down words into their constituent morphemes. Morphemes are the smallest units of meaning in a language and can be individual words or meaningful parts of words, such as prefixes, suffixes, and roots. Morphological parsing is an essential aspect of linguistic analysis, especially in languages with complex inflectional and derivational morphology, like many Indo-European languages.

A morphological parser must be able to distinguish between orthographic rules and morphological rules. Orthographic rules are general spelling rules used when breaking a word into its stem and modifiers. An example would be: singular English words ending with -y, when pluralized, end with -ies. Contrast this with morphological rules, which contain corner cases to these general rules. Both of these types of rules are used to construct systems that can do morphological parsing. Morphological rules tell us, for example, that the plural of goose is formed by changing the vowel.
Morphemes
Morphemes are the smallest units of meaning in a language. For example, the word fox consists of a single morpheme (the morpheme fox), while the word cats consists of two: the morpheme cat and the morpheme -s.

Number of morphemes in a word:
 One morpheme - nation
 Two morphemes - national (nation, -al)
 Three morphemes - nationalize (nation, -al, -ize)
 Four morphemes - denationalize (de-, nation, -al, -ize)


Morphemes
Morphemes can be classified into two main categories:

Free Morphemes (stems): complete words that can stand alone and carry meaning on their own (e.g., "book", "run"). Other examples: girl, cat, dog, little, bag.

Bound Morphemes (affixes): meaningful units that cannot stand alone and must be attached to a free morpheme to convey meaning. Bound morphemes include prefixes (e.g., "un-" in "undo"), suffixes (e.g., "-ed" in "walked"), and infixes (inserted inside a word, as in some Tagalog verb forms). A bound morpheme adds meaning when attached to a stem: -s in walks, re- in replay, -er in cheaper, im- in impossible, en- in enlighten, un- in unable.

Prefixes: impossible, reply, unhappy, confirm, compress
Suffixes: passion, ambition, unity, walking
Circumfixes: enlighten, embolden


Concatenative Morphology & Non Concatenative
Morphology
Prefixes and suffixes are often called concatenative morphology since a word is composed of a
number of morphemes concatenated together
 Circumfixes (Not in English)
 Eg: In German, for example
 The past participle of some verbs formed by adding ge to the beginning of the stem and t to the
end
 so the past participle of the verb sagen (to say) is gesagt (said). 24

A number of languages have extensive non concatenative morphology, in which morphemes are
combined in more complex ways
 Another kind of non concatenative morphology is called templatic morphology or root and pattern
morphology This is very common in Arabic, Hebrew, and other Semitic languages

CSA4006-Dr. Anirban Bhowmick


Non-Concatenative Morphology
In Hebrew, for example, a verb is constructed using two components: a root, usually consisting of three consonants and carrying the basic meaning, and a template, which gives the ordering of consonants and vowels and specifies more semantic information about the resulting verb, such as the semantic voice (e.g., active, passive, middle).

The Hebrew tri-consonantal root lmd, meaning 'learn' or 'study', can be combined with:
 the active-voice CaCaC template to produce the word lamad, 'he studied';
 the intensive CiCeC template to produce the word limed, 'he taught';
 the intensive passive CuCaC template to produce the word lumad, 'he was taught'.




Natural Language Processing
CSA4006

Dr. Anirban Bhowmick
Assistant Professor, VIT Bhopal

Lecture 6

Syllabus

Syllabus
3
Module 2:
Morphology And Finite-State Transducers:
Inflectional Morphology -Derivational
Morphology- Finite-State Morphological Parsing-
The Lexicon and Morphotactics - Morphological 4

Parsing with Finite-State Transducers-


Combining FST Lexicon and Rules- Lexicon-free
FSTs: The Porter Stemmer- Human
Morphological Processing- Speech Sounds
and Phonetic Transcription- The Phoneme and
Phonological Rules
Text Books:
Daniel Jurafsky and James H. Martin "Speech and
Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics and
Speech recognition", Prentice Hall, 2nd edition, 2008. 5
Reference Books:
1. Roland R. Hausser "Foundations of Computational
Linguistics: Human- Computer Communication in
Natural Language", Paperback, MIT Press, 2011.
2. Christopher D. Manning and Hinrich Schuetze,
6
"Foundations of Statistical Natural Language
Processing" by MIT Press.
Module 1:
Introduction

Topic: Regular Expression


Review

CSA4006-Dr. Anirban Bhowmick


Module 2:
Morphology And Finite-State Transducers:
Inflectional Morphology -Derivational
Morphology- Finite-State Morphological Parsing-
The Lexicon and Morphotactics - Morphological 9

Parsing with Finite-State Transducers-


Combining FST Lexicon and Rules- Lexicon-free
FSTs: The Porter Stemmer- Human
Morphological Processing- Speech Sounds
and Phonetic Transcription- The Phoneme and
Phonological Rules
Introduction
Morphological parsing is a linguistic process that involves breaking down words into their constituent
morphemes. Morphemes are the smallest units of meaning in a language and can be individual words or
meaningful parts of words, such as prefixes, suffixes, and roots. Morphological parsing is an essential
aspect of linguistic analysis, especially in languages with complex inflectional and derivational
morphology, like many Indo-European languages.
It must be able to distinguish between orthographic rules and morphological rules.

10
Orthographic rules are general rules used when breaking a word into its stem and modifiers. An
example would be: singular English words ending with -y, when pluralized, end with -ies. Contrast this
to morphological rules which contain corner cases to these general rules. Both of these types of rules
are used to construct systems that can do morphological parsing

Morphological rules tell us the plural of goose is formed by changing the vowel.

CSA4006-Dr. Anirban Bhowmick


Morphemes
Morphemes: Morphemes are the smallest units of meaning in a language.

For example the word fox consists of a single morpheme (the morpheme fox) while the word cats
consists of two the morpheme cat and the morpheme s

Types of Morpheme:
11

One Morpheme –Nation


 Two Morpheme –National (nation, al)
 Three Morpheme –Nationalize (nation, al, ize)
 Four Morpheme –Denationalize (de, nation, al, ize)

CSA4006-Dr. Anirban Bhowmick


Morphemes
Morphemes can be classified into two main categories:

Free Morphemes (stem): These are complete words that can stand alone and carry meaning on their
own (e.g., "book," "run").
Bound Morphemes (affixes): These are meaningful units that cannot stand alone and must be attached
to a free morpheme to convey meaning. Bound morphemes include prefixes (e.g., "un-" in "undo"),
suffixes (e.g., "-ed" in "walked"), and infixes (inserted inside a word, like in some Tagalog verb forms). 12

Bound morphemes give meaning only when added to another morpheme: -s in walks, re- in replay, -er in cheaper, im- in impossible, en- in enlighten, un- in unable. Free morphemes stand alone as words: girl, cat, dog, little, book, bag.

Prefixes: impossible, reply, unhappy, confirm, compress
Suffixes: Passion, Ambition, Unity, Walking
Circumfixes: enlighten, embolden

Concatenative Morphology & Non-Concatenative Morphology
Prefixes and suffixes are often called concatenative morphology, since a word is composed of a number of morphemes concatenated together.
- Circumfixes (not in English). In German, for example, the past participle of some verbs is formed by adding ge- to the beginning of the stem and -t to the end, so the past participle of the verb sagen (to say) is gesagt (said).

A number of languages have extensive non-concatenative morphology, in which morphemes are combined in more complex ways.
- One kind of non-concatenative morphology is called templatic morphology, or root-and-pattern morphology. This is very common in Arabic, Hebrew, and other Semitic languages.
Non-Concatenative Morphology
In Hebrew, for example, a verb is constructed using two components: a root, consisting usually of three consonants and carrying the basic meaning, and a template, which gives the ordering of consonants and vowels and specifies more semantic information about the resulting verb, such as the semantic voice (e.g., active, passive, middle).

The Hebrew tri-consonantal root lmd, meaning 'learn' or 'study', can be combined with:
- the active voice CaCaC template to produce the word lamad, 'he studied'
- the intensive CiCeC template to produce the word limed, 'he taught'
- the intensive passive template CuCaC to produce the word lumad, 'he was taught'


Morphemes
Two broad classes of ways to form words from morphemes:
– Inflection: the combination of a word stem with a grammatical morpheme, usually resulting in a
word of the same class as the original stem, and usually filling some syntactic function like agreement

For example, English has the inflectional morpheme -s for marking the plural on nouns, and the
inflectional morpheme -ed for marking the past tense on verbs.

The meaning of the resulting word is easily predictable.

– Derivation: the combination of a word stem with a grammatical morpheme, usually resulting in a
word of a different class, often with a meaning hard to predict exactly.

For example the verb computerize can take the derivational suffix -ation to produce the noun
computerization.

Inflection
In English, only nouns, verbs, and sometimes adjectives can be inflected, and the number of affixes
is quite small.

English nouns have only two kinds of inflection: an affix that marks plural and an affix that marks
possessive. For example, many (but not all) English nouns can either appear in the bare stem or
singular form, or take a plural suffix. Here are examples of the regular plural suffix -s (also spelled -es),
and irregular plurals (e.g., regular cat/cats, thrush/thrushes; irregular mouse/mice, ox/oxen).

Inflection
(Table of regular and irregular English verb inflections omitted.) The irregular verbs are those that have some more or less idiosyncratic forms of inflection.
Inflection
An irregular verb can inflect in the past form (also called the preterite) by changing its vowel (eat/ate), or
its vowel and some consonants (catch/caught), or with no ending at all (cut/cut).

- The -s form is used in the 'habitual present' form to distinguish the 3rd person singular ending (She jogs every Tuesday) from the other choices of person and number (I/you/we/they jog every Tuesday).
- The stem form is used in the infinitive form, and also after certain other verbs (I'd rather walk home, I want to walk home).
- The -ing participle is used when the verb is treated as a noun, called a gerund use. E.g.: Fishing is fine if you live near water.
- The -ed participle is used in the perfect construction (He's eaten lunch already) or the passive construction (The verdict was overturned yesterday).
Inflection
- A single consonant letter is doubled before adding the -ing and -ed suffixes (beg/begging/begged).
- If the final letter is c, the doubling is spelled ck (picnic/picnicking/picnicked).
- If the base ends in a silent e, it is deleted before adding -ing and -ed (merge/merging/merged).
- Just as for nouns, the -s ending is spelled -es after verb stems ending in -s (toss/tosses), -z (waltz/waltzes), -sh (wash/washes), -ch (catch/catches), and sometimes -x (tax/taxes).
- Also like nouns, verbs ending in y preceded by a consonant change the y to i (try/tries).
Derivation Morphology
- Derivation in English is quite complex. It is the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict exactly.
- A very common kind of derivation in English is the formation of new nouns, often from verbs or adjectives. This process is called nominalization.
- For example, the suffix -ation produces nouns from verbs ending often in the suffix -ize (computerize/computerization).

Adjectives can also be derived from nouns and verbs.
Derivation Morphology
Derivation in English is more complex than inflection because:
– It is generally less productive: a nominalizing affix like -ation cannot be added to absolutely every verb (*eatation).
– There are subtle and complex meaning differences among nominalizing suffixes. For example, sincerity has a subtle difference in meaning from sincereness.
Morphological parsing
Breaking down words into components and building a structured representation.
– English:
● cats → cat +N +Pl
● caught → catch +V +Past
– Spanish:
● vino (came) → venir +V +Perf +3P +Sg
● vino (wine) → vino +N +Masc +Sg

Importance:
● Information retrieval – normalize verb tenses, plurals, grammar cases
● Machine translation – translation based on the stem

Finite States Morphological Parsing
(Figure omitted: parsing English morphology.)
Finite States Morphological Parsing
We need at least the following to build a morphological parser:
1. Lexicon: the list of stems and affixes, together with basic information about them (Noun stem or
Verb stem, etc.)
2. Morphotactics: the model of morpheme ordering that explains which classes of morphemes can
follow other classes of morphemes inside a word. E.g., the rule that English plural morpheme follows
the noun rather than preceding it.
3. Orthographic rules: these spelling rules are used to model the changes that occur in a word, usually when two morphemes combine (e.g., the y→ie spelling rule changes city + -s to cities).
The Lexicon and Morphotactic
A lexicon is a repository for words.
– The simplest one would consist of an explicit list of every word of the language. Inconvenient or
impossible!
– Computational lexicons are usually structured with
• a list of each of the stems and
• affixes of the language, together with a representation of morphotactics telling us how they can fit together.
– The most common way of modeling morphotactics is the finite-state automaton.
The Lexicon and Morphotactic
(Figure omitted: an FSA modeling English nominal inflection.)
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 7

Module 2:
Morphology

Topic: FSA and FST

Review
The Lexicon and Morphotactic
English derivational morphology is more complex than English inflectional morphology, and so automata for modeling English derivation tend to be quite complex.
– Some models are even based on CFGs.
• Below is a small part of the morphosyntactics of English adjectives.
In a Finite State Automaton (FSA), epsilon (ε) transitions are used to represent transitions between states without consuming any input symbol. These transitions are also known as "null" or "empty" transitions. When you have two states with an epsilon transition between them, it means that you can move from one state to the other without reading any input symbol.

The Lexicon and Morphotactic
The FSA #1 recognizes all the listed adjectives, but also ungrammatical forms like unbig, redly, and realest.
• Thus #1 is revised to become #2.
• This complexity is expected from English derivational morphology.
The Lexicon and Morphotactic
We can now use these FSAs to solve the problem of morphological recognition: determining whether an input string of letters makes up a legitimate English word or not.
– We do this by taking the morphotactic FSAs and plugging each "sub-lexicon" into the FSA.
– The resulting FSA can then be defined at the level of the individual letter.
Finite-state transducers (FST)
An FST is a type of FSA which maps between two sets of symbols.
● It is a two-tape automaton that recognizes or generates pairs of strings, one from each tape.
● An FST defines relations between sets of strings.

Given the input, for example, cats, we would like to produce cat +N +PL.
• Two-level morphology, by Koskenniemi (1983):
– representing a word as a correspondence between a lexical level, representing a simple concatenation of the morphemes making up the word, and
– the surface level, representing the actual spelling of the final word.
• Morphological parsing is implemented by building mapping rules that map letter sequences like cats on the surface level into morpheme and feature sequences like cat +N +PL on the lexical level.
Finite-state transducers (FST)

The automaton we use for performing the mapping between these two levels is the finite-state transducer or FST.
– A transducer maps between one set of symbols and another;
– An FST does this via a finite automaton.
• Thus an FST can be seen as a two-tape automaton which recognizes or generates pairs of
strings.
• The FST has a more general function than an FSA:
– An FSA defines a formal language
– An FST defines a relation between sets of strings.
• Another view of an FST:
– A machine reads one string and generates another.
FST
FST as recognizer:
– a transducer that takes a pair of strings as input and outputs accept if the string-pair is in the string-pair language, and reject if it is not.
FST as generator:
– a machine that outputs pairs of strings of the language. Thus the output is
a yes or no, and a pair of output strings.
15
FST as transducer:
– A machine that reads a string and outputs another string.
FST as set relater:
– A machine that computes relation between sets.

FST

A formal definition of an FST (based on the Mealy machine extension to a simple FSA):
– Q: a finite set of N states q0, q1, …, qN
– Σ: a finite alphabet of complex symbols. Each complex symbol is composed of an input-output pair i:o, with one symbol i from an input alphabet I and one symbol o from an output alphabet O; thus Σ ⊆ I×O. I and O may each also include the epsilon symbol ε.
– q0: the start state
– F: the set of final states, F ⊆ Q
– δ(q, i:o): the transition function or transition matrix between states. Given a state q ∈ Q and a complex symbol i:o ∈ Σ, δ(q, i:o) returns a new state q′ ∈ Q. δ is thus a relation from Q × Σ to Q.

FST
• FSAs are isomorphic to regular languages; FSTs are isomorphic to regular relations.
• Regular relations are sets of pairs of strings, a natural extension of regular languages, which are sets of strings.
• FSTs are closed under union, but generally they are not closed under difference, complementation, and intersection.
• Two useful closure properties of FSTs:
– Inversion: if T maps from I to O, then the inverse of T, T⁻¹, maps from O to I.
– Composition: if T1 is a transducer from I1 to O1 and T2 a transducer from O1 to O2, then T1 ∙ T2 maps from I1 to O2.
• Inversion is useful because it makes it easy to convert an FST-as-parser into an FST-as-generator.
• Composition is useful because it allows us to take two transducers that run in series and replace them with one complex transducer.
– T1 ∙ T2 (S) = T2(T1(S))
FST
(Figure: the composition of [a:b]+ with [b:c]+ to produce [a:c]+.)
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 8

Module 2:
Morphology

Topic: FSA and FST
FST
(Figure: a transducer for English nominal number inflection, T_num.)
(Figure: the transducer T_stems, which maps roots to their root class.)
(Figure: a fleshed-out English nominal inflection FST, T_lex = T_num ∙ T_stems.)
Orthographic Rules and FSTs

These spelling changes can be thought of as taking as input a simple concatenation of morphemes and producing as output a slightly modified concatenation of morphemes.
Orthographic Rules and FSTs
We note that concatenating the morphemes works to parse words like "dog", "cat", and "fox", but this simple method does not work when there is a spelling change: "foxes" is to be parsed into the lexical form "fox +N +PL", "cats" into "cat +N +PL", etc. This requires the introduction of spelling rules (also called orthographic rules). To account for the spelling rules, we introduce another tape, called the intermediate tape, which carries the slightly modified output, thus going from 2-level to 3-level morphology. Such a rule maps from the intermediate tape to the surface tape. For plural nouns, the rule states: "insert e on the surface tape just when the lexical tape has a morpheme ending in x, z, or s and the next morpheme is -s". Examples are box to boxes, and fox to foxes. The rule is stated as:

ε → e / {x, s, z} ^ __ s#

The above notation is called Chomsky and Halle notation. A rule of the form a → b / c __ d means "rewrite a as b when it occurs between c and d". Since the symbol ε is null, replacing it means inserting something. The symbol ^ indicates a morpheme boundary. These boundaries are deleted by including the symbol ^:ε in the default pairs for the transducer.

Orthographic Rules and FSTs

● Lexical: fox +N +Pl
● Intermediate: fox^s#
● Surface: foxes

(Figure: the transducer for the E-insertion rule.)
Combining FST Lexicon and Rules
By running these multi-level FSTs in sequence between different tapes, together with parallel transducers for the spelling rules, we are able to parse those words whose morphological analysis is simple.

However, consider the sentence "The police books the right culprit". Here it is not clear from the above rules whether the lexical parser's output should be "book +N +PL" or "book +V +3SG". To a human, however, it is not difficult to infer that it is the second. This is due to the ambiguity in the word, which may be a noun or a verb depending on its position in the sentence. This type of ambiguity is called part-of-speech ambiguity, and resolving it requires looking at the surrounding context rather than the word alone.
Combining FST Lexicon and Rules
(Figure omitted: the lexicon FST cascaded with the parallel orthographic-rule FSTs between the lexical, intermediate, and surface tapes.)
Lexicon-Free FSTs: the Porter Stemmer
• Used in information retrieval.
• One of the most widely used stemming algorithms is the simple and efficient Porter (1980) algorithm, which is based on a series of simple cascaded rewrite rules, e.g.:
– ATIONAL → ATE (e.g., relational → relate)
– ING → ε if the stem contains a vowel (e.g., motoring → motor)
• Problem:
– Not perfect: error of commission, omission
• Experiments have been made
– Some improvement with smaller documents
– Any improvement is quite small

Human Morphological Processing
Psychological studies have been used to learn how multi-morphemic words are represented in the minds of speakers of English.
For example, consider the word walk and its inflected forms walks, and walked. Are all three in the
human lexicon? Or merely walk along with -ed and -s?

How about the word happy and its derived forms happily and happiness?
The full listing hypothesis proposes that all words of a language are listed in the mental lexicon
without any internal morphological structure

• Morphological structure is simply an epiphenomenon, and walk, walks, walked, happy, and happily
are all separately listed in the lexicon
The minimum redundancy hypothesis suggests that only the constituent morphemes are
represented in the lexicon, and when processing walks (whether for reading, listening, or talking) we
must access both morphemes (walk and s) and combine them

Human Morphological Processing
Some of the earliest evidence that the human lexicon represents at least some morphological
structure comes from speech errors
easy enoughly (for “easily enough”)

More recent experimental evidence suggests that neither the full listing nor the minimum redundancy
hypotheses may be completely true. Instead, it’s possible that some but not all morphological
relationships are mentally represented

For example, it has been found that derived forms (such as happily) are stored separately from their stem (happy), but that regularly inflected forms (such as walks) are not distinct in the lexicon from their stems.
Marslen-Wilson et al. (1994) found that spoken derived words can prime their stems, but only if the meaning of the derived form is closely related to the stem.
• For example, government primes govern, but department does not prime depart.

SPEECH SOUNDS AND PHONETIC TRANSCRIPTION
• The fundamental insights and algorithms necessary to understand modern speech recognition and
speech synthesis technology, and the related branch of linguistics called computational phonology

• Core tasks:
– Speech recognition: acoustic waveform → output a string of words
– Text-to-speech synthesis: sequence of text words → output an acoustic waveform

A speech recognition system needs to have a pronunciation for every word it can recognize, and a
text-to-speech system needs to have a pronunciation for every word it can say

Contd.
• The science of phonetics aims to describe all the sounds of all the world’s languages

– Acoustic phonetics: focuses on the physical properties of the sounds of language

– Auditory phonetics: focuses on how listeners perceive the sounds of language

– Articulatory phonetics: focuses on how the vocal tract produces the sounds of language

- Phonetic alphabets: how pronunciation is transcribed, part of the field of phonetics
- Articulatory phonetics: how sounds are produced by the articulators in the mouth
- Phonological rules: the systematic ways in which sounds are differently realized in different contexts
- Computational phonology: the study of computational mechanisms for modeling phonological rules
- Phonological learning: how phonological rules can be automatically induced by machine learning algorithms

IPA and ARPABET-vowel
The International Phonetic Alphabet (IPA) and the
ARPABET are two systems used to represent the
sounds of spoken language. They provide a
standardized way to transcribe the sounds of speech,
which can be useful for linguists, phoneticians, and
language learners. Here are examples of both IPA and
ARPABET transcriptions for English words:

(Table of IPA and ARPABET vowel symbols omitted.)

IPA and ARPABET-consonant
(Table of IPA and ARPABET consonant symbols omitted.)
The Vocal Organs
- Articulatory phonetics is the study of how phones are produced, as the various organs in the mouth, throat, and nose modify the airflow from the lungs.
- Sound is produced by the rapid movement of air.
- Most sounds in human languages are produced by expelling air from the lungs through the windpipe (technically the trachea) and then out the mouth or nose.
- As it passes through the trachea, the air passes through the larynx, commonly known as the Adam's apple or voicebox.
- The larynx contains two small folds of muscle, the vocal folds (often referred to non-technically as the vocal cords), which can be moved together or apart.
- The space between these two folds is called the glottis.
Vocal Organ
Most speech sounds are produced by pushing air through the vocal cords.
– Glottis = the opening between the vocal cords
– Larynx = ‘voice box’

– Pharynx = tubular part of the throat above the larynx

– Oral cavity = mouth

– Nasal cavity = nose and the passages connecting it to the throat and sinuses
Phones are divided into two main classes:
– Consonants are made by restricting or blocking the airflow in some way, and may be voiced or unvoiced
– Vowels have less obstruction, are usually voiced, and are generally louder and longer lasting than consonants
• Both kinds of sounds are formed by the motion of air through the mouth, throat, or nose

CSA4006-Dr. Anirban Bhowmick


30

EEE1001-Dr. Anirban Bhowmick


Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 9
Module 2:
Morphology

Topic: Speech Sounds and Phonetic Transcription
Consonants: Place of Articulation
Consonants are sounds produced with some restriction or closure in the vocal tract
• Consonants are classified based in part on where in the vocal tract the airflow is being restricted (the
place of articulation)
• The major places of articulation are bilabial, labiodental, interdental, alveolar, palatal, velar, uvular,
and glottal

Consonants: Place of Articulation
1.Bilabial: The airflow is obstructed by bringing both lips together.
Example: /p/ in "pat," /b/ in "bat," /m/ in "mat."
2.Labiodental: The airflow is obstructed by placing the upper teeth against the lower lip.
Example: /f/ in "fan," /v/ in "van."
3.Interdental: The airflow is obstructed by placing the tip of the tongue between the teeth.
Example: /θ/ in "think," /ð/ in "this."
4.Alveolar: The airflow is obstructed by raising the front part of the tongue to the alveolar ridge, which is the bony
ridge just behind the upper front teeth.
Example: /t/ in "top," /d/ in "dog," /s/ in "sock."
5.Alveopalatal (or Palatoalveolar): The airflow is obstructed by raising the front part of the tongue to the area just
behind the alveolar ridge.
Example: /ʃ/ in "shoe," /ʒ/ in "measure," /tʃ/ in "cheese," /dʒ/ in "judge."
6.Palatal: The airflow is obstructed by raising the middle part of the tongue to the hard palate, which is the roof of the
mouth right behind the alveolar ridge.
Example: /j/ in "yes," /ʎ/ in some dialects of Spanish.
7.Velar: The airflow is obstructed by raising the back part of the tongue to the soft part of the palate (the velum).
Example: /k/ in "cat," /g/ in "go," /ŋ/ in "sing."
8.Glottal: The airflow is obstructed by closing or nearly closing the space between the vocal cords in the larynx.
Example: /h/ in "hat," the glottal stop /ʔ/ in some dialects, as in "uh-oh."

Consonants: Manner of Articulation
Consonants can also be classified by their manner of articulation, which describes how the airflow is
obstructed or modified as they are produced. Here are some common manners of articulation for
consonants with examples:

Plosive (or Stop): These consonants are produced by a complete closure of the vocal tract, causing a
momentary halt in the airflow before releasing it.

Example: /p/ in "pat," /b/ in "bat," /t/ in "top," /d/ in "dog," /k/ in "cat," /g/ in "go.“

Fricative: Fricatives are produced by narrowing the vocal tract, creating turbulent airflow and a continuous,
hissing sound.

Example: /f/ in "fan," /v/ in "van," /s/ in "sock," /z/ in "zebra," /ʃ/ in "shoe," /ʒ/ in "measure."

Affricate: Affricates begin with a stop-like closure and then transition into a fricative sound.

Example: /tʃ/ in "cheese," /dʒ/ in "judge."

Contd.
Nasal: Nasal consonants are produced by lowering the velum (soft part of the roof of the mouth),
allowing air to flow through the nasal cavity.

Example: /m/ in "mat," /n/ in "net," /ŋ/ in "sing."


Liquid: Liquids involve a relatively free airflow, with slight constriction in the vocal tract.

Lateral Liquid: /l/ in "let."


Retroflex Liquid: /ɹ/ in "red" (Note: The pronunciation of this sound can vary regionally.)
Glide (Semivowel): Glides are produced with a slight constriction in the vocal tract but are more
vowel-like in nature.

Example: /j/ in "yes," /w/ in "we."


Approximant: Approximants have a less constricted airflow than fricatives but more than glides.

Example: /ɹ/ in "red" (in some dialects), /ʋ/ in some languages.


These are the main manners of articulation for consonants.

Vowel

Vowels are classified by how high or low the tongue is, whether the tongue is in the front or back of the mouth, and whether or not the lips are rounded.
High vowels: [i] [ɪ] [u] [ʊ]
Mid vowels: [e] [ɛ] [o] [ə] [ʌ] [ɔ]
Low vowels: [æ] [a]
Front vowels: [i] [ɪ] [e] [ɛ] [æ]
Central vowels: [ə] [ʌ]
Back vowels: [u] [ʊ] [o] [ɔ] [a]

Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 10

Module 3:
Syntax Parsing: Tagsets for English - Part of Speech Tagging - Rule-based Part-of-Speech Tagging - Stochastic Part-of-Speech Tagging - Transformation-Based Tagging - Context-Free Grammars for English - Context-Free Rules and Trees - The Noun Phrase - The Verb Phrase and Subcategorization - Grammar Equivalence & Normal Form - Finite State & Context-Free Grammars

Topic: Introduction
Tagsets for English
There are a small number of popular tagsets for English.
- The choice of tagset depends on the nature of the application:
  - small tagsets (more general)
  - large tagsets (finer tags)
- Some of the widely used part-of-speech tagsets are:
  - the 45-tag Penn Treebank tagset
  - the 87-tag tagset used for the Brown corpus
  - the medium-sized 61-tag C5 tagset
  - the 146-tag C7 tagset
- Each word is associated with a tag from the tagset used.
Some Common Tagsets (English)
Penn Treebank Tagset:
This is one of the most widely used tagsets in natural language processing and linguistics. It
was developed for the Penn Treebank project and includes tags like NN (Noun), VB (Verb),
JJ (Adjective), RB (Adverb), and more. It's known for its detailed granularity.
Universal POS Tagset:
The Universal POS Tagset is designed to be more cross-linguistic and universal, making it
easier to work with multilingual data. It includes tags like NOUN, VERB, ADJ, ADV, and
others, providing a simpler and more consistent set of labels compared to the Penn Treebank
Tagset.
Brown Corpus Tagset:
The Brown Corpus is a well-known linguistic corpus, and it has its own tagset. It includes tags
like N (Noun), V (Verb), ADJ (Adjective), and others. It's used primarily for linguistic research.
CLAWS Tagset:
The CLAWS (Constituent Likelihood Automatic Word-tagging System) Tagset is designed to
be a detailed and linguistically motivated tagset. It includes a wide range of tags to capture
grammatical and syntactic information.
Some Common Tagsets (English)
Lancaster-Oslo/Bergen (LOB) Tagset:
The LOB Corpus is another linguistic corpus, and it has its own tagset. It's used primarily in
corpus linguistics and includes tags like NN (Noun), VB (Verb), JJ (Adjective), and more.
Medical Subject Headings (MeSH) Tagset:
This tagset is specific to the medical domain and is used for indexing and categorizing
medical texts. It includes tags like A1.4.1 (Anatomy), D1.1.1 (Diseases), and others.
OntoNotes Tagset:
The OntoNotes project developed a tagset for annotating a wide range of linguistic
information, including part of speech, named entities, and syntactic structures. It's used in
various natural language processing tasks.
Google Universal Dependencies Tagset:
Google's Universal Dependencies project aims to provide universal grammatical relations
and dependency labels for multiple languages, including English. It includes tags like NOUN,
VERB, ADJ, and more, similar to the Universal POS Tagset.

POS tagging
- Part-of-speech tagging (or just tagging for short) is the process of assigning a part of speech or other lexical class marker to each word in a corpus.
- Tags are also usually applied to punctuation markers; thus tagging for natural language is the same process as tokenization for computer languages, although tags for natural languages are much more ambiguous.
- Even in simple examples, automatically assigning a tag to each word is not trivial.
- For example, book is ambiguous: it has more than one possible usage and part of speech. It can be a verb (as in book that flight or to book the suspect) or a noun (as in hand me that book, or a book of matches).
- Similarly, that can be a determiner (as in Does that flight serve dinner), or a complementizer (as in I thought that your flight was earlier).
Multiple POS
- Words often have more than one POS: back
  - The back door = JJ
  - On my back = NN
  - Olga's not looking forward to going back to school in September. = RB
  - Promised to back the bill = VB
- The POS tagging problem is to determine the POS tag for a particular instance of a word.
- The problem of POS tagging is to resolve these ambiguities, choosing the proper tag for the context.
- Part-of-speech tagging is thus one of the many disambiguation tasks.
How Hard is POS Tagging? Measuring Ambiguity
(Table omitted: the distribution of tag ambiguity over word types in the Brown corpus.)
Methods for POS Tagging
- Rule-based tagging uses hand-written rules, e.g. ENGTWOL (ENGlish TWO Level analysis).
- Stochastic tagging uses probabilistic sequence models: HMM (Hidden Markov Model) tagging, MEMMs (Maximum Entropy Markov Models).
- Transformation-based tagging uses rules learned automatically.
Rule based POS tagging
- The first stage uses a dictionary to assign each word a list of potential parts of speech.
- The second stage uses large lists of hand-written disambiguation rules to winnow down this list to a single part of speech for each word.
- These taggers are knowledge-driven taggers.
- The rules in rule-based POS tagging are built manually.
- The information is coded in the form of rules.
- There is a limited number of rules, approximately around 1000.
- Smoothing and language modeling are defined explicitly in rule-based taggers.
Rule based POS tagging
Rule-based POS taggers can be relatively simple to implement and are often used as a starting point for
more complex machine learning-based taggers. However, they can be less accurate and less efficient than
machine learning-based taggers, especially for tasks with large or complex datasets

Here is an example of how a rule-based POS tagger might work:


Define a set of rules for assigning POS tags to words. For example:

If the word ends in "-tion," assign the tag "noun."

If the word ends in “-ment,” assign the tag “noun.”


If the word is all uppercase, assign the tag “proper noun.”
If the word is a verb ending in “-ing,” assign the tag “verb.”

Iterate through the words in the text and apply the rules to each word in turn. For example:
“Nation” would be tagged as “noun” based on the first rule.
“Investment” would be tagged as “noun” based on the second rule.
“UNITED” would be tagged as “proper noun” based on the third rule.
“Running” would be tagged as “verb” based on the fourth rule.

ENGTWOL –rule based tagger
- Uses a two-level lexicon transducer
- Uses hand-crafted rules (about 1,100 rules)
- Process: start with a dictionary
(Figure omitted: sample ENGTWOL dictionary entries.)
ENGTWOL –rule based tagger
Example rule: eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP".
Stochastic Probabilistic sequence models
HMM (Hidden Markov Model) is a Stochastic technique for POS tagging. Hidden Markov
models are known for their applications to reinforcement learning and temporal pattern
recognition such as speech, handwriting, gesture recognition, musical score following, partial
discharges, and bioinformatics.
Let us consider an example proposed by Dr. Luis Serrano and find out how an HMM selects an appropriate tag sequence for a sentence.

Training data:
Mary Jane can see Will
Spot will see Mary
Will Jane spot Mary?
Mary will pat Spot
HMM-POS Tagging
Words | Noun | Modal | Verb
Mary  | 4    | 0     | 0
Jane  | 2    | 0     | 0
Will  | 1    | 3     | 0
Spot  | 2    | 0     | 1
Can   | 0    | 1     | 0
See   | 0    | 0     | 2
pat   | 0    | 0     | 1
HMM-POS Tagging
Now let us divide each column by the total number of appearances of that tag. For example, 'noun' appears nine times in the above sentences, so divide each term in the noun column by 9. We get the following table after this operation:

Words | Noun | Modal | Verb
Mary  | 4/9  | 0     | 0
Jane  | 2/9  | 0     | 0
Will  | 1/9  | 3/4   | 0
Spot  | 2/9  | 0     | 1/4
Can   | 0    | 1/4   | 0
See   | 0    | 0     | 2/4
pat   | 0    | 0     | 1/4

These are the emission probabilities.
HMM-POS Tagging
Next, we have to calculate the transition probabilities, so define two more tags, <S> and <E>. <S> is placed at the beginning of each sentence and <E> at the end. Counting how often each tag follows each other tag gives:

    | N | M | V | <E>
<S> | 3 | 1 | 0 | 0
N   | 1 | 3 | 1 | 4
M   | 1 | 0 | 3 | 0
V   | 4 | 0 | 0 | 0

In the above table, we can see that the <S> tag is followed by the N tag three times, thus the first entry is 3. The Modal tag follows <S> just once, thus the second entry is 1. In a similar manner, the rest of the table is filled.
Next, we divide each term in a row of the table by the total number of co-occurrences of the tag in consideration. For example, the Modal tag is followed by another tag four times, thus we divide each element in the third row by four.

HMM-POS Tagging

The resulting transition probabilities:

    | N   | M   | V   | <E>
<S> | 3/4 | 1/4 | 0   | 0
N   | 1/9 | 3/9 | 1/9 | 4/9
M   | 1/4 | 0   | 3/4 | 0
V   | 4/4 | 0   | 0   | 0
HMM-POS Tagging
Take a new sentence and tag it with the wrong tags. Let the sentence 'Will can spot Mary' be tagged as:
Will as a modal
Can as a verb
Spot as a noun
Mary as a noun

Now calculate the probability of this sequence being correct in the following manner. The probability that the tag Modal (M) comes after the tag <S> is 1/4, as seen in the table. Also, the probability that the word Will is a Modal is 3/4. In the same manner, we calculate each and every probability in the graph. The product of these probabilities is the likelihood that this sequence is right. Since the tags are not correct, the product is zero:

1/4 * 3/4 * 3/4 * 0 * 1 * 2/9 * 1/9 * 4/9 * 4/9 = 0
HMM-POS Tagging
When these words are correctly tagged (Will/N, can/M, spot/V, Mary/N), we get a probability greater than zero. Calculating the product of these terms we get:

3/4 * 1/9 * 3/9 * 1/4 * 3/4 * 1/4 * 1 * 4/9 * 4/9 = 0.00025720164
HMM-POS Tagging
For our example, considering just the three POS tags we have mentioned, 81 different combinations of tags can be formed (3 tags for each of 4 words, 3^4 = 81). In this case, calculating the probabilities of all 81 combinations seems achievable. But when the task is to tag a larger sentence and all the POS tags of the Penn Treebank project are taken into consideration, the number of possible combinations grows exponentially, and this task seems impossible to achieve. Now let us visualize these 81 combinations as paths and, using the transition and emission probabilities, mark each vertex and edge as shown below.
HMM-POS Tagging
The next step is to delete all the vertices and edges with probability zero; also, the vertices which do not lead to the endpoint are removed.
Now there are only two paths that lead to the end, let us calculate the probability associated with each path.

<S>→N→M→N→N→<E> = 3/4 * 1/9 * 3/9 * 1/4 * 1/4 * 2/9 * 1/9 * 4/9 * 4/9 = 0.00000846754

<S>→N→M→V→N→<E> = 3/4 * 1/9 * 3/9 * 1/4 * 3/4 * 1/4 * 1 * 4/9 * 4/9 = 0.00025720164

Clearly, the probability of the second sequence is much higher and hence the HMM is going to tag each word in the sentence
according to this sequence.

Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 11

Module 3: Syntax Parsing

Topic: Introduction
Optimizing HMM with Viterbi Algorithm
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden
states—called the Viterbi path—that results in a sequence of observed events, especially in the context of
Markov information sources and hidden Markov models (HMM).

In the previous section, we optimized the HMM and brought our calculations down from 81 paths to just two. Now we are going to further optimize the HMM by using the Viterbi algorithm. Let us use the same example as before and apply the Viterbi algorithm to it.
Optimizing HMM with Viterbi Algorithm
Consider the vertex encircled in the above example. There are two paths leading to this vertex, shown below along with the probabilities of the two mini-paths.

We keep only the mini-path having the highest probability and discard the other. The same procedure is done for all the states in the graph, as shown in the figure below.
Optimizing HMM with Viterbi Algorithm
As we can see in the figure, the probabilities of all paths leading to a node are calculated, and we remove the edges or paths which have the lower probability cost. Also, you may notice some nodes having a probability of zero; such nodes have no edges attached to them, as all the paths to them have zero probability. The graph obtained after computing the probabilities of all paths leading to a node is shown below.
Optimizing HMM with Viterbi Algorithm
To get an optimal path, we start from the end and trace backward; since each state has only one incoming edge, this gives us a single path, as shown below.

As you may have noticed, this algorithm returns only one path, as compared to the previous method which suggested two paths. Thus, by using this algorithm, we save a lot of computation.

After applying the Viterbi algorithm, the model tags the sentence as follows:
Will as a noun
Can as a modal
Spot as a verb
Mary as a noun
These are the right tags, so we conclude that the model can successfully tag the words with their appropriate POS tags.
Bi-gram statistical tagger

(Figure omitted.)
Transformation Based (Brill) Tagging
A hybrid approach:
- Like rule-based taggers, this tagging is based on rules.
- Like (most) stochastic taggers, the rules are automatically induced from hand-tagged data.

Basic idea: do a quick and dirty job first, and then use learned rules to patch things up. This overcomes the pure rule-based approach's problems of being too expensive, too slow, too tedious, etc. It is an instance of Transformation-Based Learning.

Combine rules and statistics: start with a dumb statistical system and patch up the typical mistakes it makes. How dumb? Assign the most frequent tag (unigram) to each word in the input.
Process
1. Choose a Baseline Tagger:
To start, you need a baseline POS tagger that assigns initial tags to words in a sentence. Common
baseline taggers include Hidden Markov Models (HMMs) or rule-based taggers.

2. Collect Training Data:


You need labeled training data, which consists of sentences with the correct POS tags for each word. This
data is used to learn transformation rules.
3. Initialize Tag Assignments:


Apply the baseline tagger to a sentence and assign initial POS tags to each word.

4. Generate Transformation Rules:


The core of the Brill tagging process involves learning transformation rules from the training data. These
rules are typically in the form of "if-then" statements that specify how to modify or correct POS tags. Rules
are learned based on observed tagging errors in the training data.
Example transformation rule: "If a noun is followed by 'to,' change the tag of 'to' to 'TO'."

Process
5. Apply Transformation Rules:
Iterate through the sentence and apply transformation rules to modify the POS tags generated by the
baseline tagger.

6. Evaluate the Updated Tags:


After applying a set of transformation rules to a sentence, evaluate the updated POS tags. If the tagging
accuracy improves, keep the updated tags; otherwise, revert to the previous tagging.
7. Repeat:
Continue applying transformation rules and evaluating the tagging accuracy until a stopping criterion is
met, such as reaching a maximum number of iterations or achieving a desired level of accuracy.

8. Finalize Tags:
Once the iterative process is complete, the final POS tags are used as the output for the sentence.

Syntax
By syntax, we mean various aspects of how words are strung together to form components of sentences and how those components are strung together to form sentences. Syntax comes from the Greek sýntaxis, meaning "setting out together or arrangement".
• that and after year last
• I saw you yesterday
• colorless green ideas sleep furiously

Syntax is the kind of implicit knowledge of your native language that you had mastered by the time you were 3 or 4 years old without explicit instruction, not necessarily the type of rules you were later taught in school.

Why should you care?
• Grammar checkers
• Question answering
• Information extraction
• Machine translation
Constituency
The idea: Groups of words may behave as a single unit or phrase, called a constituent.

E.g., Noun Phrase:
• Kermit the frog
• they
• December twenty-sixth
• the reason he is running for president

- Sentences have parts, some of which appear to have subparts. These groupings of words that go together we will call constituents.
- These units form coherent classes that behave in similar ways.
- For example, we can say that noun phrases can come before verbs.
Constituent Phrases
For constituents, we usually name them as phrases based on the word that
heads the constituent:

the man from Amherst is a Noun Phrase (NP) because the head man is a noun
extremely clever is an Adjective Phrase (AP) because the head clever is an adjective
down the river is a Prepositional Phrase (PP) because the head down is a preposition
killed the rabbit is a Verb Phrase (VP) because the head killed is a verb

 Note that a word is a constituent (a little one). Sometimes words also act as phrases. In:
Joe grew potatoes.

Joe and potatoes are both nouns and noun phrases.

Evidence constituency exists
1. They appear in similar environments (before a verb)
Kermit the frog comes on stage
They come to Massachusetts every summer
December twenty-sixth comes after Christmas
The reason he is running for president comes out only now.

But not each individual word in the constituent:
*The comes out... *is comes out... *for comes out...

2. The constituent can be placed in a number of different locations


Constituent = Prepositional phrase: On December twenty-sixth
On December twenty-sixth I’d like to fly to Florida.
I’d like to fly on December twenty-sixth to Florida.
I’d like to fly to Florida on December twenty-sixth.
But not split apart
*On December I’d like to fly twenty-sixth to Florida.
*On I’d like to fly December twenty-sixth to Florida.
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 12

Module 3: CFG

Topic: Introduction
Context-free grammar
The most common way of modeling constituency.

CFG = Context-Free Grammar = Phrase Structure Grammar = BNF = Backus-Naur Form

The idea of basing a grammar on constituent structure dates back to Wilhelm Wundt (1890), but it was not formalized until Chomsky (1956) and, independently, Backus (1959).
A CFG consists of:
• Terminals: we'll take these to be words.
• Non-terminals: the constituents in a language, like noun phrase, verb phrase, and sentence.
• Rules: equations that consist of a single non-terminal on the left and any number of terminals and non-terminals on the right.
CFG 4 Tuple
G = {T, N, S, R}

- T is the set of terminals (lexicon)
- N is the set of non-terminals
- S is the start symbol (one of the non-terminals)
- R is the set of rules/productions of the form X → γ, where X is a non-terminal and γ is a sequence of terminals and non-terminals (may be empty)

A grammar G generates a language L.
CFG
G = {T, N, S, R}
T = {that, this, a, the, man, book, flight, meal, include, read, does}
N = {S, NP, NOM, VP, Det, Noun, Verb, Aux}
S=S
R = {
S → NP VP
S → Aux NP VP
S → VP
NP → Det NOM
NOM → Noun
NOM → Noun NOM
VP → Verb
VP → Verb NP
Det → that | this | a | the
Noun → book | flight | meal | man
Verb → book | include | read
Aux → does
}

CFG Example: deriving "The man read this book"

S → NP VP
→ Det NOM VP
→ The NOM VP
→ The Noun VP
→ The man VP 15

→ The man Verb NP


→ The man read NP
→ The man read Det NOM
→ The man read this NOM
→ The man read this Noun
→ The man read this book

Parse tree

(Figures omitted: parse trees for "The man read this book" and "I prefer a morning flight".)
CFGs can capture recursion
Example of seemingly endless recursion of embedded
prepositional phrases:

PP → Prep NP
NP → Noun PP

17
[S The mailman ate his [NP lunch [PP with his friend [PP from the cleaning staff [PP of the building
[PP at the intersection [PP on the north end [PP of town]]]]]]].

CSA4006-Dr. Anirban Bhowmick


Grammaticality
A CFG defines a formal language = the set of all sentences (strings of words) that can be derived by
the grammar.

 Sentences in this set said to be grammatical.

 Sentences outside this set said to be ungrammatical.

18

CSA4006-Dr. Anirban Bhowmick


Parsing
 Parsing is the process of taking a string and a grammar and returning a (or multiple) parse tree(s)
for that string
 It is completely analogous to running a finite state transducer with a tape
 It’s just more powerful: there are languages we can capture with CFGs that we can’t capture with
finite state machines
 A recognizer is a program that, given a grammar and a sentence, returns YES if the
sentence is accepted by the grammar (i.e. the sentence is in the language), and NO otherwise.
 A parser, in addition to doing the work of a recognizer, also returns the set of parse trees for the
string.

Top-Down Parsing and Bottom-Up Parsing are used for parsing a tree to reach the starting node of
the tree. Both the parsing techniques are different from each other. The most basic difference between
the two is that top-down parsing starts from top of the parse tree, while bottom-up parsing starts from
the lowest level of the parse tree.

CSA4006-Dr. Anirban Bhowmick


Top-down parsing
Top-down parsing is goal-directed

A top-down parser starts with a list of constituents to be built.


• It rewrites the goals in the goal list by matching one against the LHS of the grammar rules, and
expanding it with the RHS,
• attempting to match the sentence to be derived

If a goal can be rewritten in several ways, then there is a choice of which rule to apply (search
problem)
Can use depth-first or breadth-first search, and goal ordering.

CSA4006-Dr. Anirban Bhowmick
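A depth-first, top-down recognizer for the toy grammar can be sketched in a few lines of Python (illustrative only; it assumes the grammar has no left recursion, a problem discussed on the next slide):

GRAMMAR = {
    "S":    [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":   [["Det", "NOM"]],
    "NOM":  [["Noun"], ["Noun", "NOM"]],
    "VP":   [["Verb"], ["Verb", "NP"]],
    "Det":  [["the"], ["this"], ["that"], ["a"]],
    "Noun": [["man"], ["book"], ["flight"], ["meal"]],
    "Verb": [["read"], ["book"], ["include"]],
    "Aux":  [["does"]],
}

def expand(goals, words):
    # Success iff the goal list and the input are exhausted together.
    if not goals:
        return not words
    head, rest = goals[0], goals[1:]
    if head not in GRAMMAR:                     # terminal: must match the next word
        return bool(words) and words[0] == head and expand(rest, words[1:])
    # Non-terminal: try each production in turn (the search / choice point).
    return any(expand(list(rhs) + rest, words) for rhs in GRAMMAR[head])

print(expand(["S"], "the man read this book".split()))   # True
print(expand(["S"], "book the flight does".split()))     # False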


Top-down parsing example (Breadth-first)

[Figures: breadth-first top-down search, expanding S step by step toward the input sentence]
CSA4006-Dr. Anirban Bhowmick


Problems with top-down parsing
Left recursive rules... e.g. NP → NP PP... lead to infinite recursion

 Will do badly if there are many different rules for the same LHS. Consider if there are 600 rules
for S, 599 of which start with NP, but one of which starts with a V, and the sentence starts with
a V.
 Useless work: expands things that are possible top-down but not there (no bottom-up evidence
for them).
 Top-down parsers do well if there is useful grammar-driven control: search is directed by the
grammar.
 Top-down is hopeless for rewriting parts of speech (pre-terminals) with words (terminals). In
practice that is always done bottom-up as lexical lookup.
 Repeated work: anywhere there is common substructure

CSA4006-Dr. Anirban Bhowmick


Bottom-up parsing
Bottom-up parsing is data-directed.

 The initial goal list of a bottom-up parser is the string to be parsed.


 If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the
LHS of the rule.
 Parsing is finished when the goal list contains just the start symbol. If the RHS of several rules match
the goal list, then there is a choice of which rule to apply (search problem)
24

Can use depth-first or breadth-first search, and goal ordering.

The standard presentation is as shift-reduce parsing

CSA4006-Dr. Anirban Bhowmick


Bottom-up parsing example

[Figure: bottom-up parsing example, building constituents from the words upward]

CSA4006-Dr. Anirban Bhowmick


Shift-reduce parsing

[Figure: shift-reduce parsing trace]

CSA4006-Dr. Anirban Bhowmick


Shift-reduce parsing (contd.)

Start with the sentence to be parsed in an input buffer.

• a ”shift” action corresponds to pushing the next input symbol from the buffer onto the stack
• a ”reduce” action occurs when we have a rule’s RHS on top of the stack. To perform the reduction,
we pop the rule’s RHS off the stack and replace it with the non-terminal on the LHS of the corresponding
rule.

(When either ”shift” or ”reduce” is possible, choose one arbitrarily.)

If you end up with only the Start symbol on the stack, then success!

If you don’t, and no ”shift” or ”reduce” actions are possible,
backtrack.

CSA4006-Dr. Anirban Bhowmick
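The shift-reduce loop can be sketched in Python as follows (a toy: it reduces greedily and omits the backtracking just described, so rule order matters; the rules and lexicon below are illustrative):

RULES = [
    (("Det", "NOM"), "NP"),
    (("Noun",), "NOM"),
    (("Noun", "NOM"), "NOM"),
    (("Verb", "NP"), "VP"),
    (("NP", "VP"), "S"),
]
LEXICON = {"the": "Det", "this": "Det", "man": "Noun",
           "book": "Noun", "read": "Verb"}

def shift_reduce(words):
    stack, buf = [], [LEXICON[w] for w in words]
    while buf or len(stack) > 1:
        reduced = False
        for rhs, lhs in RULES:                # try to reduce the top of the stack
            n = len(rhs)
            if tuple(stack[-n:]) == rhs:
                stack[-n:] = [lhs]
                print("reduce:", stack)
                reduced = True
                break
        if not reduced:
            if not buf:
                return False                  # stuck: a real parser would backtrack
            stack.append(buf.pop(0))          # shift the next input symbol
            print("shift: ", stack)
    return stack == ["S"]

print(shift_reduce("the man read this book".split()))   # True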


Contd.
In a top-down parser, the main decision was which production rule to pick.
In a bottom-up shift-reduce parser there are two decisions:

1. Should we shift another symbol, or reduce by some rule?


2. If reduce, then reduce by which rule?

both of which can lead to the need to backtrack 28

Problem:
• Unable to deal with empty categories: termination problem, unless rewriting empties as
constituents is somehow restricted (but then it’s generally incomplete)
• Useless work: locally possible, but globally impossible
• Inefficient when there is great lexical ambiguity (grammar-driven control might help here).
Conversely, it is data-directed: it attempts to parse the words that are there
• Repeated work: anywhere there is common substructure.

CSA4006-Dr. Anirban Bhowmick


Noun Phrase
The noun phrase can be viewed as revolving around a head, the central noun in the noun
phrase. The syntax of English allows for both
• Prenominal prehead modifiers
• Post nominal (post head) modifiers

Prenominal prehead modifiers are words or phrases that appear before the noun and modify it.
These modifiers provide additional information about the noun. Here's an example: 29
The big, red apple: In this noun phrase, "big" and "red" are prenominal prehead modifiers that
provide more details about the noun "apple.“

Postnominal (post head) modifiers are words or phrases that appear after the noun and
modify it. These modifiers also offer additional information about the noun. Here's an example:
The car with the broken windshield: In this noun phrase, "with the broken windshield" is a
postnominal modifier that provides more information about the noun "car."

CSA4006-Dr. Anirban Bhowmick


Noun Phrase
Noun phrases can begin with a determiner, as follows:

a stop, the flights, that fare, this flight, those flights, any flights, some flight
Word classes that appear in the NP before the determiner are called
predeterminers .
A number of different kinds of word classes can appear in the NP between the 30
determiner and the head noun.
• Cardinal numbers Eg two friends, one stop
• Ordinal numbers include first, second, third etc but also words like next, last,
past, other, and another Eg the first one, the next day, the second leg, the last
flight, the other American flight, any other fares.
• Quantifiers many, few, several occur only with plural count nouns Eg many
fares
• The quantifiers much and a little occur only with noncount nouns

CSA4006-Dr. Anirban Bhowmick


Noun Phrase
Noun phrases can start with determiners...
Determiners can be

Simple lexical items: the, this, a, an, etc.

A car 31
Or simple possessives

John’s car

Or complex recursive versions of that


John’s sister’s husband’s son’s car

CSA4006-Dr. Anirban Bhowmick


Noun Phrase
Adjectives occur after quantifiers but before nouns.

A first class fare


A nonstop flight
The longest layover
The earliest lunch flight
32
Adjectives can also be grouped into a phrase called an adjective phrase AP.

APs can have an adverb before the adjective

Eg. the least expensive fare

All the options for prenominal modifiers are combined in one rule as follows:
NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
the use of parentheses () to mark optional constituents.

CSA4006-Dr. Anirban Bhowmick


Noun Phrase
A head noun can be followed by post modifiers. Three kinds:
Prepositional phrases
• Flights from Seattle
Non-finite clauses
• Flights arriving before noon
Relative clauses
• Flights that serve breakfast

• any stopovers [for Delta seven fifty one]


• all flights [from Cleveland] [to Newark]
• arrival [in San Jose] [before seven p.m]
• a reservation [on flight six oh six] [from Tampa]
[to Montreal]
Here’s a new NP rule to account for one to three PP postmodifiers:
Nominal → Nominal PP (PP) (PP)
CSA4006-Dr. Anirban Bhowmick
Noun Phrase
• The three most common kinds of non-finite postmodifiers are the gerundive (-ing), -ed, and
infinitive forms
• Gerundive postmodifiers are so called because they consist of a verb phrase that begins with
the gerundive (-ing) form of the verb

In the following examples, the verb phrases happen to all have only prepositional phrases after the
verb.
• any of those (leaving on Thursday)
• any flights (arriving after eleven a.m)
• flights (arriving within thirty minutes of each other)

The use of a new nonterminal GerundVP:


Nominal → Nominal GerundVP

CSA4006-Dr. Anirban Bhowmick


Noun Phrase
Rules for GerundVP constituents are made by duplicating all of our VP productions, substituting GerundV
for V:
• GerundVP → GerundV NP
• GerundVP → GerundV PP
• GerundVP → GerundV
• GerundVP → GerundV NP PP

GerundV can then be defined as:
GerundV → being | preferring | arriving | leaving | …

A postnominal relative clause (more correctly a restrictive relative clause), is a clause that often
begins with a relative pronoun (that and who are the most common)

CSA4006-Dr. Anirban Bhowmick


Agreement
Constraints that hold among various constituents
For example, in English, determiners and the head nouns in NPs have to agree in their
number.
Which of the following cannot be parsed by the rule NP → Det Nominal?

(O) This flight      (X) This flights
(O) Those flights    (X) Those flight

This rule does not handle agreement! (The rule does not detect whether the agreement is
correct or not.)

CSA4006-Dr. Anirban Bhowmick


Problem
Our earlier NP rules are clearly deficient since they don’t capture the agreement constraint

NP → Det Nominal

Accepts, and assigns correct structures, to grammatical examples (this flight)


But it’s also happy with incorrect examples (*these flight)
Such a rule is said to overgenerate

CSA4006-Dr. Anirban Bhowmick


THE VERB PHRASE AND SUBCATEGORIZATION
The verb phrase consists of the verb and a number of other constituents (its arguments)

38

But, even though there are many valid VP rules in English, not all verbs are
allowed to participate in all those VP rules
We can subcategorize the verbs in a language according to the sets of VP rules
that they participate in
This is a modern take on the traditional notion of transitive/intransitive
Modern grammars may have 100s of such classes

CSA4006-Dr. Anirban Bhowmick


Contd.
Sneeze: John sneezed
Find: Please find [a flight to NY] NP
Give: Give [me] NP [a cheaper fare] NP
Help: Can you help [me] NP [with a flight] PP
Prefer: I prefer [to leave earlier] TO VP
Told: I was told [United has a flight] S
39

• *John sneezed the book


• *I prefer United has a flight
• *Give with a flight
As with agreement phenomena, we need a way to formally express the constraints!

CSA4006-Dr. Anirban Bhowmick


Contd.
The various rules for VPs overgenerate.
They permit the presence of strings containing verbs and arguments that don’t go together.
For example: VP → V NP
therefore *Sneezed the book is a VP, since “sneeze” is a verb and “the book” is a valid NP

40

CSA4006-Dr. Anirban Bhowmick


Grammar Equivalence & Normal Form
A formal language is defined as a (possibly infinite) set of strings of words. There are two kinds of
grammar equivalence:
Weak equivalence
Strong equivalence
Two grammars are strongly equivalent if they generate the same set of strings and assign the same phrase
structure to each sentence (allowing merely for renaming of the non-terminal symbols).
Two grammars are weakly equivalent if they generate the same set of strings but do not assign the same
phrase structure to each sentence.
It is sometimes useful to have a normal form for grammars, in which each of the productions takes a
particular form.
For example, a context-free grammar is in Chomsky normal form (CNF) if every rule is of the form
A → B C or A → a.
Any grammar can be converted into a weakly equivalent Chomsky normal form grammar.
For example, a rule of the form
A → B C D
can be converted into the following two CNF rules:
A → B X
X → C D
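This binarization step is mechanical; a minimal Python sketch (illustrative only: it handles just the "rule longer than two symbols" case, and the fresh non-terminal names are arbitrary):

def binarize(lhs, rhs):
    """Split a long rule like A -> B C D into binary rules via fresh symbols."""
    rules, n = [], 0
    while len(rhs) > 2:
        n += 1
        fresh = "X%d" % n                     # fresh non-terminal (name arbitrary)
        rules.append((lhs, [rhs[0], fresh]))  # e.g. A -> B X1
        lhs, rhs = fresh, rhs[1:]             # continue with X1 -> C D ...
    rules.append((lhs, list(rhs)))
    return rules

for l, r in binarize("A", ["B", "C", "D"]):
    print(l, "->", " ".join(r))
# A -> B X1
# X1 -> C D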
CSA4006-Dr. Anirban Bhowmick


Natural Language Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 13

Module 4: Semantics
Computational Desiderata for Representations - Meaning Structure of Language - First Order Predicate
Calculus - Elements of FOPC - The Semantics of FOPC - Syntax-Driven Semantic Analysis - Attachments
for a Fragment of English

Topic: Introduction
Semantic Analysis
Semantic analysis in natural language processing (NLP) refers to the process of understanding the
meaning of words, phrases, sentences, or even entire documents. It goes beyond syntactic analysis,
which focuses on the grammatical structure of language, to extract the underlying meaning and
context.
Here are some key aspects of semantic analysis in NLP:

Word Sense Disambiguation (WSD): Words often have multiple meanings depending on the context
in which they are used. WSD is the task of determining the correct sense of a word in a given
context. For example, the word "bank" could refer to a financial institution or the side of a river.

Named Entity Recognition (NER): NER involves identifying and classifying entities such as names of
people, organizations, locations, dates, and other specific terms in a text. This helps in
understanding the key entities and their relationships within a document.

Semantic Role Labeling (SRL): SRL aims to identify the roles of different components of a sentence,
such as the subject, object, and predicate. It helps in understanding the relationships between
entities and their actions in a given context.

CSA4006-Dr. Anirban Bhowmick


Semantic Analysis
Coreference Resolution: This involves determining when two or more expressions in a text refer to the
same entity. For example, in the sentence "John went to the store. He bought some groceries," resolving
the pronoun "He" to refer to "John" requires coreference resolution.

Sentiment Analysis: While often associated more with the emotional aspect of language, sentiment
analysis also involves understanding the underlying meaning of text. It helps determine whether a piece
of text expresses a positive, negative, or neutral sentiment.
9
Semantic Similarity: This involves measuring the degree of similarity between words, phrases, or
sentences in terms of meaning. It is useful in tasks like information retrieval, document clustering, and
question answering.

Word Embeddings and Vector Representations: Techniques like word embeddings (e.g., Word2Vec,
GloVe, and BERT) represent words in a continuous vector space where semantically similar words are
closer in the vector space. This allows algorithms to capture semantic relationships between words.

Frame Semantics and Ontologies: Understanding the frames or scenarios in which words and phrases
are used can contribute to a deeper understanding of meaning.
CSA4006-Dr. Anirban Bhowmick
Meaning Representation Language
In natural language processing (NLP), meaning representation languages are formal languages or
frameworks used to represent the meaning of linguistic expressions in a structured and interpretable
way. These representations are essential for tasks such as semantic analysis, machine translation,
question answering, and other applications where understanding the meaning of natural language is
crucial.

But unlike parse trees, these representations aren’t primarily descriptions of the structure of the inputs

Consider the following everyday language tasks that require some form of semantic processing

 Answering an essay question on an exam


 Deciding what to order at a restaurant by reading a menu
 Learning to use a new piece of software by reading the manual
 Realizing that you’ve been insulted
 Following a recipe

CSA4006-Dr. Anirban Bhowmick


Contd.
For example, some of the knowledge of the world needed to perform the above tasks includes:

Answering and grading essay questions requires background knowledge about


 The topic of the question
 The desired knowledge level of the students
 How such questions are normally answered
11

Learning to use a piece of software by reading a manual


 Giving advice about how to do the same
 Requires deep knowledge about current computers
 The specific software in question
 Similar software applications
 Knowledge about users in general

CSA4006-Dr. Anirban Bhowmick


Computational Desiderata for
Representation
Computational desiderata refer to the desired properties or characteristics that representations should
possess to effectively capture and model the meaning of language.

To focus this discussion, we will consider in more detail the task of giving advice about restaurants to
tourists. In this discussion, we will assume that we have a computer system that accepts spoken
language queries from tourists and constructs appropriate responses by using a knowledge base of
relevant domain knowledge.
12

 Verifiability
 Unambiguous Representations
 Canonical Form
 Inference and Variables
 Expressiveness

CSA4006-Dr. Anirban Bhowmick


Verifiability
Verifiability: The system’s ability to compare representations to facts in memory

The most straightforward way to implement this notion is make it possible for a system to compare,
or match the representation of the meaning of an input against the representations in its knowledge
base its store of information about its world.

Does Maharani serve vegetarian food?


Serves(Maharani; Vegetarian Food)
 Input matched against the knowledge base of facts about a set of restaurants
 Matching the input proposition in its knowledge base, it can return an affirmative answer
 Otherwise, it must either say No if its knowledge of local restaurants is complete, or say that it does
not know

CSA4006-Dr. Anirban Bhowmick


Unambiguous Representations
 The domain of semantics is subject to ambiguity
 Single linguistic inputs can legitimately have different meaning representations assigned to them
based on the circumstances in which they occur.

The cat is on the mat


Ambiguity:
The phrase "on the mat" might have multiple interpretations, as it could refer to a physical location or
imply a scolding or disciplinary action.

Unambiguous representations are crucial for NLP tasks to enhance the accuracy and reliability of
natural language understanding systems.

CSA4006-Dr. Anirban Bhowmick


Vagueness
A concept closely related to ambiguity is vagueness
 Like ambiguity, vagueness can make it difficult to determine what to do with a particular input
based on its meaning representation
 Vagueness, however, does not give rise to multiple representations
 Consider the following request as an example
I want to eat Italian food
Use of the phrase Italian food may provide enough information for a restaurant advisor to provide
reasonable recommendations
 It is nevertheless quite vague as to what the user really wants to eat
 A vague representation of the meaning of this phrase may be appropriate for some purposes,
while a more specific representation may be needed for other purposes

CSA4006-Dr. Anirban Bhowmick


Canonical Form
The notion that single sentences can be assigned multiple meanings leads to the related phenomenon of
distinct inputs that should be assigned the same meaning representation

 Does Maharani have vegetarian dishes?


 Do they have vegetarian food at Maharani?
 Are vegetarian dishes served at Maharani?
 Does Maharani serve vegetarian fare?

CSA4006-Dr. Anirban Bhowmick


Inference and Variables
Can vegetarians eat at Maharani?

 The term inference to refer generically to a system’s ability to draw valid conclusions based on the
meaning representation of inputs and its store of background knowledge
 It must be possible for the system to draw conclusions about the truth of propositions that are not
explicitly represented in the knowledge base, but are nevertheless logically derivable from the
propositions that are present
 I’d like to find a restaurant where I can get vegetarian food.
 In this examples, this request does not make reference to any particular restaurant
 The user is stating that they would like information about an unknown and unnamed entity that is a
restaurant that serves vegetarian food
 Answering this request requires a more complex kind of matching that involves the use of variables
 A representation containing such variables as follows

Serves(x; Vegetarian Food)

CSA4006-Dr. Anirban Bhowmick


Expressiveness
Expressiveness in meaning representation in NLP refers to the ability of a representation system to
capture the richness and diversity of meanings present in natural language. An expressive
representation should be able to convey nuanced relationships, distinctions, and semantic intricacies
inherent in human language. Here's an example to illustrate expressiveness
The conference room echoed with the enthusiastic applause of the audience.

This representation captures the expressiveness of the sentence by not only representing the basic

actions and entities but also incorporating additional details about the manner of applause and the
specific location of the event. It goes beyond a simple surface-level representation and delves into the
nuanced aspects of the sentence's meaning.

CSA4006-Dr. Anirban Bhowmick


Meaning Structure of Language
These include a variety of conventional form
 Meaning associations
 Word order regularities
 Tense systems
 Conjunctions and quantifiers
 A fundamental predicate argument structure
19
A predicate is a statement about a subject that either is true or false. It expresses a property or a
relation. Predicates often use verbs to convey actions or states.
Examples:
The cat is on the mat.
Predicate: "is on the mat"
Subject: "The cat"

 Predicates: Primarily Verbs , VPs , Sentences, sometimes Nouns and NPs


 Arguments: Primarily Nouns, Nominals, NPs, PPs.

CSA4006-Dr. Anirban Bhowmick


Meaning Structure of Language
Argument:
An argument is a value that is applied to a function or, in logic, a subject that satisfies a predicate. In
simpler terms, it is what the predicate is about.

Examples:
1.In "The cat is on the mat," "The cat" is the argument of the predicate "is on the mat."
2.In "She likes to read books," "She" is the argument of the predicate "likes to read books." 20
3.In "The sun sets in the west," "The sun" is the argument of the predicate "sets in the west."

 Predicates: Primarily Verbs , VPs , Sentences, sometimes Nouns and NPs


 Arguments: Primarily Nouns, Nominals, NPs, PPs.

CSA4006-Dr. Anirban Bhowmick


Contd.
These examples can be classified as having one of the three syntactic argument frames
I want Italian food  NP want NP
I want to spend less than five dollars  NP want Inf VP
I want it to be close by here  NP want NP Inf VP
 These syntactic frames specify the number, position and syntactic category of the arguments that are
expected.
 The frame for the variety of want that appears in Example 1 specifies the following facts
 There are two arguments to this predicate.
 Both arguments must be NPs.
 The first argument is pre verbal and plays the role of the subject.
 The second argument is post verbal and plays the role of the direct object.

CSA4006-Dr. Anirban Bhowmick


Contd.
 Semantic roles and Semantic restrictions on these roles
 The notion of a semantic role can be understood by looking at the similarities among the arguments in
the examples above.
 The study of roles associated with specific verbs and across classes of verbs is usually referred to as
thematic role or case role
 The notion of semantic restrictions arises directly from these semantic roles

 Consider the following phrases from the BERP corpus

An Italian restaurant under fifteen dollars


 In this example, the meaning representation associated with the preposition under can be seen as
having
 something like the following structure
Under(Italian Restaurant ; $15)
 Prepositions can be characterized as two argument predicates where the first argument is an object that
is being placed in some relation to the second argument

CSA4006-Dr. Anirban Bhowmick


Contd.
Another non verb based predicate argument structure example

Make a reservation for this evening for a table for two persons at 8

 The predicate argument structure is based on the concept underlying the noun reservation, rather
than make, the main verb in the phrase
 This example gives rise to a four argument predicate structure like the following
23
Reservation(Today; 8PM ; 2)
 Any useful meaning representation language must be organized supports the specification of
semantic predicate argument structures
 This support must include support for the kind of semantic information that languages present
 Variable arity predicate argument structures
 The semantic labeling of arguments to predicates
 The statement of semantic constraints on the fillers of argument roles

CSA4006-Dr. Anirban Bhowmick




Natural Language Processing
CSA4006
Dr. Anirban Bhowmick
Lecture : 14

Module 4: Semantics
Review

CSA4006-Dr. Anirban Bhowmick


(Recap of the predicate-argument structure slides from Lecture 13.)


Propositional Logic
The simplest, and most abstract logic we can study is called propositional logic.
• Definition: A proposition is a statement that can be either true or false; it must be one or
the other, and it cannot be both.
Examples (propositions): "The fan is on", "2 + 3 = 5"
Whereas "1 + 2" and "Where is John?" are not propositions.

There are two types of Propositions:

Atomic Propositions
Compound propositions

CSA4006-Dr. Anirban Bhowmick


Propositional Logic
Atomic Propositions:
Definition: An atomic proposition is one whose truth or falsity does not depend on the truth or
falsity of any other proposition
Example:
"The Sun is cold“
2+2 is 4
15

Compound Propositions:
Compound propositions are constructed by combining simpler or atomic propositions, using
parenthesis and logical connectives.
Example:
"It is raining today, and street is wet."
"Ankit is a doctor, and his clinic is in Mumbai."

CSA4006-Dr. Anirban Bhowmick


Propositional Logic
Logical Connectives:

Implication: In propositional logic, we have a connective that combines two propositions into a new
proposition called the conditional.

If it is raining, then the street is wet.


Let P= It is raining, and Q= Street is wet, so it is
represented as P → Q

CSA4006-Dr. Anirban Bhowmick


Propositional Logic
Biconditional: A sentence such as P ⇔ Q is a biconditional sentence, e.g. "I am
breathing if and only if I am alive". It is written as P iff Q.
P = I am breathing, Q = I am alive; it can be represented as P ⇔ Q.

Definition: If p and q are arbitrary propositions, then the biconditional of p and q is written: p ⇔
q and will be true iff either:
1. p and q are both true; or 17

2. p and q are both false.

CSA4006-Dr. Anirban Bhowmick


Propositional Logic
We can nest complex formulae as deeply as we want.
• We can use parentheses i.e., ),(, to disambiguate formulae.
• EXAMPLES. If p, q, r, s and t are atomic propositions, then all of the following are
formulae:
p∧q⇒r
p ∧ (q ⇒ r)
(p ∧ (q ⇒ r)) ∨ s
((p ∧ (q ⇒ r)) ∨ s) ∧ t
EXAMPLE. Suppose we have a valuation 𝜐 such that:
𝜐(p) = F
𝜐(q) = T
𝜐(r) = F
Then the truth value of (p ∨ q) ⇒ r is evaluated as:
(F ∨ T) ⇒ F = T ⇒ F = F

CSA4006-Dr. Anirban Bhowmick
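The same evaluation can be checked mechanically; a minimal Python sketch (not from the slides), writing material implication as (not a) or b:

def implies(a, b):
    """Material implication: a => b is equivalent to (not a) or b."""
    return (not a) or b

# The valuation from the example above.
p, q, r = False, True, False
print(implies(p or q, r))   # False, matching the hand evaluation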


First Order Predicate Calculus
First-Order Predicate Calculus (FOPC) plays a crucial role in representing and reasoning
about linguistic structures and meanings. It serves as a foundation for semantic analysis
and knowledge representation in NLP systems. Let's delve into a detailed explanation with
an example relevant to NLP.

Allows us to break sentences into predicates, subjects and objects, while also allowing us to
use quantifiers like “all”, “each”, “some” etc. 19

Blackburn & Bos make a strong argument for using first-order logic as the meaning
representation.

Powerful, flexible, general.

CSA4006-Dr. Anirban Bhowmick


First Order Predicate Calculus
FOL symbols

○ Constants: john, mary


○ Predicates & relations: man, walks, loves
○ Variables: x, y
○ Logical connectives: ∧ ∨ ¬ →
○ Quantifiers: ∀ ∃
○ Other punctuation: parens, commas

FOL formulae

○ Atomic formulae: loves(john, mary)


○ Connective applications: man(john) ∧ loves(john, mary)
○ Quantified formulae: ∃x (man(x))

CSA4006-Dr. Anirban Bhowmick


Predicates categories
One place: Intransitive verbs, common nouns, adjectives

Dog(x), Happy (x)

Two Place: Transitive verbs, prepositions

Likes(x,y), In(x,y)

Three Place: Ditransitive verbs

Gives(x,y,z)

CSA4006-Dr. Anirban Bhowmick


Quantifier
Quantifiers generate quantification and specify the number of specimen in the universe.
Quantifiers allow us to determine or identify the range and scope of the variable in a logical expression.
There are two types of quantifiers:
Universal quantifier: for all, everyone, everything.
Existential quantifier: for some, at least one.
1. Universal quantifiers
Universal quantifiers specify that the statement within the range is true for everything or every
instance of a particular thing.
Universal quantifiers are denoted by a symbol (∀) that looks like an inverted A. In a universal quantifier, we
use →.
If x is a variable, then ∀x can read as:
For all x
For every x
For each x

Example
Every kid likes football: ∀x (kid(x) → likes(x, football))

CSA4006-Dr. Anirban Bhowmick


Quantifier
2. Existential quantifiers
Existential quantifiers are used to express that the statement within their scope is true for at least one
instance of something.

∃, which looks like an inverted E, is used to represent them. We always use AND or conjunction symbols.

If x is a variable, the existential quantifier will be ∃x:


For some x 23
There exists an x
For at least one x

Example
Some people like football: ∃x (people(x) ∧ likes(x, football))

CSA4006-Dr. Anirban Bhowmick


Scope and Free & Bound Variables
∀x[Person(x)] ∧ Happy(x)

(Every x is a person) and x is happy

Everyone is a person and he is happy

∀x[Person(x) ∧ Happy(x)] 24

(Every x is a person and every x is happy)

Everyone is happy

CSA4006-Dr. Anirban Bhowmick


Examples
1. Some boys hate football

∃x: boys(x) ∧ hate(x, Football)

2. Every person who buys a policy is smart

∀x ∀y: (Person(x) ∧ Policy(y) ∧ buys(x,y)) → Smart(x)

3. No person buys an expensive policy

∀x ∀y: (Person(x) ∧ Policy(y) ∧ expensive(y)) → ¬buys(x,y)

4. Mary loves everyone

∀x: (person(x) → loves(Mary, x))

5. Everyone loves everyone except himself

∀x ∀y: (x ≠ y → loves(x, y))

CSA4006-Dr. Anirban Bhowmick
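Formulas like these can also be built and inspected programmatically; a small sketch using NLTK's logic package (assuming nltk is installed; 'all'/'exists', '->', '&' and '-' are its ASCII syntax):

from nltk.sem.logic import Expression

f1 = Expression.fromstring('exists x.(boys(x) & hate(x, football))')
f2 = Expression.fromstring('all x.(person(x) -> loves(mary, x))')

print(f1)          # exists x.(boys(x) & hate(x,football))
print(f2)          # all x.(person(x) -> loves(mary,x))
print(f1.free())   # set(): no free (unbound) variables in either formula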


Scope Ambiguity
Every student loves some teacher

(Every student)x loves (some teacher)y

One way: (Every student)x (some teacher)y x loves y

∀x [student(x) → ∃y [teacher(y) ∧ loves(x,y)]]

Another way: (some teacher)y (Every student)x x loves y

∃y [teacher(y) ∧ ∀x [student(x) → loves(x,y)]]

CSA4006-Dr. Anirban Bhowmick


Variables and Quantifiers
Consider the following example.

A restaurant that serves Mexican food near ICSI.

The following would be a reasonable representation of the meaning of such a phrase:

Restaurant(x) ∧ Serves(x; Mexican Food) ∧ Near(Location of(x); Location of(ICSI))

CSA4006-Dr. Anirban Bhowmick


Contd.
For example, if AyCaramba is a Mexican restaurant near ICSI, then substituting
AyCaramba for x results in the following logical formula

Restaurant(AyCaramba) ∧ Serves(AyCaramba; Mexican Food) ∧ Near(Location of(AyCaramba); Location of(ICSI))

 Based on the semantics of the operator ∧, this sentence will be true if all of its three
component atomic formulas are true

CSA4006-Dr. Anirban Bhowmick


Syntax
I only have five dollars and I don’t have a lot of time

Have(Speaker; Five Dollars) ∧ ¬Have(Speaker; Lot Of Time)

The semantic representation for this example is built up in a straightforward way from the
semantics of the individual clauses through the use of the ∧ and ¬ operators

CSA4006-Dr. Anirban Bhowmick




Natural Language Processing
CSA4006
Dr. Anirban Bhowmick
Lecture : 15

Module 4: Semantics
Review

CSA4006-Dr. Anirban Bhowmick


FOPL more examples
Theresa is the mother of John and Mary
mother(Theresa, John) ∧ mother(Theresa, Mary)

John likes oranges but he doesn’t like apples
likes(John, oranges) ∧ ¬likes(John, apples)

Mary is studying pharmacy or medicine
studies(Mary, pharmacy) ∨ studies(Mary, medicine)

CSA4006-Dr. Anirban Bhowmick


FOPL more examples
Everyone likes Venice
∀x likes(x, Venice)

Horses are mammals which are animals
∀x (horse(x) → mammal(x) ∧ animal(x))

All that John inherited was a book
∀x (inherited(John, x) → book(x))

John inherited all of the books
∀x (book(x) → inherited(John, x))

CSA4006-Dr. Anirban Bhowmick


FOPL more examples
Existential quantifier: ∃x p(x) is read as "there exists one x such that p(x)" or "there is at least one x
such that p(x)"

There is at least one bird in the forest
∃x (bird(x) ∧ in(x, forest))

John and Mary are siblings
siblings(John, Mary)

There is one person who likes salad
∃x (person(x) ∧ likes(x, salad))

Everyone likes someone and no one likes everyone
∀x ∃y likes(x, y) ∧ ¬∃x ∀y likes(x, y)

CSA4006-Dr. Anirban Bhowmick


FOPL more examples
The negation connectives and the quantifiers have the highest priority. Then come the connectives of
conjunction and disjunction. After that, implication, and finally the biconditional has the lowest priority.

Similar formulae:

∀x ¬P ≡ ¬∃x P
Example:
Nobody likes John: ∀x ¬like(x, John) ≡ ¬∃x like(x, John)

¬∀x P ≡ ∃x ¬P
Example:
There is at least one person who does not like John: ¬∀x like(x, John) ≡ ∃x ¬like(x, John)

CSA4006-Dr. Anirban Bhowmick


FOPL more examples
Similar formulae:

∀x P ≡ ¬∃x ¬P
Example:
Everyone likes John: ∀x like(x, John) ≡ ¬∃x ¬like(x, John)

∃x P ≡ ¬∀x ¬P
Example:
There is at least one person who likes John: ∃x like(x, John) ≡ ¬∀x ¬like(x, John)

CSA4006-Dr. Anirban Bhowmick


Syntax Driven Semantic Analysis
• How meaning representations are created
• Syntax-driven semantic analysis is a computational approach to semantic analysis that
uses static knowledge from the lexicon and the grammar.
• Based on the Principle of Compositionality: the key idea is that the meaning of a
sentence can be composed from the meanings of its parts
• The meaning of a sentence is not based solely on the words that make it up
• It is based on the ordering, grouping, and relations among the words in the sentence
• The syntactic analysis is then passed as input to a semantic analyzer to produce a meaning
representation

CSA4006-Dr. Anirban Bhowmick


Syntax Driven Semantic Analysis
Franco likes Frasca.

[Figure: parse tree for "Franco likes Frasca" annotated with semantic attachments]

CSA4006-Dr. Anirban Bhowmick


Steps in semantic representation
1. Find the meaning representation corresponding to the verb (e.g., nominates)
- it is the verb whose meaning defines the meaning of the whole sentence
- The meaning representation of the verb acts as the template for meaning
representation of the whole sentence
- The NPs are arguments to the verb and are filled in the template based on their roles

2. Find meaning representation for the two NPs 16

3. Bind the meaning representation of the NPs to the variables in the meaning
representation of the verb to get the meaning representation of the whole sentence

CSA4006-Dr. Anirban Bhowmick


Parse tree to Meaning Representation
How is the mapping from parse tree to meaning representation done?

Augment the lexicon and grammar rules with semantic attachment – devise a mapping between
rules of the grammar and rules of semantic representation (rule to rule hypothesis)

An augmented rule can take the form:

A → α1 … αn { f(α1.sem, …, αn.sem) }

The text appearing within brackets specifies the meaning representation assigned to A as a function of
the semantic attachments of A’s constituents

CSA4006-Dr. Anirban Bhowmick


Contd.
President nominates speaker

Noun → President {President}
Noun → Speaker {Speaker}

{President} and {Speaker} are the meanings associated with the augmented rules

NP → Noun {Noun.sem}

Verb → nominates {∃e,x,y nomination(e) ∧ nominator(e,x) ∧ nominee(e,y)}

VP → Verb NP {Verb.sem(NP.sem)}

To combine NP.sem and Verb.sem, y has to be replaced with speaker, which is not specified in Verb.sem.
We need to revise the semantic attachment for the verb.
CSA4006-Dr. Anirban Bhowmick
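The revision introduced in the next lecture uses λ-abstraction, so the nominee and nominator slots can be filled one at a time; a preview sketch with NLTK (assuming nltk is installed; predicate names follow the slide):

from nltk.sem.logic import Expression

# Verb.sem with lambda-abstraction: bind the nominee y first, then the subject x.
verb_sem = r'\y.\x.exists e.(nomination(e) & nominator(e,x) & nominee(e,y))'

# VP -> Verb NP  { Verb.sem(NP.sem) }
vp_sem = Expression.fromstring(r'(%s)(speaker)' % verb_sem).simplify()
print(vp_sem)   # \x.exists e.(nomination(e) & nominator(e,x) & nominee(e,speaker))

# S -> NP VP  { VP.sem(NP.sem) }
s_sem = Expression.fromstring(r'(%s)(president)' % vp_sem).simplify()
print(s_sem)    # exists e.(nomination(e) & nominator(e,president) & nominee(e,speaker))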
Example:

[Slides: worked derivation combining the semantic attachments over the parse tree]
CSA4006-Dr. Anirban Bhowmick


Compositionality

How do we know how to construct the VP?
love(?, mary) OR love(mary, ?)

How can we specify in which way the bits & pieces combine?

The meaning of the sentence is constructed from:
● the meaning of the words (i.e., the lexicon)
● paralleling the syntactic construction (i.e., the semantic rules)

CSA4006-Dr. Anirban Bhowmick




Natural Language Processing
CSA4006
Dr. Anirban Bhowmick
Lecture : 16

Module 4: Semantics

Topic: Introduction
Review

CSA4006-Dr. Anirban Bhowmick


(Recap of the semantic-attachment example and compositionality slides from Lecture 15.)

CSA4006-Dr. Anirban Bhowmick


Lambda Calculus
Loves(?, Mary)
Add a new operator λ to bind free variables:
λx.love(x, mary)   ("loves Mary")

Gluing together formulae/terms with function application:

(λx.love(x, mary)) @ john
(λx.love(x, mary))(john)

CSA4006-Dr. Anirban Bhowmick


Lambda Calculus
Lambda Calculus is used to combine semantic representations systematically
 Lambda Calculus is an extension of FOPC
Three rules define how to build all syntactically valid lambda terms: every variable is a term;
if φ is a term and x is a variable then λx.φ is a term (abstraction); and if φ and ψ are terms
then φ(ψ) is a term (application)

Eg: (λx.P(x))(Taj) ⇒ P(Taj)

 β-reduction replaces the variable x with Taj and removes the λ

 With λ calculus, the VP semantics problem can be solved

CSA4006-Dr. Anirban Bhowmick


Beta reduction
(λx.love(x, mary))(john)

1. Strip off the λ prefix:
(love(x, mary))(john)

2. Remove the argument:
love(x, mary)

3. Replace all occurrences of the λ-bound variable by the argument:
love(john, mary)

CSA4006-Dr. Anirban Bhowmick
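The same β-reduction can be done mechanically with NLTK's logic package (assuming nltk is installed; '\x.' is its ASCII spelling of λx.):

from nltk.sem.logic import Expression

expr = Expression.fromstring(r'(\x.love(x, mary))(john)')
print(expr)              # (\x.love(x,mary))(john)
print(expr.simplify())   # love(john,mary), the beta-reduced form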


Rules
Rule 1: If α is a terminal node, then [|α|] is specified in the lexicon

Rule 2: If α is a non-branching node and β is its daughter node, then [|α|] = [|β|]

Rule 3: If α is a branching node, {β, γ} is the set of its daughters, and [|β|] is a function whose domain
contains [|γ|], then [|α|] = [|β|]([|γ|])

Lexical Entries:

(i) Proper Names: as it is
(ii) Intransitive Verbs: [|dies|] = λx. x dies
(iii) Transitive Verbs: [|loves|] = λy λx. x loves y

CSA4006-Dr. Anirban Bhowmick


Types
Types are an important differentiator for semanticists

e : individuals (proper nouns)
t : truth values (0, 1)

If σ and τ are types, then <σ, τ> is a type as well; it is the type of functions from things of type σ
to things of type τ. We can write it as f : D_σ → D_τ (e.g., <e,t> is f : D_e → D_t)

Types of the different parts:

S = t
N = e
VP = {takes a subject, outputs a truth value} : <e,t>
V (transitive) = {takes an object, then a subject, outputs truth values} : <e,<e,t>>

CSA4006-Dr. Anirban Bhowmick


Semantic Construction with Lambdas

[Slides: worked lambda-calculus constructions for adjectives, prepositions, and negation]

CSA4006-Dr. Anirban Bhowmick




Natural Language Processing
CSA4006
Dr. Anirban Bhowmick
Lecture : 17

Module 4: Semantics
Review

CSA4006-Dr. Anirban Bhowmick


Attachments for a Fragment of English
Sentences can be declaratives, imperatives, yes/no questions, and wh questions. Let’s start by
considering the following examples:

Flight 487 serves lunch (conveys factual information to a hearer)

Serve lunch (a request for an action)

Does Flight 207 serve lunch? (a request for information)

Which flights serve lunch? (a request for information)

 The meaning representations of these examples all contain propositions concerning the
serving of lunch on flights

 They differ with respect to the role that these propositions are intended to serve

CSA4006-Dr. Anirban Bhowmick


Contd.
 To capture these differences a set of operators is applied to FOPC sentences

Specifically, the following operators will be applied to the FOPC representations


 DCL declaratives
 IMP imperatives
 YNQ yes no questions
 WHQ wh question 10

• The normal interpretation for a representation headed by the DCL operator would be as a factual
statement to be added to the current knowledge base.

• Imperative sentences begin with a verb phrase and lack an overt subject. Because of the missing subject,
the meaning representation for the main verb phrase will consist of a λ expression with an unbound λ
variable representing this missing subject

CSA4006-Dr. Anirban Bhowmick


Contd.
 Simply supply a subject to the λ-expression by applying a final λ-reduction to a dummy constant.

 The IMP operator can then be applied to this representation, as in the following semantic
attachment:

S → VP { IMP(VP.sem(DummyYou)) }

 Applying this rule wraps the proposition for the requested action in the IMP operator

 Imperatives can be viewed as a kind of speech act

CSA4006-Dr. Anirban Bhowmick


Contd.
 yes-no-questions consist of a sentence initial auxiliary verb, followed by a subject noun phrase and
then a verb phrase.

 The following semantic attachment simply ignores the auxiliary and, with the exception of the YNQ
operator, builds the same representation as the corresponding declarative:

S → Aux NP VP { YNQ(VP.sem(NP.sem)) }

 Yes/no questions should be thought of as asking whether the propositional part of the meaning
is true or false given the knowledge currently contained in the knowledge base.

CSA4006-Dr. Anirban Bhowmick


Contd.
 wh-subject-questions ask for specific information about the subject of the sentence rather than
the sentence as a whole.

 The following attachment produces a representation that consists of the operator WHQ, the
variable corresponding to the subject of the sentence, and the body of the proposition.
S → WhWord NP VP { WHQ(NP.sem.var, VP.sem(NP.sem)) }

CSA4006-Dr. Anirban Bhowmick


Contd.
 Such questions can be answered by returning a set of assignments for the subject variable that
make the resulting proposition true with respect to the current knowledge base.
 Finally, consider the following wh non subject question.

How can I go from Minneapolis to Long Beach?

 The question is not about the subject of the sentence but rather some other argument, or some
aspect of the proposition as a whole.


 In this case, the representation needs to provide an indication as to what the question is about.
 The following attachment provides this information by providing the semantics of the auxiliary as an
argument to the WHQ operator.

CSA4006-Dr. Anirban Bhowmick




Natural Language Processing
CSA4006
Dr. Anirban Bhowmick
Lecture : 17

Module 5: Machine Translation and Applications
Basic Issues in Machine Translation - Statistical Translation - Word Alignment - Phrase-based
Translation - Synchronous Grammars - Applications of Natural Language Processing: Spell Check -
Summarization - Language Translation

Module 5: MT
What is Machine Translation?

Automatic conversion of text/speech from one natural language to another

Be the change you want to see in the world

वह परिवर्तन बनो जो संसार में देखना चाहते हो

CSA4006-Dr. Anirban Bhowmick


Use cases
Government:
● Administrative requirements
● Education
● Security

Enterprise:
● Product manuals
● Customer support

Social:
● Travel (signboards, food)
● Entertainment (books, movies, videos)

Translation under the hood:
● Cross-lingual Search
● Cross-lingual Summarization
● Building multilingual dictionaries

Any multilingual NLP system will involve some kind of machine translation at some level.

CSA4006-Dr. Anirban Bhowmick


History of MT

[Figure: timeline of machine translation history]

CSA4006-Dr. Anirban Bhowmick


History of MT
 Georges Artsrouni and Petr Troyanskii received the first-ever patents for MT-like tools in 1933. These
tools were quite rudimentary, especially in comparison to what we think of when we hear the term
“MT” today. They worked by comparing dictionaries in the source and target language

 first general purpose electronic computers were not far off on the horizon — in the mid-1940s,
developers like Warren Weaver began to theorize about ways they could use computers to automate
the translation process.
11

 Early RBMT systems include the Institute Textile de France’s TITUS and Canada’s METEO system,
among others. And while US-based research certainly slowed down after the ALPAC report, it didn’t
come to a complete stop — SYSTRAN, founded in 1968, utilized RBMT as well, working closely with
the US Air Force for Russian-English translation in the 1970s.

 In the 1990s, researchers at IBM developed a renewed interest in MT technology, publishing


research on some of the first SMT systems in 1991. Unlike RBMT, SMT doesn’t require developers
to manually input the rules of each language — instead, SMT engines utilize a bilingual corpus of
text to identify patterns in the languages that could be converted into statistical data.

CSA4006-Dr. Anirban Bhowmick


History of MT
 And as electronic computers slowly became more of a household item, so too did MT systems.
SYSTRAN launched the first web-based MT tool in 1997, providing lay people — not just
researchers and language service providers — access to an MT tool. Nearly a decade later, in 2006,
Google launched Google Translate, which was powered by SMT from 2007 until 2016.
 In 2003, researchers at the University of Montreal developed a language model based on neural
networks, but it wasn’t until 2014, with the development of the sequence-to-sequence (Seq2Seq)
model, that NMT became a formidable rival for SMT.
 After that, NMT quickly became the state-of-the-art MT tool — Google Translate adopted it in 2016.

NMT engines use larger corpora than SMT and are more reliable when it comes to translating long
strings of text with complex sentence structures.
 Although large language models (LLMs) perform a lot of other functions besides translation, some
thought leaders have presented tools like ChatGPT as the future of localization and, by extension,
MT.

CSA4006-Dr. Anirban Bhowmick


Why should you study Machine Translation?
 One of the most challenging problems in Natural Language Processing
 Pushes the boundaries of NLP
 Involves analysis as well as synthesis
 Involves all layers of NLP: morphology, syntax, semantics, pragmatics,
discourse
13
 Theory and techniques in MT are applicable to a wide range of other
problems like transliteration, speech recognition and synthesis

CSA4006-Dr. Anirban Bhowmick


Why is Machine Translation interesting?

Language Divergence ⇒ the great diversity among languages of the world

The central problem of MT is to bridge this language divergence

CSA4006-Dr. Anirban Bhowmick


Language Divergence
Word order: SOV (Hindi), SVO (English), VSO, OSV

E: Argentina won the last World Cup

H: अर्जेंटीना ने पिछला विश्व कप जीता था

Free (Hindi) vs rigid (English) word order

पिछला विश्व कप अर्जेंटीना ने जीता था (correct)

The last World Cup Argentina won (grammatically incorrect)


The last World Cup won Argentina (meaning changes)

CSA4006-Dr. Anirban Bhowmick


Language Divergence.

Different ways of expressing the same concept:

water → पानी, जल, नीर

Language registers:
Formal: आप बैठिये    Informal: तू बैठ
Standard: मुझे डोसा चाहिए    Dakhini: मेरे को डोसा होना

CSA4006-Dr. Anirban Bhowmick


Why is Machine Translation difficult?
● Ambiguity
○ Same word, multiple meanings: मंत्री (minister or chess piece)
○ Same meaning, multiple words: जल, पानी, नीर (water)

● Word Order
○ Underlying deeper syntactic structure 17

○ Phrase structure grammar?


○ Computationally intensive

● Morphological Richness
○ Identifying basic units of words

CSA4006-Dr. Anirban Bhowmick


Approaches to build MT systems

[Figure: overview of approaches to building MT systems]

CSA4006-Dr. Anirban Bhowmick


Rule-based MT
 Rules are written by linguistic experts to analyze the source, generate an intermediate
representation, and generate the target sentence
 Depending on the depth of analysis: interlingua or transfer-based MT

19

CSA4006-Dr. Anirban Bhowmick


Vauquois Triangle
 Translation approaches can be classified by the depth of linguistic analysis they perform

[Figure: the Vauquois triangle]

CSA4006-Dr. Anirban Bhowmick


Problems with rule-based MT
 Required linguistic expertise to develop systems
 Maintenance of system is difficult
 Difficult to handle ambiguity
 Scaling to a large number of language pairs is not easy
21

CSA4006-Dr. Anirban Bhowmick


Example-based MT
Translation by analogy ⇒ match parts of sentences to known translations and then combine

Input: He buys a book on international politics

1. Phrase fragment matching (data-driven):
he buys
a book
international politics

2. Translation of segments (data-driven):
वह खरीदता है
एक किताब
अंतरराष्ट्रीय राजनीति

3. Recombination (human-crafted rules/templates):
वह अंतरराष्ट्रीय राजनीति पर एक किताब खरीदता है

● Partly rule-based, partly data-driven.
● Good methods for matching and large corpora did not exist when proposed.
CSA4006-Dr. Anirban Bhowmick


Natural Language
Processing
CSA4006

Dr. Anirban Bhowmick


Assistant Professor
VIT Bhopal
Lecture : 18
Topic: Statistical Machine Translation
Syllabus

Module 5:
Machine Translation And Applications: Basic Issues in Machine Translation- Statistical Translation- Word Alignment- Phrase based Translation- Synchronous Grammars- Applications of Natural Language Processing: Spell Check- Summarization- Language Translation.
Module 5: MT
Review



SMT
 Parallel corpora are available in several language pairs.

 Basic idea: use a parallel corpus as a training set of translation examples

 Classic example: IBM work on French-English translation, using the Canadian Hansards (1.7 million sentences of 30 words or less in length)

 Idea goes back to Warren Weaver (1949), who suggested applying statistical and cryptanalytic techniques to translation:

“…one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”
(Warren Weaver, 1949, in a letter to Norbert Wiener)



The Noisy Channel Model
 Goal: a translation system from French to English

 Have a model p(e|f) which estimates the conditional probability of any English sentence e given the French sentence f. Use the training corpus to set the parameters.

 A Noisy Channel Model has two components:

p(e)    the language model
p(f|e)  the translation model

Giving:

p(e|f) = p(e, f) / p(f) = p(e) p(f|e) / p(f)

and, since p(f) does not depend on e,

argmax_e p(e|f) = argmax_e p(e) p(f|e)
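
As a toy illustration of the decoding rule, the sketch below scores candidate English sentences by p(e) · p(f|e) in log space and returns the argmax. Both probability tables are invented for illustration; in a real system p(e) comes from an n-gram language model and p(f|e) from an alignment-based translation model.

import math

LM = {"the dog": 0.2, "dog the": 0.01}            # toy p(e)
TM = {("le chien", "the dog"): 0.5,               # toy p(f|e)
      ("le chien", "dog the"): 0.5}

def decode(f, candidates):
    """Noisy-channel decoding: argmax_e p(e) * p(f|e), computed in log space."""
    return max(candidates, key=lambda e: math.log(LM[e]) + math.log(TM[(f, e)]))

print(decode("le chien", ["the dog", "dog the"]))  # -> "the dog"

Here the translation model alone cannot choose between the two candidates; the language model breaks the tie in favor of the fluent one, which is exactly the division of labor the noisy channel intends.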



NCM (Noisy Channel Model)

(Figure: the noisy channel pipeline: an English source e passes through a noisy channel p(f|e) to yield the observed French f; translation decodes e back from f.)


SMT
Let’s formalize the translation process

We will model translation using a probabilistic model. Why?


- We would like to have a measure of confidence for the translations we learn
- We would like to model uncertainty in translation

Model: a simplified and idealized understanding of a physical process



SMT


Why use this counter-intuitive way of explaining translation?

● Makes it easier to mathematically represent translation and learn probabilities


● Fidelity and Fluency can be modelled separately



SMT
We have already seen how to learn n-gram language models

Let’s see how to learn the translation model → P(f|e)

To learn sentence translation probabilities, we first need to learn word-level translation probabilities

That is the task of word alignment



Word Alignment
A common use of aligned texts is the derivation of bilingual dictionaries and terminology databases.

This is usually done in two steps. First the text alignment is extended to a word alignment (unless we are dealing with an approach in which word and text alignment are induced simultaneously). Then some criterion such as frequency is used to select the aligned word pairs.

Task: given a parallel sentence pair, find word-level correspondences.


(Figures: worked examples of word-level alignment between parallel sentence pairs; not reproduced.)


Contd.
If we knew the alignments, we could compute P(f|e)




Natural Language
Processing
CSA4006

Dr. Anirban Bhowmick


Assistant Professor
VIT Bhopal
Lecture : 19
Topic: Statistical Machine Translation
Module 5: MT
Review





IBM Model
 IBM Model 1 is a statistical machine translation model that aims to align words between a source language and a target language. The model is designed to learn the probabilities of word alignments from observed parallel sentences in bilingual corpora. The primary goal is to understand how words in the source language correspond to words in the target language.

 Alignments: an alignment a maps each position j in the foreign sentence to the position a_j of the English word it translates (or to a special NULL word).





Alignments in the IBM Models
In IBM Model 1 all alignments a are equally likely:

p(a | e, m) = 1 / (l + 1)^m

where l is the length of the English sentence e, m is the length of the foreign sentence f, and the +1 accounts for the NULL word.

Next step: come up with an estimate for p(f | a, e). In Model 1, this is:

p(f | a, e) = ∏_{j=1..m} t(f_j | e_{a_j})

where t(f | e) is a word-level translation probability.
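
The translation probabilities t(f|e) are learned with the EM algorithm. Below is a compact sketch of IBM Model 1 EM training on a two-sentence toy corpus (NULL alignment is omitted for brevity):

from collections import defaultdict

# Toy parallel corpus: (foreign sentence, English sentence) pairs.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
]

f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization of t(f|e)

for _ in range(20):                           # EM iterations
    count = defaultdict(float)                # expected counts c(f, e)
    total = defaultdict(float)                # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # E-step: distribute one count for f over all English words,
            # proportional to the current t(f|e) (uniform alignment prior).
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("haus", "house")], 3))   # rises toward 1.0 over the iterations

Because "das" co-occurs with both English sentences while "haus" and "buch" each occur in only one, EM gradually concentrates t(haus|house) and t(buch|book): the classic pigeonhole effect that makes Model 1 work.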



Phrase-Based Translation
Phrase-based machine translation is an approach that translates smaller units of text, typically phrases or short sequences of words, rather than translating word by word. This allows more flexibility in capturing linguistic variation and improves overall translation quality.

• Word-Based Models translate words as atomic units

• Phrase-Based Models translate phrases as atomic units

• Foreign input is segmented into phrases
• Each phrase is translated into English
• Phrases are reordered



Phrase Translation Table
Main knowledge source: table with phrase translations and their probabilities

Example: phrase translations for natuerlich (table of candidate translations with probabilities; not reproduced)


Phrase Translation Table
Phrase translations for den Vorschlag learned from the Europarl corpus (table not reproduced); the learned entries show:

– lexical variation (proposal vs suggestions)


– morphological variation (proposal vs proposals)
– included function words (the, a, ...)
– noise (it)
Linguistic Phrases?
Model is not limited to linguistic phrases
(noun phrases, verb phrases, prepositional phrases, ...)

• Example non-linguistic phrase pair


spass am → fun with the

• Prior noun often helps with translation of preposition

• Experiments show that limitation to linguistic phrases hurts quality



Probabilistic Model
In the standard phrase-based formulation (Koehn), the best translation maximizes the product of phrase translation probabilities, a reordering (distortion) score, and the language model:

e_best = argmax_e ∏_{i=1..I} φ(f̄_i | ē_i) · d(start_i - end_{i-1} - 1) · p_LM(e)

Distance-Based Reordering
With x = start_i - end_{i-1} - 1, i.e. how far the start of the i-th source phrase is from the end of the previous one, a common choice is d(x) = α^|x| for some α < 1: monotone translation (x = 0) costs nothing, and long jumps are penalized exponentially.


Learning a Phrase Translation Table
• Three stages:
– word alignment: using IBM models or other method
– extraction of phrase pairs
– scoring phrase pairs



Learning a Phrase Translation Table


All words of the phrase pair have to align to each other
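
This consistency criterion (every alignment point that touches the pair must lie fully inside it, and the pair must cover at least one point) can be checked directly. A sketch, assuming the word alignment is given as a set of (source index, target index) pairs:

def is_consistent(alignment, f_span, e_span):
    """Phrase-extraction consistency check.

    alignment: set of (f_index, e_index) alignment points
    f_span, e_span: (start, end) inclusive index ranges of the candidate pair
    """
    f_lo, f_hi = f_span
    e_lo, e_hi = e_span
    covered = any(f_lo <= f <= f_hi and e_lo <= e <= e_hi for f, e in alignment)
    if not covered:                   # must contain at least one alignment point
        return False
    for f, e in alignment:            # no point may cross the pair's boundary
        if (f_lo <= f <= f_hi) != (e_lo <= e <= e_hi):
            return False
    return True

A = {(0, 0), (1, 2), (2, 1)}                  # toy alignment
print(is_consistent(A, (1, 2), (1, 2)))       # True: both points enclosed
print(is_consistent(A, (1, 1), (1, 1)))       # False: f1 aligns to e2, outside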





Scoring Phrase Translations
• Phrase pair extraction: collect all phrase pairs from the data
• Phrase pair scoring: assign probabilities to phrase translations
• Score by relative frequency:

φ(f̄ | ē) = count(ē, f̄) / Σ_{f̄′} count(ē, f̄′)
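
A sketch of the relative-frequency estimate over a list of extracted phrase pairs (the pairs below are invented for illustration):

from collections import Counter

pairs = [("the proposal", "den Vorschlag"),    # hypothetical extracted
         ("the proposal", "den Vorschlag"),    # (english, foreign) phrase pairs
         ("the proposal", "der Vorschlag"),
         ("fun with the", "spass am")]

pair_count = Counter(pairs)
e_count = Counter(e for e, _ in pairs)

def phi(f, e):
    """phi(f|e) = count(e, f) / count(e), the relative-frequency estimate."""
    return pair_count[(e, f)] / e_count[e]

print(phi("den Vorschlag", "the proposal"))    # -> 0.666...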





Natural Language
Processing
CSA4006

Dr. Anirban Bhowmick


Assistant Professor
VIT Bhopal
Lecture : 20
Topic: Neural Machine Translation
Module 5: MT
Review





Encoder Decoder Model

Encoder: Takes in the input sentence and produces a fixed-size context vector.
Decoder: Takes the context vector and generates the output sentence in the target language.





Contd.
The encoder
Layers of recurrent units where, at each time step, an input token is received, relevant information is collected, and a hidden state is produced. The details depend on the type of RNN; in our example, an LSTM, the unit mixes the current hidden state with the input and returns an output (which is discarded) and a new hidden state.

The encoder vector
The encoder vector is the last hidden state of the encoder, and it tries to contain as much of the useful input information as possible to help the decoder get the best results. It is the only information from the input that the decoder will get.

The decoder
Layers of recurrent units (e.g., LSTMs) where each unit produces an output at a time step t. The hidden state of the first unit is the encoder vector, and the rest of the units accept the hidden state from the previous unit. The output is calculated using a softmax function to obtain a probability for every token in the output vocabulary.
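
A minimal PyTorch sketch of this encoder-decoder setup. The dimensions, vocabulary sizes, and random inputs are arbitrary; a real NMT system adds training, masking, and beam-search decoding.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encoder: the final (h, c) state is the fixed-size context vector.
        _, context = self.encoder(self.src_emb(src))
        # Decoder: initialized with the context, consumes the shifted target.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), context)
        return self.out(dec_out)               # logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 1000, (2, 5))   # batch of 2 target prefixes, length 5
print(model(src, tgt).shape)           # -> torch.Size([2, 5, 1000])

Note how everything the decoder knows about the source must squeeze through context, which is exactly the bottleneck discussed on the next slide.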



Problem and Solution
Why? Longer sentences illustrate the limitations of a single-directional encoder-decoder architecture.

Because language consists of tokens and grammar, the problem with this model is that it does not entirely address the complexity of the grammar.

Specifically, when translating the nth word in the source language, the RNN considers only the first n words of the source sentence; but grammatically, the meaning of a word depends on the sequence of words both before and after it in a sentence.

A solution: the bi-directional LSTM model. A bi-directional model allows us to input the context of both past and future words to create an accurate encoder output vector.



Bi-LSTM

But then the challenge becomes: which word do we need to focus on in a sequence?


Attention Mechanism
Attention Mechanism Overview:
The attention mechanism enhances the traditional encoder-decoder architecture by allowing the decoder to "pay attention" to different parts of the source sentence when generating each word in the target sequence.
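
To make this concrete, here is a sketch of simple dot-product attention over the encoder states. Shapes are illustrative; Bahdanau-style attention would use a small learned feed-forward scorer instead of the raw dot product.

import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_states):
    """Dot-product attention.

    decoder_state:  (hidden,)          current decoder hidden state
    encoder_states: (src_len, hidden)  one vector per source token
    Returns the context vector and the attention weights.
    """
    scores = encoder_states @ decoder_state    # (src_len,) similarity scores
    weights = F.softmax(scores, dim=0)         # normalize to a distribution
    context = weights @ encoder_states         # weighted sum of source states
    return context, weights

enc = torch.randn(7, 128)          # encoder states for 7 source tokens
dec = torch.randn(128)
ctx, w = attend(dec, enc)
print(ctx.shape, float(w.sum()))   # -> torch.Size([128]) and weights summing to 1.0

The weights answer exactly the question raised on the previous slide: at each decoding step they say which source words to focus on, and the context vector is recomputed per step instead of being fixed once.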







