Unit 1 NLP Introduction
11/28/2024 CEN 1
Natural language and NLP
Natural Language
Refers to the language spoken by people, e.g. English,
Hindi, as opposed to artificial languages, like C++, Java, etc.
Natural Language Processing
Natural language processing (NLP) is a field of computer
science, artificial intelligence, and linguistics concerned with
the interactions between computers and human (natural)
languages. As such, NLP is closely related to the area of
human–computer interaction.
Computational Linguistics
Computational linguistics (CL) is a discipline
between linguistics and computer science that
is concerned with the computational aspects of
human language.
[Computational Linguistics
Doing linguistics on computers
More on the linguistic side than NLP, but closely
related]
Natural language and NLP
Ultimate goal
– To build computer systems that perform as well at
using natural language as humans do (making
computers as intelligent as people).
Immediate goal
– To build computer systems that can process text
and speech more intelligently.
Enable human-machine communication
Improve human-human communication
Process text or speech
What is the difference between a natural language and a
programming language?
Human Languages
You know ~50,000 words of your primary
language, each with several meanings
A six-year-old knows ~13,000 words
During the first 16 years, we learn about 1 word every 90
minutes of waking time
Mental grammar generates sentences;
virtually every sentence is novel
3-year-olds already have 90% of the grammar
~6,000 human languages – none of them
simple!
Adapted from Martin Nowak (2000), "Evolutionary biology of language", Phil. Trans. Royal Society London
Origins of NLP
NLP originated from the idea of Machine
Translation (MT), which came into existence during the
Second World War.
Natural Language Understanding
Interpretation of language only (understanding, not
production).
Natural language understanding, that is,
enabling computers to derive meaning from
human or natural language input.
Applications
Database queries
Search Engine
Automatic translation system
Story understanding
Natural Language Generation
Language production
An NLG system is like a translator that converts
a computer-based representation into a
natural language representation.
NLG may be viewed as the opposite of
natural language understanding: whereas in
natural language understanding the system
needs to disambiguate the input sentence to
produce the machine representation
language, in NLG the system needs to make
decisions about how to put a concept into
words.
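To make the NLG direction concrete, here is a minimal template-based sketch, assuming the "computer-based representation" is a flat Python dict; the field names (`account`, `amount`, `months`, `rate`) are hypothetical and borrowed from the deposit dialogue later in this unit. It shows two small decisions about "how to put a concept into words": whether to mention an optional fact, and which word form to use. Real NLG systems are far richer than fixed templates.

```python
def realize(record: dict) -> str:
    """Turn one structured record (hypothetical fields) into an English sentence."""
    # Word-form decision: singular vs plural noun for the duration.
    unit = "month" if record["months"] == 1 else "months"
    sentence = (f"Your {record['account']} account holds "
                f"{record['amount']} dollars for {record['months']} {unit}")
    # Content decision: mention the interest rate only when the record has one.
    if "rate" in record:
        sentence += f" at {record['rate']} percent interest"
    return sentence + "."

print(realize({"account": "fixed deposit", "amount": 800, "months": 3, "rate": 6}))
# prints: Your fixed deposit account holds 800 dollars for 3 months at 6 percent interest.
```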
Why NLP
NLP products are urgently needed to improve
human-machine interaction, since the main obstacle
in the interaction between humans and computers is
one of communication.
Why NLP
How can we tell computers about
language? (Or help them learn it as kids
do?)
It would be great if machines could
Process our emails
Translate languages accurately
Help us manage, summarize, and
aggregate information
Understand phone conversations
Talk to us / listen to us
Why NLP?
Human language is interesting & challenging
NLP offers insights (deep understanding) into
language
Language is the medium of the web
Interdisciplinary: linguistics, CS, psychology, math
Help in communication
With computers (automatic speech recognition (ASR), text-to-speech (TTS))
With other humans (machine translation (MT))
Huge amounts of data
Internet = at least 20 billion pages
Applications for processing large amounts of texts
Why NLP?
kJfmmfj mmmvvv nnnffn333
Uj iheale eleee mnster vensi credur
Baboi oi cestnitze
Coovoel2^ ekk; ldsllk lkdf vnnjfj?
Fgmflmllk mlfm kfre xnnn!
Computers Lack Knowledge!
Computers “see” text in English the same way
you saw the previous text!
People have no trouble understanding
language
Common sense knowledge
Reasoning capacity
Experience
Computers have
No common sense knowledge
No reasoning capacity
NLP and Computational Models
Two types
Knowledge-driven
Grammar rules
Semantic structure
Data-driven
Uses ML to learn syntactic patterns
Less human effort
Performance depends on the quality and quantity of the data
Classical NLP and Statistical NLP
Classical NLP
Linguist → rules → Computer
Statistical/ML NLP
Text data (corpus) → Computer → rules/probabilities
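The contrast between the two pipelines can be sketched with a toy part-of-speech tagger. This is an illustration under stated assumptions, not a method from the slides: the hand-written rules, the tag names, and the nine-token "corpus" are all made up. The statistical side is the simple "most frequent tag per word" baseline, i.e. probabilities estimated from counts rather than rules written by a linguist.

```python
from collections import Counter, defaultdict

# Classical NLP: a linguist hand-writes rules (toy rules, illustration only).
def rule_tagger(word: str) -> str:
    if word in {"the", "a", "an"}:
        return "DET"
    if word.endswith("ed"):
        return "VERB"
    return "NOUN"  # default guess

# Statistical NLP: the computer counts tags in a tagged corpus (made-up data).
corpus = [("the", "DET"), ("book", "NOUN"), ("is", "VERB"),
          ("book", "VERB"), ("a", "DET"), ("flight", "NOUN"),
          ("read", "VERB"), ("the", "DET"), ("book", "NOUN")]

counts = defaultdict(Counter)
for word, tag in corpus:
    counts[word][tag] += 1

def learned_prob(word: str, tag: str) -> float:
    """Relative-frequency estimate of P(tag | word) from the corpus."""
    total = sum(counts[word].values())
    return counts[word][tag] / total if total else 0.0

def stat_tagger(word: str) -> str:
    """Most frequent tag seen with the word; unseen words fall back to the rules."""
    if counts[word]:
        return counts[word].most_common(1)[0][0]
    return rule_tagger(word)

print(stat_tagger("book"), learned_prob("book", "NOUN"))  # NOUN 0.666...
print(rule_tagger("is"), stat_tagger("is"))               # NOUN VERB
```

The rule tagger mislabels "is" because no linguist wrote a rule for it, while the data-driven tagger gets it right only because "is" happened to appear in the corpus: the quality of the statistical approach depends directly on its data.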
Why NLP is difficult
Natural language is extremely rich in form and
structure, and very ambiguous.
How to represent meaning
Which structures map to which meaning structures
One input can mean many different things.
Ambiguity can be at different levels.
Lexical (word level) ambiguity -- different meanings
of words
Syntactic ambiguity -- different ways to parse the
sentence
Interpreting partial information -- how to interpret
pronouns
Language Processing
Level 1 – Speech sound (Phonetics & Phonology)
Level 2 – Words & their forms (Morphology, Lexicon)
Level 3 – Structure of sentences (Syntax, Parsing)
Level 4 – Meaning of sentences (Semantics)
Level 5 – Meaning in context & for a purpose (Pragmatics)
Level 6 – Connected sentence processing in a larger body of text (Discourse)
Prof. Pushpak Bhattacharyya, IIT Bombay, 03/01/06
Ambiguity
Phonetic
[raIt] = write, right, rite
Lexical/morphological
book= noun, verb
Syntactic/Structural
I saw the man with the telescope
Semantic
dish = physical plate, menu item
Pragmatic-level ambiguity
Kabir and Ayan are married.
Kabir and Suha are married.
Discourse-level ambiguity
John went to the club on Saturday. He met Sam.
Ambiguity
Some interpretations of "I made her
duck":
1. I cooked duck for her.
2. I cooked duck belonging to her.
3. I created a toy duck which she owns.
4. I caused her to quickly lower her head or body.
5. I used magic and turned her into a duck.
duck – morphologically and syntactically
ambiguous: noun or verb.
her – syntactically ambiguous: dative or
possessive.
make – semantically ambiguous: cook or
create.
Ambiguous Sentences
Time flies like an arrow. Fruit flies like a banana.
[Lexical]
It's a play on words:
Time flies (i.e., goes by quickly) like (i.e., in the same way as) an arrow soars in the air, while fruit flies (i.e., the insects) like (i.e., prefer) a banana.
We saw her duck.
The mouse is on my desk.
Garden Path Sentence
The sentence uses "flies" as a verb and then
"flies" as a noun. It uses "like" as a preposition
and then "like" as a verb.
Eliza
http://www.manifestation.com/neurotoys/eliza.php3
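ELIZA appears to understand the user but actually works by keyword pattern matching: it finds a keyword pattern in the input, swaps pronouns (my → your), and slots the captured text into a canned reply. The sketch below imitates that idea with Python's `re` module. The three rules and the reflection table are invented for illustration; they are not Weizenbaum's original script.

```python
import re

# Hypothetical keyword rules: (pattern, response template).
RULES = [
    (re.compile(r"\bi need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me more about your {0}."),
]

# Swap first- and second-person words so echoed text reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:          # first matching rule wins
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."                   # default when no keyword matches

print(respond("I need my coffee."))  # Why do you need your coffee?
```

No meaning is represented anywhere: the program never knows what coffee is, which is exactly the gap between pattern matching and natural language understanding.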
Stages in NLP
Different levels of Language Analysis
A natural language system must use
considerable knowledge about the structure
of the language itself, including
what the words are,
how words combine to form sentences,
what the words mean,
how word meanings contribute to sentence
meanings, and so on.
The Steps in NLP
Discourse
Pragmatics
Semantics
Syntax
Morphology
The steps in NLP (Cont.)
Morphology: concerns the way words are
built up from smaller meaning-bearing units
(e.g., come + -s = comes).
Syntax: concerns how words are put
together to form correct sentences and what
structural role each word has.
Semantics: concerns what words mean and
how these meanings combine in sentences to
form sentence meanings.
Pragmatics: concerns how sentences are
used in different situations and how use
affects the interpretation of the sentence.
Discourse: concerns how the immediately
preceding sentences affect the interpretation
of the next sentence.
Phonology
Phonetic and phonological knowledge –
concerns how words are related to the sounds that
realize them. Such knowledge is crucial for speech-
based systems.
A phoneme is the smallest unit of sound that distinguishes one word from another
Important in speech recognition and speech generation (synthesis)
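Phonological knowledge links words to the phoneme sequences that realize them, and the mapping is not one-to-one: the [raIt] example later in this unit has three spellings for one sound sequence. The sketch below groups words by pronunciation to find such homophones. The tiny lexicon is hand-written for illustration, using ARPAbet-style symbols with stress marks omitted (an assumption; a real system would load a full pronouncing dictionary).

```python
from collections import defaultdict

# Hand-written toy lexicon: word -> phoneme sequence (ARPAbet-style, no stress).
PRONUNCIATIONS = {
    "write":  ("R", "AY", "T"),
    "right":  ("R", "AY", "T"),
    "rite":   ("R", "AY", "T"),
    "night":  ("N", "AY", "T"),
    "knight": ("N", "AY", "T"),
    "book":   ("B", "UH", "K"),
}

def homophone_groups(lexicon: dict) -> dict:
    """Group words that are realized by the same sequence of phonemes."""
    by_sound = defaultdict(list)
    for word, phones in lexicon.items():
        by_sound[phones].append(word)
    # Keep only sound sequences shared by two or more spellings.
    return {p: sorted(ws) for p, ws in by_sound.items() if len(ws) > 1}

for phones, words in homophone_groups(PRONUNCIATIONS).items():
    print(" ".join(phones), "->", words)
```

A speech recognizer that hears R AY T therefore needs higher levels (syntax, semantics, context) to choose among "write", "right", and "rite".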
Morphology
Morphological knowledge –
concerns how words are constructed
from more basic meaning units called
morphemes. A morpheme is the primitive
unit of meaning in a language.
Lexical analysis: all about words
Requires morphological knowledge
Smallest unit is the morpheme
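Morphological analysis runs in the direction word → morphemes. A minimal suffix-stripping sketch, assuming a hypothetical lexicon of root morphemes and a short list of English inflectional suffixes, splits "comes" into "come" + "-s". It accepts a split only when the remaining stem is a known root, and it deliberately ignores spelling changes.

```python
# Hypothetical toy lexicon of roots and inflectional suffixes.
ROOTS = {"come", "boy", "stretch", "walk"}
SUFFIXES = ["ing", "ed", "es", "s"]  # tried in this order (longer first)

def segment(word: str) -> list[str]:
    """Split a word into root + suffix morphemes when the lexicon allows it."""
    if word in ROOTS:
        return [word]
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in ROOTS:
            return [stem, "-" + suffix]
    return [word]  # unknown: treat the whole word as one morpheme

for w in ["comes", "boys", "stretched", "walking", "coming"]:
    print(w, "->", segment(w))
```

"coming" comes back unsegmented because the final "e" of "come" is dropped before "-ing". Handling such spelling rules is why practical analyzers use more than plain suffix stripping.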
Morphology
Word formation rules from root words
Nouns: plural (boy-boys); verbs: tense
(stretch-stretched)
Languages rich in morphology: e.g., Dravidian,
Hungarian, Turkish
Languages poor in morphology: e.g., Chinese,
English
Languages with rich morphology have the
advantage of easier processing at the higher
stages of analysis
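Word formation rules run in the opposite direction, root → inflected form. A sketch of a few regular English spelling rules for the plural (boy → boys) and past tense (stretch → stretched) is shown below. It is an assumption-laden simplification: irregular forms (child → children, go → went) and consonant doubling (stop → stopped) are deliberately not handled.

```python
def plural(noun: str) -> str:
    """Regular English plural; irregular nouns are not handled."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                      # box -> boxes
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"                # city -> cities
    return noun + "s"                           # boy -> boys

def past(verb: str) -> str:
    """Regular English past tense; irregular verbs are not handled."""
    if verb.endswith("e"):
        return verb + "d"                       # bake -> baked
    if verb.endswith("y") and verb[-2:-1] not in "aeiou":
        return verb[:-1] + "ied"                # carry -> carried
    return verb + "ed"                          # stretch -> stretched

print(plural("boy"), plural("box"), plural("city"))
print(past("stretch"), past("bake"), past("carry"), past("play"))
```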
Syntax
Syntactic knowledge –
concerns how words can be put together to form correct
sentences and determines what structural role each word plays in
the sentence and what phrases are subparts of what other
phrases.
Finding the structure
Relationship between words in a sentence
Knowledge related to forming grammatically correct sentences
Assigning a syntactic and logical form to an input sentence
Uses knowledge about words and word meanings (lexicon)
Uses a set of rules defining legal structures (grammar)
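The two ingredients above, a lexicon and a grammar of legal structures, are enough to build a small parser. The sketch below is a CKY chart parser, a standard dynamic-programming algorithm for grammars in Chomsky normal form, that counts how many parse trees cover a sentence. The lexicon and grammar are a hypothetical toy written only for "I saw the man with the telescope". A count of 0 means the sentence is not grammatical under this grammar; a count above 1 means it is syntactically ambiguous.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> categories) and CNF grammar rules
# written as (parent, left child, right child).
WORD_CATEGORIES = {"i": {"NP"}, "saw": {"V"}, "the": {"Det"},
                   "man": {"N"}, "telescope": {"N"}, "with": {"P"}}
GRAMMAR = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
           ("VP", "V", "NP"), ("VP", "VP", "PP"), ("PP", "P", "NP")]

def count_parses(sentence: str, start: str = "S") -> int:
    """CKY chart parsing that counts the trees spanning the whole sentence."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][A] = number of ways category A can cover words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # lexicon fills length-1 spans
        for cat in WORD_CATEGORIES.get(w, ()):
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):                  # grammar builds longer spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for parent, left, right in GRAMMAR:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n][start]

print(count_parses("I saw the man with the telescope"))  # 2
print(count_parses("saw I the man"))                     # 0
```

The two parses correspond to the two readings on the ambiguity slide: the prepositional phrase attaches either to the verb (I used the telescope) or to "the man" (the man had the telescope).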
Reference Resolution
Discourse knowledge-
concerns how the immediately preceding
sentences affect the interpretation of the next
sentence.
Input: a paragraph or document
Output: its structure and meaning
Discourse Knowledge
U: I would like to open a fixed deposit account.
S: For what amount?
U: Make it for 800 dollars.
S: For what duration?
U: What is the interest rate for 3 months?
S: Six percent.
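Reference resolution decides what a pronoun such as "it" in "Make it for 800 dollars" refers to. A classic baseline, sketched here with a hypothetical feature lexicon, links each third-person pronoun to the most recently mentioned noun whose gender agrees. This is a simplification for illustration only: it ignores syntax and meaning, handles only the listed words, and can be wrong in exactly the ambiguous cases this unit discusses.

```python
# Hypothetical gender features for the words in the examples of this unit.
NOUN_FEATURES = {"john": "male", "sam": "male", "account": "neuter",
                 "club": "neuter", "deposit": "neuter"}
PRONOUN_FEATURES = {"he": "male", "him": "male", "she": "female",
                    "her": "female", "it": "neuter"}

def resolve_pronouns(text: str) -> list[tuple[str, str]]:
    """Recency heuristic: link each pronoun to the most recent agreeing noun."""
    mentioned = []                          # nouns in order of mention
    links = []
    for raw in text.split():
        token = raw.strip(".,?!").lower()
        if token in PRONOUN_FEATURES:
            wanted = PRONOUN_FEATURES[token]
            for noun in reversed(mentioned):  # most recent first
                if NOUN_FEATURES[noun] == wanted:
                    links.append((token, noun))
                    break
        elif token in NOUN_FEATURES:
            mentioned.append(token)
    return links

print(resolve_pronouns("I would like to open a fixed deposit account. "
                       "Make it for 800 dollars."))  # [('it', 'account')]
```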
Inferring Knowledge from text
Words
word frequencies
collocations
word sense
n-grams (sequences of n words that appear in a certain order)
Grammar
word categories
syntactic structure
Discourse
Sentence meanings
Applications
Information Retrieval
Information Extraction
Natural language interface
Statistical Machine Translation
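The word-level knowledge listed above (word frequencies, n-grams, collocations) can be inferred from raw text by counting. The sketch below counts words and bigrams in the "time flies / fruit flies" sentences from this unit. Treating "bigrams seen more than once" as collocation candidates is a crude assumption made for this tiny example; real collocation finders use association measures such as pointwise mutual information on large corpora.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-word sequences, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "time flies like an arrow fruit flies like a banana"
tokens = text.split()

word_freq = Counter(tokens)                 # word frequencies
bigram_freq = Counter(ngrams(tokens, 2))    # adjacent word pairs

# Crude collocation candidates: bigrams that occur more than once.
candidates = [bg for bg, c in bigram_freq.items() if c > 1]

print(word_freq.most_common(2))   # [('flies', 2), ('like', 2)]
print(candidates)                 # [('flies', 'like')]
```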
Simple Applications
Word counters (wc in UNIX)
Spell Checkers, grammar checkers
Predictive Text on mobile handsets
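A simple spell checker can suggest corrections by finding dictionary words within a small edit distance of a misspelled word. The sketch below uses Levenshtein distance (minimum number of single-character insertions, deletions, and substitutions) computed by dynamic programming. The five-word dictionary and the `max_dist=2` threshold are assumptions for illustration; production checkers also weigh word frequency and common typing errors.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

# Hypothetical tiny dictionary; a real checker would use a full word list.
DICTIONARY = ["language", "natural", "processing", "computer", "grammar"]

def suggest(word: str, max_dist: int = 2) -> list[str]:
    """Dictionary words within max_dist edits, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in DICTIONARY)
    return [w for d, w in scored if d <= max_dist]

print(suggest("gramar"))    # ['grammar']
print(suggest("langauge"))  # ['language']
```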
More significant Applications
Intelligent computer systems
NLU interfaces to databases
Computer aided instruction, automatic graders
Intelligent Web searching
Speech recognition
Natural language generation
Machine translation
Spelling/grammar correction
Information Retrieval
Data mining
Document classification
Question answering, conversational agents
Standard NLP tools
NLTK (https://www.nltk.org/)
Thank you