Unit 1 NLP Introduction

Uploaded by

sans323597
Taxonomy

Computers
• Databases
• Artificial Intelligence
  • Robotics
  • Natural Language Processing
    • Information Retrieval
    • Machine Translation
    • Language Analysis
      • Semantics
      • Parsing
  • Search
• Algorithms
• Networking

11/28/2024 CEN 1
Natural language and NLP
• Natural Language
  • Refers to the languages spoken by people, e.g. English and Hindi, as opposed to artificial languages such as C++ and Java.
• Natural Language Processing
  • Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.
  • It is closely related to human–computer interaction.
Computational Linguistics
Computational linguistics (CL) is a discipline
between linguistics and computer science which
is concerned with the computational aspects of
the human language.
Computational linguistics:
• Doing linguistics on computers
• More on the linguistic side than NLP, but closely related

Natural language and NLP
• Ultimate goal
  • To build computer systems that perform as well at using natural language as humans do (making computers as intelligent as people).
• Immediate goal
  • To build computer systems that can process text and speech more intelligently.
• Enabling human-machine communication
• Improving human-human communication
• Processing of text or speech

• What is the difference between a natural language and a programming language?
Human Languages
An adult knows ~50,000 words of their primary language, each with several meanings
A six-year-old knows ~13,000 words
In the first 16 years we learn about 1 word every 90 minutes of waking time
Mental grammar generates sentences - virtually every sentence is novel
Three-year-olds already have 90% of grammar
~6000 human languages – none of them
simple!
Adapted from Martin Nowak 2000 – Evolutionary biology of language – Phil.Trans. Royal Society London

Origins of NLP
• NLP originated from the idea of Machine Translation (MT), which came into existence during the Second World War.

• The primary idea was to convert one human language into another, for example Russian into English, using the "brain" of a computer. Later, the idea of converting human language to computer language and vice versa emerged, so that communication with the machine would become easy.
NLP

Natural Language Understanding
Concerned only with interpreting language, i.e. understanding.
Natural language understanding means enabling computers to derive meaning from human (natural) language input.
Applications:
• Database queries
• Search engines
• Automatic translation systems
• Story understanding

Natural Language Generation
Language production
An NLG system is like a translator that converts a computer-based representation into a natural language representation.
NLG may be viewed as the opposite of natural language understanding: in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, whereas in NLG the system needs to make decisions about how to put a concept into words.
Why NLP
• NLP products are urgently needed to improve human-machine interaction, since the main obstacle in the interaction between humans and computers is one of communication.

• Today's computers do not understand our language, and humans have difficulty understanding the computer's language, which does not correspond to the structure of human thought.

Why NLP
How can we tell computers about language? (Or help them learn it as kids do?)
It would be great if machines could:
• Process our emails
• Translate languages accurately
• Help us manage, summarize, and aggregate information
• Understand phone conversations
• Talk to us and listen to us
Why NLP?
• Human language is interesting and challenging
  • NLP offers insights (deep understanding) into language
• Language is the medium of the web
• Interdisciplinary: linguistics, computer science, psychology, mathematics
• Help in communication
  • With computers (ASR, TTS)
  • With other humans (MT)
• Huge amounts of data
  • Internet = at least 20 billion pages
  • Applications for processing large amounts of text
Why NLP?
kJfmmfj mmmvvv nnnffn333
Uj iheale eleee mnster vensi credur
Baboi oi cestnitze
Coovoel2^ ekk; ldsllk lkdf vnnjfj?
Fgmflmllk mlfm kfre xnnn!

Computers Lack Knowledge!
Computers “see” text in English the same way you saw the previous text!
People have no trouble understanding language, thanks to:
• Common sense knowledge
• Reasoning capacity
• Experience
Computers have:
• No common sense knowledge
• No reasoning capacity
NLP and Computational Models
Two types:
Knowledge driven
• Grammar rules
• Semantic structure

Data driven
• Uses machine learning (ML) to learn syntactic patterns
• Requires less human effort
• Performance depends on the quality and quantity of the data

Classical NLP and Statistical NLP

Classical NLP: Linguist → rules → Computer

Statistical / ML NLP: Text data (corpus) → rules/probabilities → Computer
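
The data-driven route can be illustrated with a minimal sketch in which the "rules" are counted from examples rather than written by a linguist. The tiny tagged corpus, the tag names, and the `train` and `tag` helpers below are all invented for this illustration and are not part of the original slides.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for this sketch (not real training data).
TAGGED = [
    ("i", "PRON"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN"),
    ("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("old", "ADJ"),
    ("read", "VERB"), ("the", "DET"), ("book", "NOUN"),
]

def train(tagged_words):
    """Learn, for every word, the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model):
    """Tag each word with its most frequent tag; unseen words get 'UNK'."""
    return [(w, model.get(w, "UNK")) for w in words]

model = train(TAGGED)
print(tag(["the", "book", "arrived"], model))
```

With so little data the model always calls "book" a noun and knows nothing about "arrived", which is one way to see why performance depends on the quality and quantity of the corpus.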

Why NLP is difficult
Natural language is extremely rich in form and structure, and very ambiguous.
• How to represent meaning
• Which structures map to which meaning structures
One input can mean many different things.
Ambiguity can occur at different levels:
• Lexical (word-level) ambiguity: different meanings of words
• Syntactic ambiguity: different ways to parse the sentence
• Interpreting partial information: how to interpret pronouns

Language Processing
Level 1 – Speech sound (Phonetics & Phonology)
Level 2 – Words & their forms (Morphology, Lexicon)
Level 3 – Structure of sentences (Syntax, Parsing)
Level 4 – Meaning of sentences (Semantics)
Level 5 – Meaning in context & for a purpose (Pragmatics)
Level 6 – Connected sentence processing in a larger body of text (Discourse)
Source: Prof. Pushpak Bhattacharyya, IIT Bombay (03/01/06)
Ambiguity
• Phonetic
  • [raIt] = write, right, rite
• Lexical/morphological
  • book = noun, verb
• Syntactic/structural
  • I saw the man with the telescope
• Semantic
  • dish = physical plate, menu item
• Pragmatic
  • Kabir and Ayan are married.
  • Kabir and Suha are married.
• Discourse
  • John went to the club on Saturday. He met Sam.

→ All of these make NLP difficult
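
The structural ambiguity of "I saw the man with the telescope" can be made concrete by counting parse trees. The sketch below is a minimal CKY-style counter over a toy grammar; the grammar rules, the lexicon, and the `count_parses` helper are assumptions made for this example, not something given in the slides.

```python
from collections import defaultdict

# Toy grammar in binary form (parent, left child, right child). Hypothetical.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the seeing
    ("NP", "NP", "PP"),   # the PP modifies the man
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(words, root="S"):
    """Count the distinct parse trees for `words` under the toy grammar."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i, i + 1, category] += 1
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in BINARY:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]

print(count_parses("I saw the man with the telescope".split()))
```

Under this grammar the sentence has two trees, one where the telescope is the instrument of seeing and one where the man is holding it, while "I saw the man" has only one.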

Ambiguity
• Some interpretations of "I made her duck":
  1. I cooked duck for her.
  2. I cooked duck belonging to her.
  3. I created a toy duck which she owns.
  4. I caused her to quickly lower her head or body.
  5. I used magic and turned her into a duck.
• duck – morphologically and syntactically ambiguous: noun or verb.
• her – syntactically ambiguous: dative or possessive.
• make – semantically ambiguous: cook or create.
Ambiguous Sentences
• Time flies like an arrow. Fruit flies like a banana. [Lexical]
  • It's a play on words: time flies (i.e. goes by quickly) like (i.e. similar to) an arrow soaring through the air; fruit flies (i.e. insects) like (i.e. prefer) a banana.
• We saw her duck.
• The mouse is on my desk.
• I saw the man with the binoculars. [Syntactic]
• They are hunting dogs.
• Wanted: a nurse for a baby about twenty years old.
• The old men and women left the room.
• The lady hit the man with an umbrella.
• I told her books were funny.
• Mary loves visiting relatives.

Garden Path Sentence
The sentence "Time flies like an arrow; fruit flies like a banana" uses "flies" as a verb and then "flies" as a noun. It uses "like" as a preposition and then "like" as a verb.

Because the structure of the two parts looks identical, the reader stumbles over the second part.

It is meant to demonstrate confusion caused by structure and by words with different meanings, and it does so in a humorous way.
Eliza
ELIZA is a computer program and an early example of primitive natural language processing.
ELIZA was written at MIT by Joseph Weizenbaum between 1964 and 1966.
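
The keyword-and-template style of conversation that ELIZA is known for can be sketched in a few lines. The rules and replies below are invented for illustration and are not taken from Weizenbaum's original script; `respond` is a hypothetical helper name.

```python
import re

# (pattern, reply template) pairs, invented for this sketch.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."

def respond(utterance):
    """Return the reply of the first matching rule, or a stock phrase."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return DEFAULT_REPLY

print(respond("I need a holiday."))
print(respond("Nothing much happened today."))
```

The program reflects fragments of the input back to the user without any model of meaning, which is why the slide calls this primitive natural language processing.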

Eliza
http://www.manifestation.com/neurotoys/eliza.php3

Stages in NLP

Different levels of Language Analysis
A natural language system must use considerable knowledge about the structure of the language itself, including:
• what the words are,
• how words combine to form sentences,
• what the words mean,
• how word meanings contribute to sentence meanings, and so on.

The Steps in NLP
Discourse

Pragmatics

Semantics

Syntax

Morphology

The steps in NLP (Cont.)
Morphology: concerns the way words are built up from smaller meaning-bearing units, e.g. come(s).
Syntax: concerns how words are put together to form correct sentences and what structural role each word has.
Semantics: concerns what words mean and how these meanings combine in sentences to form sentence meanings.
Pragmatics: concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.
Phonology
Phonetic and phonological knowledge
• Concerns how words are related to the sounds that realize them. Such knowledge is crucial for speech-based systems.
• A phoneme is the smallest unit of sound that distinguishes one word from another.
• Important in speech recognition and speech generation

Morphology
Morphological knowledge
• Concerns how words are constructed from more basic meaning units called morphemes. A morpheme is the primitive unit of meaning in a language.
Lexical analysis is all about words
It requires morphological knowledge
The smallest unit is the morpheme

Morphology
Word formation rules from root words
Nouns: plural (boy-boys); verbs: tense (stretch-stretched)
Languages rich in morphology: e.g. Dravidian languages, Hungarian, Turkish
Languages poor in morphology: e.g. Chinese, English
Languages with rich morphology have the advantage of easier processing at the higher stages of processing
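
Word formation rules of the boy-boys and stretch-stretched kind can be sketched as simple suffix stripping. This is a deliberately naive illustration: the rule table, the feature labels, and the `analyze` helper are assumptions for this example, and real morphological analyzers handle irregular forms and spelling changes that this sketch ignores.

```python
# (suffix, feature) rules, tried in order. Hypothetical and English-only.
SUFFIX_RULES = [
    ("ed", "past tense"),
    ("s", "plural or 3rd-person singular"),
]

def analyze(word):
    """Split a word into (root, feature) using the first matching suffix."""
    for suffix, feature in SUFFIX_RULES:
        # Require a root of at least two letters so "is" is not split.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)], feature
    return word, "base form"

for w in ["boys", "stretched", "comes", "boy"]:
    print(w, "->", analyze(w))
```

The sketch would wrongly split a word such as "bus", which hints at why a lexicon of known roots is normally consulted alongside the rules.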

Syntax
• Syntactic knowledge
  • Concerns how words can be put together to form correct sentences, and determines what structural role each word plays in the sentence and which phrases are subparts of which other phrases.
• Finding the structure
• Relationships between words in a sentence
• Knowledge related to forming grammatically correct sentences
• Assigning a syntactic and logical form to an input sentence
  • uses knowledge about words and word meanings (lexicon)
  • uses a set of rules defining legal structures (grammar)

(S (NP (NAME Sam))
   (VP (V ate)
       (NP (ART the)
           (N apple))))
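
A bracketed tree like the one above is easy to load into a program. The following minimal sketch reads the bracket notation into nested Python lists and recovers the words; `parse_brackets` and `leaves` are hypothetical helper names chosen for this example.

```python
def parse_brackets(text):
    """Turn "(NP (ART the) (N apple))" into nested lists of strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = []
            while tokens[pos] != ")":
                node.append(read())
            pos += 1  # skip the closing ")"
            return node
        return token

    return read()

def leaves(tree):
    """Collect the words at the bottom of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    # tree[0] is the category label (S, NP, V, ...); the rest are children.
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

tree = parse_brackets(
    "(S (NP (NAME Sam)) (VP (V ate) (NP (ART the) (N apple))))"
)
print(tree)
print(leaves(tree))
```

Each list starts with a category label followed by its children, so the nesting of the lists mirrors which phrases are subparts of which other phrases.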
Semantics

Many words have many meanings or senses.
We need to resolve which of the senses of an ambiguous word is invoked in a particular use of the word.
Semantics concerns what words mean and how these meanings combine in sentences to form sentence meanings. This is the study of context-independent meaning.
Find the meaning of a sentence
“plant” = industrial plant
“plant” = living organism
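
One simple way to choose between the two senses of "plant" is to count how many cue words from the surrounding sentence fit each sense, loosely in the spirit of overlap-based (Lesk-style) disambiguation. The cue-word sets and the `disambiguate` helper below are invented for this sketch; a real system would draw on a dictionary or a trained model.

```python
import re

# Hand-picked cue words per sense, invented for this illustration.
SENSES = {
    "industrial plant": {"factory", "workers", "production", "power",
                         "manufacturing"},
    "living organism": {"leaves", "water", "garden", "grow", "soil",
                        "green"},
}

def disambiguate(sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSES.items()}
    # On a tie, max() keeps the first sense listed in SENSES.
    return max(scores, key=scores.get)

print(disambiguate("The plant needs water and soil to grow."))
print(disambiguate("Workers at the plant halted production."))
```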
Pragmatic knowledge
Pragmatic knowledge concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
Hard to identify
• Do you know what time it is?
  • "Yes" is not an appropriate answer.
• Tourist (in a hurry, checking out of the hotel, motioning to the service boy): Boy, go upstairs and see if my sandals are under the divan. Do not be late. I just have 15 minutes to catch the train.
• Boy (running upstairs and coming back panting): Yes sir, they are there.
World Knowledge
World knowledge includes the general knowledge about the structure of the world that language users must have in order to, for example, maintain a conversation. It includes what each language user must know about the other user’s beliefs and goals.
Imparting world knowledge is difficult
“The blue pen ate the ice-cream”
“The dog ate my homework”

Reference Resolution
Discourse knowledge concerns how the immediately preceding sentences affect the interpretation of the next sentence.
The input is a paragraph or document
The output is its structure and meaning
Discourse knowledge example:
U: I would like to open a fixed deposit account.
S: For what amount?
U: Make it for 800 dollars.
S: For what duration?
U: What is the interest rate for 3 months?
S: Six percent.
Inferring Knowledge from text
• Words
  • word frequencies
  • collocations
  • word sense
  • n-grams (words appear in a certain order)
• Grammar
  • word categories
  • syntactic structure
• Discourse
  • sentence meanings
• Applications
  • Information Retrieval
  • Information Extraction
  • Natural language interface
  • Statistical Machine Translation
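
Word frequencies and n-grams from the list above can be computed with the Python standard library alone. This is a minimal sketch: the sample text reuses a sentence from the ambiguity slides, and splitting on whitespace is a simplifying assumption in place of real tokenization.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "time flies like an arrow fruit flies like a banana"
tokens = text.split()  # naive whitespace tokenization

word_freq = Counter(tokens)
bigram_freq = Counter(ngrams(tokens, 2))

print(word_freq.most_common(2))
print(bigram_freq.most_common(1))
```

Counts like these are the raw material for collocation detection and for the statistical models mentioned under Applications.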

Simple Applications
Word counters (wc in UNIX)
Spell Checkers, grammar checkers
Predictive Text on mobile handsets
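
A word counter is about the simplest text-processing application there is. The sketch below mimics the three default counts of the UNIX `wc` command (newlines, whitespace-separated words, bytes); the Python function `wc` is a hypothetical stand-in written for this example, not the UNIX tool itself.

```python
def wc(text):
    """Return (lines, words, bytes) for a string, in the style of UNIX wc."""
    lines = text.count("\n")                 # wc counts newline characters
    words = len(text.split())                # whitespace-separated tokens
    byte_count = len(text.encode("utf-8"))   # size in bytes, not characters
    return lines, words, byte_count

sample = "I made her duck.\nWe saw her duck.\n"
print(wc(sample))
```

Nothing here involves meaning: the counter treats "duck" the same way whether it is a noun or a verb, which is what separates simple applications from the more significant ones.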

More significant Applications
Intelligent computer systems
NLU interfaces to databases
Computer aided instruction, automatic graders
Intelligent Web searching
Speech recognition
Natural language generation
Machine translation
Spelling and grammar correction
Information Retrieval
Data mining
Document classification
Question answering, conversational agents
Standard NLP tools

Stanford CoreNLP (https://corenlp.run/)

Stanford NLP Group (https://nlp.stanford.edu/software/)

NLTK (https://www.nltk.org/)

Thank you
