The document provides an overview of natural language processing (NLP). It defines NLP as the automatic processing of human language and discusses how NLP relates to fields like linguistics, cognitive science, and computer science. The document also describes common NLP tasks like information extraction, machine translation, and summarization. It discusses challenges in NLP like ambiguity and examines techniques used in NLP like rule-based systems, probabilistic models, and the use of linguistic knowledge.
The document discusses parts-of-speech (POS) tagging. It defines POS tagging as labeling each word in a sentence with its appropriate part of speech. It provides an example tagged sentence and discusses the challenges of POS tagging, including ambiguity and open/closed word classes. It also discusses common tag sets and stochastic POS tagging using hidden Markov models.
Natural Language Processing (NLP) is a subset of AI. It is the ability of a computer program to understand human language as it is spoken.
Contents
What Is NLP?
Why NLP?
Levels In NLP
Components Of NLP
Approaches To NLP
Stages In NLP
NLTK
Setting Up NLP Environment
Some Applications Of NLP
The document provides information about natural language processing (NLP) including:
1. NLP stands for natural language processing and involves using machines to understand, analyze, and interpret human language.
2. The history of NLP began in the 1940s and modern NLP consists of applications like speech recognition and machine translation.
3. The two main components of NLP are natural language understanding, which helps machines understand language, and natural language generation, which converts computer data into natural language.
Natural language processing (NLP) involves making computers understand human language to interpret unstructured text. NLP has applications in machine translation, speech recognition, question answering, and text summarization. Understanding language requires analyzing words, sentences, context and meaning. Common NLP tasks include tokenization, tagging parts of speech, and named entity recognition. Popular Python NLP libraries that can help with these tasks are NLTK, spaCy, Gensim, Pattern, and TextBlob.
15. Advantages of NLP
• NLP lets users ask questions about any subject and get a direct response within seconds.
• NLP offers precise answers to a question; it does not return unnecessary or unwanted information.
• NLP helps computers communicate with humans in their own languages.
• It is very time efficient.
• Many companies use NLP to improve the efficiency and accuracy of documentation processes and to identify information in large databases.
Disadvantages of NLP
A list of disadvantages of NLP is given below:
• NLP may not capture context.
• NLP can be unpredictable.
• NLP may require more keystrokes.
• NLP systems are unable to adapt to a new domain and have limited functionality, which is why they are built for a single, specific task only.
18. Lexical Ambiguity
Lexical means relating to the words of a language. During lexical analysis, a given paragraph is broken down into words or tokens, and each token has a specific meaning.
There are instances where a single word can be interpreted in multiple ways. Ambiguity that is caused by the word alone, rather than by the context, is known as lexical ambiguity.
Example: "Give me the bat!"
In this sentence it is unclear whether "bat" refers to the nocturnal animal or a cricket bat. The word alone does not provide enough information about the meaning, so we need to know the context in which it is used.
Lexical ambiguity can be further categorized into polysemy and homonymy.
19. a) Polysemy
Polysemy refers to a single word having multiple but related meanings.
Example: light (adjective).
• "Thanks to the new windows, this room is now so light and airy" = lit by the natural light of day.
• "The light green dress is better on you" = pale in colour.
In this example, "light" has different meanings, but they are related to each other.
b) Homonymy
Homonymy refers to a single word having multiple but unrelated meanings.
Examples: bear, left, Pole.
• A bear (the animal) can bear (tolerate) very cold temperatures.
• The driver turned left (opposite of right) and left (departed from) the main road.
• Pole and pole: the first Pole refers to a citizen of Poland, who can be referred to as Polish or a Pole; the second pole refers to a bamboo pole or any other wooden pole.
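When the context is available as text, word sense disambiguation (WSD) can be used to pick the intended sense of an ambiguous word such as "bat". Below is a minimal sketch using NLTK's implementation of the Lesk algorithm; the example sentences are assumptions for illustration, and Lesk's gloss-overlap heuristic does not always select the intuitively correct sense.

# Minimal word-sense-disambiguation sketch with NLTK's Lesk algorithm.
# Requires the 'punkt' and 'wordnet' data: nltk.download('punkt'); nltk.download('wordnet')
import nltk
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sentences = [
    "He hit the ball with a wooden cricket bat",   # sporting-equipment sense
    "A bat flew out of the dark cave at night",    # animal sense
]
for sentence in sentences:
    # lesk() returns the WordNet synset whose gloss overlaps most with the context words
    sense = lesk(word_tokenize(sentence), "bat", pos="n")
    print(sentence)
    print("   ->", sense, "-", sense.definition() if sense else "no sense selected")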
20. Syntactic Ambiguity / Structural Ambiguity
Syntactic meaning refers to the grammatical structure and rules that define how words should be combined to form phrases and sentences.
When a sentence can be interpreted in more than one way because of its structure or syntax, the ambiguity is referred to as syntactic (or structural) ambiguity.
Example 1: "Old men and women"
This phrase has two possible meanings:
• all old men and young women;
• all old men and old women.
Example 2: "John saw the boy with the telescope."
Here the two possible meanings are:
• John saw the boy through his telescope;
• John saw the boy who was holding the telescope.
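The two readings of a structurally ambiguous phrase can be made explicit with a parser. Below is a minimal sketch using NLTK's chart parser with a toy grammar; the grammar is an illustrative assumption (it is not given in the slides) and is only rich enough to cover the example phrase.

# Toy grammar for "old men and women"; the grammar itself is an assumption for illustration.
import nltk

grammar = nltk.CFG.fromstring("""
    NP   -> Adj NP | NP Conj NP | N
    Adj  -> 'old'
    Conj -> 'and'
    N    -> 'men' | 'women'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(['old', 'men', 'and', 'women']):
    print(tree)
# One tree groups the phrase as (old (men and women)), the other as ((old men) and women),
# matching the two readings described above.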
21. Semantic Ambiguity
Semantics is nothing but "meaning". The semantics of a word or phrase refers to the way it is typically understood or interpreted by people.
Syntax describes the rules by which words can be combined into sentences, while semantics describes what they mean.
Semantic ambiguity occurs when a sentence has more than one interpretation or meaning.
Scope ambiguity
Example: "Seema loves her mother and Sriya does too."
The interpretations can be that Sriya loves Seema's mother, or that Sriya loves her own mother.
22. Anaphoric Ambiguity
A word that gets its meaning from a preceding word or phrase is called an anaphor.
Example: "Susan plays the piano. She likes music."
In this example, the word "she" is an anaphor and refers back to a preceding expression, i.e., "Susan".
The linguistic element or elements to which an anaphor refers is called the antecedent. The relationship between anaphor and antecedent is termed 'anaphora'.
Ambiguity that arises when there is more than one possible antecedent is known as anaphoric ambiguity.
Example: "The horse ran up the hill. It was very steep. It soon got tired."
Here there are two occurrences of "it", and it is unclear which antecedent each one refers to; this leads to anaphoric ambiguity. The sentence is only meaningful if the first "it" refers to the hill and the second "it" refers to the horse. Anaphors need not appear in the immediately preceding sentence: the antecedent may occur in an earlier sentence or within the same sentence.
23. Pragmatic Ambiguity
Pragmatics focuses on the real-time usage of language: what the speaker wants to convey and how the listener infers it.
Situational context, the individuals' mental states, the preceding dialogue, and other elements play a major role in understanding what the speaker is trying to say and how the listeners perceive it.
Example:
26. Step 1: Sentence segmentation
Sentence segmentation is the first step in the NLP pipeline. It divides the entire paragraph into different sentences for better understanding.
For example: "London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the southeast of the island of Great Britain, London has been a major settlement for two millennia. It was founded by the Romans, who named it Londinium."
After using sentence segmentation, we get the following result:
"London is the capital and most populous city of England and the United Kingdom."
"Standing on the River Thames in the southeast of the island of Great Britain, London has been a major settlement for two millennia."
"It was founded by the Romans, who named it Londinium."
27. # Program for sentence tokenization using NLTK
import nltk
from nltk.tokenize import sent_tokenize

# nltk.download('punkt')  # required once for the sentence tokenizer

def tokenize_sentences(text):
    # Split the raw text into a list of sentences
    sentences = sent_tokenize(text)
    return sentences

text = ("NLTK is a leading platform for building Python programs to work with "
        "human language data. It provides easy-to-use interfaces to over 50 corpora "
        "and lexical resources such as WordNet, along with a suite of text processing "
        "libraries for classification, tokenization, stemming, tagging, parsing, and "
        "semantic reasoning, wrappers for industrial-strength NLP libraries, and an "
        "active discussion forum.")

# Tokenize sentences
sentences = tokenize_sentences(text)

# Print tokenized sentences
for i, sentence in enumerate(sentences):
    print(f"Sentence {i+1}: {sentence}")
28. Step 2: Word tokenization
Word tokenization breaks the sentence into separate words or tokens. This helps understand the context of the text.
When tokenizing the sentence "London is the capital and most populous city of England and the United Kingdom", it is broken into separate tokens, i.e., "London", "is", "the", "capital", "and", "most", "populous", "city", "of", "England", "and", "the", "United", "Kingdom", ".".
29. # Word tokenization using NLTK
import nltk
# nltk.download('punkt')  # download the necessary tokenization models once
from nltk.tokenize import word_tokenize

def tokenize_words(text):
    words = word_tokenize(text)
    return words

# Example text
text = "NLTK is a leading platform for building Python programs to work with human language data."

# Tokenize words
words = tokenize_words(text)

# Print tokenized words
print(words)
30. Step 3: Stemming
Stemming helps in preprocessing text by normalizing words into their base or root form (the stem).
For example, "intelligently", "intelligence", and "intelligent" all originate from the single root "intelligen". However, in English there is no such word as "intelligen": a stem is not required to be a valid word.
31. # Step 3: Stemming in Python using the NLTK library
from nltk.stem import PorterStemmer

porter = PorterStemmer()
words = ['generous', 'fairly', 'sings', 'generation']
for word in words:
    print(word, "--->", porter.stem(word))
32. Step 4: Lemmatization
Lemmatization removes inflectional endings and returns the canonical form of a word, or lemma.
It is similar to stemming, except that the lemma is an actual word.
For example, "playing" and "plays" are forms of the word "play". Hence, "play" is the lemma of these words. Unlike a stem (recall "intelligen"), "play" is a proper word.
33. # Step 4: Lemmatization using NLTK
### import necessary libraries
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# nltk.download('wordnet'); nltk.download('punkt')  # required once

text = ("Very orderly and methodical he looked, with a hand on each knee, and a loud "
        "watch ticking a sonorous sermon under his flapped newly bought waist-coat, "
        "as though it pitted its gravity and longevity against the levity and "
        "evanescence of the brisk fire.")

# tokenise text
tokens = word_tokenize(text)

wordnet_lemmatizer = WordNetLemmatizer()
lemmatized = [wordnet_lemmatizer.lemmatize(token) for token in tokens]
print(lemmatized)
34. Step 5: Stop word analysis
The next step is to consider the importance of each and every word in a given sentence. In English, some words appear more frequently than others, such as "is", "a", "the", and "and". As they appear often, the NLP pipeline flags them as stop words. They are filtered out so as to focus on more important words.
35. # Program to eliminate stopwords using NLTK
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# nltk.download('stopwords'); nltk.download('punkt')  # required once

def remove_stopwords(text):
    # Tokenize the text into words
    words = word_tokenize(text)
    # Get English stopwords
    english_stopwords = set(stopwords.words('english'))
    # Remove stopwords from the tokenized words
    filtered_words = [word for word in words if word.lower() not in english_stopwords]
    # Join the filtered words back into a single string
    filtered_text = ' '.join(filtered_words)
    return filtered_text

# Example text
text = "NLTK is a leading platform for building Python programs to work with human language data."

# Remove stopwords
filtered_text = remove_stopwords(text)

# Print filtered text
print(filtered_text)
36. Step 6: Dependency parsing
Next comes dependency parsing, which is mainly used to find out how all the words in a sentence are related to each other. To find the dependencies, we can build a tree in which each word is assigned a single parent word; the main verb in the sentence acts as the root node.
The edges in a dependency tree represent grammatical relationships. These relationships define words' roles in a sentence, such as subject, object, modifier, or adverbial.
Subject-verb relationship: in a sentence like "She sings," the word "She" depends on "sings" as the subject of the verb.
37. Modifier-head relationship: in the phrase "The big cat," "big" modifies "cat," creating a modifier-head relationship.
Direct object-verb relationship: in "She eats apples," "apples" is the direct object that depends on the verb "eats."
Adverbial-verb relationship: in "He sings well," "well" modifies the verb "sings" and forms an adverbial-verb relationship.
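As a concrete illustration of these relations, here is a minimal dependency-parsing sketch in Python using spaCy; spaCy and its en_core_web_sm model are assumptions of this example (the slides do not prescribe a library), and the model must be installed separately.

# Minimal dependency-parsing sketch.
# Assumes: pip install spacy  and  python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("She eats apples")

for token in doc:
    # token.dep_ is the dependency label; token.head is the word this token attaches to
    print(f"{token.text:<8} {token.dep_:<8} head={token.head.text}")

# Typical output: "She" is the nominal subject (nsubj) of "eats", "apples" is the
# direct object (dobj) of "eats", and "eats" is the ROOT of the sentence.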
38. Common dependency tags and their descriptions:
acl: clausal modifier of a noun (adnominal clause)
acl:relcl: relative clause modifier
advcl: adverbial clause modifier
advmod: adverbial modifier
advmod:emph: emphasizing word, intensifier
advmod:lmod: locative adverbial modifier
amod: adjectival modifier
appos: appositional modifier
aux: auxiliary
aux:pass: passive auxiliary
case: case marking
cc: coordinating conjunction
cc:preconj: preconjunct
ccomp: clausal complement
clf: classifier
compound: compound
conj: conjunct
cop: copula
csubj: clausal subject
csubj:pass: clausal passive subject
dep: unspecified dependency
det: determiner
det:numgov: pronominal quantifier governing the case of the noun
det:nummod: pronominal quantifier agreeing in case with the noun
det:poss: possessive determiner
discourse: discourse element
dislocated: dislocated elements
expl: expletive
expl:impers: impersonal expletive
expl:pass: reflexive pronoun used in reflexive passive
expl:pv: reflexive clitic with an inherently reflexive verb
fixed: fixed multiword expression
flat: flat multiword expression
flat:foreign: foreign words
flat:name: names
goeswith: goes with
iobj: indirect object
list: list
mark: marker
nmod: nominal modifier
nmod:poss: possessive nominal modifier
nmod:tmod: temporal modifier
39. Step 7: Part-of-speech (POS) tagging
POS tags include categories such as verbs, adverbs, nouns, and adjectives, and they help indicate the grammatical role and meaning of words in a sentence.
40. Part-of-speech (POS) tagging is a process in natural language processing (NLP) that assigns grammatical categories to the words in a sentence. This helps algorithms understand the meaning and structure of a text.
POS tagging is a key step in NLP and is used in many applications, including:
• Text analysis
• Machine translation
• Information retrieval
• Speech recognition
• Parsing
• Sentiment analysis
41. # Program to perform part-of-speech tagging using NLTK
import nltk
from nltk.tokenize import word_tokenize

# nltk.download('averaged_perceptron_tagger')  # required once for the POS tagger

def pos_tagging(text):
    # Tokenize the text into words
    words = word_tokenize(text)
    # Perform POS tagging
    tagged_words = nltk.pos_tag(words)
    return tagged_words

# Example text
text = "NLTK is a leading platform for building Python programs to work with human language data."

# Perform POS tagging
tagged_text = pos_tagging(text)

# Print POS tagged text
print(tagged_text)
43. Step 8: Named Entity Recognition (NER)
Named entity recognition (NER) is the process of detecting named entities such as a person name, movie name, organization name, or location.
Example: "Steve Jobs introduced the iPhone at the Macworld Conference in San Francisco, California."
44. Types of Named Entity Recognition
• Lexicon-based method: the NER system uses a dictionary with a list of words or terms.
• Rule-based method: a set of predefined rules guides the extraction of information. These rules are based on patterns and context.
• Machine-learning-based method: one approach is to train a multi-class classifier using different machine learning algorithms, but this requires a lot of labelled data. Another is sequence labelling with Conditional Random Fields (CRF), which is implemented in NLP taggers such as the one in NLTK.
• Deep-learning-based method: deep learning NER systems are much more accurate than the previous methods, as they are capable of learning how words combine in context.
45. # Named Entity Recognition with NLTK
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag, ne_chunk

nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

def ner(text):
    words = word_tokenize(text)
    tagged_words = pos_tag(words)
    named_entities = ne_chunk(tagged_words)
    return named_entities

text = ("Apple is a company based in California, United States. "
        "Steve Jobs was one of its founders.")

named_entities = ner(text)
print(named_entities)
47. Step 9: Chunking
Chunking is used to collect individual pieces of information and group them into bigger, more meaningful units.
Chunk extraction, or partial parsing, is the process of extracting meaningful short phrases from a sentence that has been tagged with parts of speech.
Chunks are built from the parts of speech, namely the noun, verb, adjective, adverb, preposition, conjunction, pronoun, and interjection.
48. import nltk
from nltk.chunk import RegexpParser
from nltk.tokenize import word_tokenize

# Example sentence
sentence = "Educative Answers is a free web encyclopedia written by devs for devs."

# Tokenization
tokens = word_tokenize(sentence)

# POS tagging
pos_tags = nltk.pos_tag(tokens)

# Chunking patterns
chunk_patterns = r"""
    NP: {<DT>?<JJ>*<NN>}   # Chunk noun phrases
    VP: {<VB.*><NP|PP>}    # Chunk verb phrases
"""

# Create a chunk parser
chunk_parser = RegexpParser(chunk_patterns)

# Perform chunking
result = chunk_parser.parse(pos_tags)

# Print the chunked result
print(result)
49. Chunking output for:
"Educative Answers is a free web encyclopedia written by devs
for devs."
(S
Educative/JJ
Answers/NNPS
(VP is/VBZ (NP a/DT free/JJ web/NN))
(NP encyclopedia/NN)
written/VBN
by/IN
(NP devs/NN)
for/IN
(NP devs/NN)
./.)
52. Phase I: Lexical or Morphological Analysis
• A lexicon is defined as a collection of the words and phrases in a given language. Lexical analysis is the process of splitting this collection into components, based on what the user sets as parameters: paragraphs, phrases, words, or characters.
• Morphological analysis is the process of identifying the morphemes of a word.
• A morpheme is a basic unit of English language construction: a small element of a word that carries meaning.
• A morpheme can be either a free morpheme (e.g. "walk") or a bound morpheme (e.g. "-ing", "-ed"). The difference between the two is that the latter cannot stand on its own to produce a word with meaning and must be attached to a free morpheme.
53. Importance of Morphological Analysis
Morphological analysis is crucial in NLP for several reasons:
• Understanding word structure: it helps in deciphering the composition of complex words.
• Predicting word forms: it aids in anticipating different forms of a word based on its morphemes.
• Improving accuracy: it enhances the accuracy of tasks such as part-of-speech tagging, syntactic parsing, and machine translation.
54. Phase II: Syntactic Analysis or Parsing
• This phase is essential for understanding the structure of a sentence and assessing its grammatical correctness.
• It involves analyzing the relationships between words and ensuring their logical consistency by comparing their arrangement against standard grammatical rules.
• Consider the following sentences:
  • Correct syntax: "John eats an apple."
  • Incorrect syntax: "Apple eats John an."
• POS tags:
  • John: proper noun (NNP)
  • eats: verb (VBZ)
  • an: determiner (DT)
  • apple: noun (NN)
55. Phase III: Semantic Analysis
• Syntactically correct but semantically incorrect: "Apple eats a John."
  This sentence is grammatically correct but does not make sense semantically. An apple cannot eat a person, which highlights the importance of semantic analysis in ensuring logical coherence.
• Literal interpretation: "What time is it?"
  This phrase is interpreted literally as someone asking for the current time, demonstrating how semantic analysis helps in understanding the intended meaning.
56. Semantic Analysis
Semantic analysis is the third phase of natural language processing (NLP), focusing on extracting the meaning from text.
Semantic analysis aims to understand the dictionary definitions of words and their usage in context. It determines whether the arrangement of words in a sentence makes logical sense.
Key tasks in semantic analysis:
• Named entity recognition (NER): NER identifies and classifies entities within the text, such as names of people, places, and organizations. These entities belong to predefined categories and are crucial for understanding the text's content.
• Word sense disambiguation (WSD): WSD determines the correct meaning of ambiguous words based on context. For example, the word "bank" can refer to a financial institution or the side of a river. WSD uses contextual clues to assign the appropriate meaning.
58. Phase IV: Discourse Integration
Discourse integration is the analysis and identification of the larger context for any smaller part of natural language structure (e.g. a phrase, word, or sentence).
During this phase, it is important to ensure that each phrase, word, and entity mentioned is interpreted within the appropriate context.
• Contextual reference: "This is unfair!"
  To understand what "this" refers to, we need to examine the preceding or following sentences. Without context, the statement's meaning remains unclear.
• Anaphora resolution: "Taylor went to the store to buy some groceries. She realized she forgot her wallet."
  In this example, the pronoun "she" refers back to "Taylor" in the first sentence. Understanding that "Taylor" is the antecedent of "she" is crucial for grasping the sentence's meaning.
59. Phase V: Pragmatic Analysis
• Pragmatic analysis focuses on interpreting the inferred meaning of a text beyond its literal content.
• Human language is often complex and layered with underlying assumptions, implications, and intentions that go beyond straightforward interpretation.
• Contextual greeting: "Hello! What time is it?"
  "Hello!" is more than just a greeting; it serves to establish contact. "What time is it?" might be a straightforward request for the current time, but it could also imply concern about being late.
• Figurative expression: "I'm falling for you."
  The word "falling" literally means collapsing, but in this context it means the speaker is expressing love for someone.
60. What is the difference between large language models and generative AI?
Generative AI is an umbrella term that refers to artificial intelligence models that have the capability to generate content. Generative AI can generate text, code, images, video, and music. Examples of generative AI include Midjourney, DALL-E, and ChatGPT.
Large language models are a type of generative AI that are trained on text and produce textual content. ChatGPT is a popular example of generative text AI. All large language models are generative AI.
LLMs have achieved remarkable advancements in various language-related applications such as text generation, translation, summarization, question answering, and more.
61. Large Language Models
A large language model (LLM) is a computer program that learns and generates human-like language, typically using a transformer architecture trained on vast amounts of text data.
LLMs are foundational machine learning models that use deep learning algorithms to process and understand natural language. They are trained on massive amounts of text data to learn patterns and entity relationships in the language.
As a result, LLMs can perform many types of language tasks, such as translating languages, analyzing sentiment, holding chatbot conversations, and more: they generate human-like text and support a wide range of natural language processing tasks.
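To make the idea concrete, here is a minimal text-generation sketch using the Hugging Face transformers library with the small GPT-2 checkpoint; the library, the model choice, and the prompt are illustrative assumptions rather than something specified in the slides.

# Minimal text-generation sketch with a small pretrained language model.
# Assumes: pip install transformers torch   (library and checkpoint are illustrative choices)
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Natural language processing is"
outputs = generator(prompt, max_new_tokens=30, num_return_sequences=1)

# Each element of `outputs` is a dict containing the prompt plus the generated continuation
print(outputs[0]["generated_text"])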