NLP UNIT-2
Size of NLTK:
The NLTK book collection of data consists of about 30 compressed files requiring about 100 MB of disk space. The full collection of data (i.e., everything in the downloader) is nearly ten times this size (at the time of writing) and continues to expand. Once the data is downloaded to your machine, you can load some of it using the Python interpreter.
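As a minimal sketch (the resource names 'punkt' and 'gutenberg' are just examples from the standard downloader and may differ slightly between NLTK versions), individual packages can be downloaded and then loaded like this:
import nltk

# Download individual resources instead of the full collection
nltk.download('punkt')      # tokenizer models used by sent_tokenize/word_tokenize
nltk.download('gutenberg')  # a small sample corpus

# Load some of the downloaded data
from nltk.corpus import gutenberg
print(gutenberg.fileids()[:3])  # list a few of the available text files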
Tokenization in NLP
Tokenization is the start of the NLP process, converting sentences into understandable bits of
data that a program can work with. Without a strong foundation built through tokenization,
the NLP process can quickly devolve into a messy telephone game.
NLTK tokenization is used to split a large amount of textual data into parts so that the character of the text can be analysed. Tokenization with NLTK can be used when training machine learning models and for text cleaning in natural language processing.
In Python, tokenization basically refers to splitting up a larger body of text into smaller lines or words, or even creating words for a non-English language. The various tokenization functions are built into the nltk module itself and can be used in programs as shown below.
Sentence Tokenization:
import nltk
# requires the 'punkt' sentence tokenizer models (nltk.download('punkt'))
sentence_data = "The First sentence is about Python. The Second: about Django. You can learn Python, Django and Data Analysis here."
nltk_tokens = nltk.sent_tokenize(sentence_data)
print(nltk_tokens)
O/P:
['The First sentence is about Python.', 'The Second: about Django.', 'You can learn Python, Django and Data Analysis here.']
Word Tokenization:
import nltk
word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
nltk_tokens = nltk.word_tokenize(word_data)
print(nltk_tokens)
O/P: ['It', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are', 'readers',
'who', 'prefer', 'learning', 'new', 'skills', 'from', 'the',
'comforts', 'of', 'their', 'drawing', 'rooms']
STEMMING IN NLP
Stemming is a technique used to reduce an inflected word down to its word stem. For example, the words “programming,” “programmer,” and “programs” can all be reduced down to the common word stem “program.” In other words, “program” can stand in for the three inflected words above.
Stemming extracts the base form of words by removing affixes from them. It is just like cutting down the branches of a tree to its stem.
For example, the stem of the words eating, eats, eaten is eat.
Stemming Algorithms
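One widely used stemming algorithm is the Porter stemmer, which ships with NLTK. A minimal sketch (the word list below is only an illustration; exact stems can differ between stemming algorithms):
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Inflected forms from the examples above
words = ["programming", "programmer", "programs", "eating", "eats"]
for word in words:
    # e.g. "programming" and "programs" both reduce to "program"
    print(word, "->", stemmer.stem(word))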
Application of Stemming:
Stemming is used in information retrieval systems such as search engines, text mining, SEO, web search results, indexing, tagging systems, and word analysis. It is also used to determine domain vocabularies in domain analysis.
Disadvantage:
Overstemming and understemming are two problems that can arise in stemming.
Overstemming occurs when a stemmer reduces a word to its base form too aggressively, resulting in a stem that is not a valid word. For example, the word “fishing” might be overstemmed to “fishin,” which is not correct. Understemming is the opposite problem: the stemmer is too conservative, so related forms of the same word fail to be reduced to a common stem.
Lemmatization
Lemmatization is the process of grouping together different inflected forms of the same word.
It's used in computational linguistics, natural language processing (NLP) and chatbots.
Lemmatization is a text pre-processing technique used in natural language processing (NLP)
models to break a word down to its root meaning and identify similarities. For example, a lemmatization algorithm would reduce the word better to its root word, or lemma, good.
Example:
Lemmatization considers the context and converts the word to its meaningful base form, which is called the lemma. For instance, stemming the word 'Caring' would return 'Car', whereas lemmatizing the word 'Caring' would return 'Care'.
Lemmatization is another technique used to reduce inflected words to their root word. It
describes the algorithmic process of identifying an inflected word's “lemma” (dictionary
form) based on its intended meaning.
Examples of lemmatization:
# import these modules
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
print("rocks :", lemmatizer.lemmatize("rocks"))
print("corpora :", lemmatizer.lemmatize("corpora"))
print("better :", lemmatizer.lemmatize("better", pos="a"))  # pos="a" treats the word as an adjective
Output :
rocks : rock
corpora : corpus
better : good
Advantages of Lemmatization with NLTK:
1. Improves text analysis accuracy: Lemmatization helps in improving the
accuracy of text analysis by reducing words to their base or dictionary form.
This makes it easier to identify and analyze words that have similar meanings.
2. Reduces data size: Since lemmatization reduces words to their base form, it
helps in reducing the data size of the text, which makes it easier to handle large
datasets.
3. Better search results: Lemmatization helps in retrieving better search results
since it reduces different forms of a word to a common base form, making it
easier to match different forms of a word in the text.
4. Useful for feature extraction: Lemmatization can be useful in feature extraction
tasks, where the aim is to extract meaningful features from text for machine
learning tasks.
POS tagging
In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context.
Part-of-speech tagging (POS tagging) is a process in which each word in a text is assigned its
appropriate morphosyntactic category (for example noun-singular, verb-past, adjective,
pronoun-personal, and the like).
Example:
In a rule-based POS tagging system, words are assigned POS tags based on their
characteristics and the context in which they appear. For example, a rule-based POS tagger
might assign the tag “noun” to any word that ends in “-tion” or “-ment,” as these suffixes are
often used to form nouns.
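As a small sketch of tagging in practice with NLTK (the sentence is only an illustration, and the tagger resource name may vary slightly between NLTK versions):
import nltk

# The pre-trained tagger and tokenizer models ship as separate downloads
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))
# Each token is paired with a tag, e.g. ('fox', 'NN'), ('jumps', 'VBZ')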
Stochastic POS taggers are often built on a Hidden Markov Model (HMM). To illustrate the idea, consider a coin-tossing experiment in which a hidden choice between two biased coins produces the observed sequence of heads (H) and tails (T). We assume that there are two states in the HMM, each corresponding to the selection of a different biased coin. The following matrix gives the state transition probabilities −
A = [ a11  a12 ]
    [ a21  a22 ]
Here,
• aij = the probability of transition from state i to state j.
• a11 + a12 = 1 and a21 + a22 =1
• P1 = probability of heads of the first coin i.e. the bias of the first coin.
• P2 = probability of heads of the second coin i.e. the bias of the second coin.
We can also create an HMM model assuming that there are three or more coins.
This way, we can characterize an HMM by the following elements (a small sketch in code follows the list) −
• N, the number of states in the model (in the above example N =2, only two
states).
• M, the number of distinct observations that can appear with each state (in the above example M = 2, i.e., H or T).
• A, the state transition probability distribution − the matrix A in the above
example.
• P, the probability distribution of the observable symbols in each state (in our
example P1 and P2).
• I, the initial state distribution.
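A minimal sketch of the two-coin HMM described above, written with NumPy; all numeric values are assumptions chosen only for illustration:
import numpy as np

# N = 2 hidden states (coin 1, coin 2); M = 2 observable symbols (H, T)
states = ["coin1", "coin2"]
observations = ["H", "T"]

# A: state transition probabilities (each row sums to 1) -- illustrative values
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# P: observation probabilities per state, i.e. the coin biases P1 and P2
P = np.array([[0.9, 0.1],   # coin 1: P1 = 0.9 chance of heads
              [0.2, 0.8]])  # coin 2: P2 = 0.2 chance of heads

# I: initial state distribution
I = np.array([0.5, 0.5])

# Generate a short observation sequence from the model
rng = np.random.default_rng(0)
state = rng.choice(2, p=I)
sequence = []
for _ in range(10):
    sequence.append(observations[rng.choice(2, p=P[state])])
    state = rng.choice(2, p=A[state])
print("".join(sequence))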
Chunking in NLP
Chunking is a natural language processing technique used to identify parts of speech (POS) and the short phrases present in a given sentence. In other words, with chunking we can get at the structure of the sentence. It is also called partial parsing.
Chunk patterns and chinks
Chunk patterns are the patterns of part-of-speech (POS) tags that define what kind of words make up a chunk. We can define chunk patterns with the help of modified regular expressions.
Moreover, we can also define patterns for what kind of words should not be in a chunk; these unchunked words are known as chinks.
Implementation example
The example below parses the sentence “the book has many chapters” using a grammar for noun phrases that combines both a chunk pattern and a chink pattern −
import nltk

# The sentence is supplied already POS-tagged
sentence = [
    ("the", "DT"),
    ("book", "NN"),
    ("has", "VBZ"),
    ("many", "JJ"),
    ("chapters", "NNS")
]

# NP grammar: chunk determiner+noun sequences, then chink out any verbs
chunker = nltk.RegexpParser(
    r'''
    NP: {<DT><NN.*><.*>*<NN.*>}   # chunk pattern
        }<VB.*>{                  # chink pattern
    '''
)

output = chunker.parse(sentence)
print(output)
output.draw()
OUTPUT
(S (NP the/DT book/NN) has/VBZ (NP many/JJ chapters/NNS))
Here “the book” and “many chapters” are chunked as noun phrases (NP), while the verb “has” is chinked out and left outside the chunks.
WordNet
WordNet is a massive lexicon of English words. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms called synsets, each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations such as hyponymy and antonymy.
WordNet has been used for a number of purposes in information systems, including word-sense disambiguation, information retrieval, automatic text classification, automatic text summarization, machine translation, and even automatic crossword puzzle generation.
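A small sketch of browsing WordNet through NLTK (the word 'car' is just an example; the wordnet corpus must be downloaded first):
import nltk
nltk.download('wordnet')

from nltk.corpus import wordnet as wn

# All synsets (senses) that contain the word "car"
for synset in wn.synsets('car'):
    print(synset.name(), "-", synset.definition())

# Relations of the first sense
first = wn.synsets('car')[0]
print("Synonyms :", first.lemma_names())
print("Hypernyms:", first.hypernyms())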
Semantic Similarity
Measuring sentence similarity is the task of determining how similar the meanings of two sentences (or two words) are. Example: for the two questions “How old are you?” and “What is your age?” the answer is the same, “I am 20 years old.” The system gives the same answer for both questions because they are semantically related, i.e., they are similar in meaning. Our task is to identify this similarity between sentences.
Right at the start, I'm going to ask you a question. Which pair of words is the most similar of the following?
• deer, elk
• deer, giraffe
• deer, horse
• deer, mouse
Most of us would say deer and elk, but how can a machine decide this? It does so by using semantic measures and semantic resources. There are a lot of high-performance techniques for this, but here is the basic process.
Semantic similarity is useful in cases such as the following:
1. If your task is to group together words which are similar in meaning, then you should go with Semantic Similarity.
2. It is a basic building block of Natural Language Understanding tasks. Textual Entailment: consider a paragraph P and a sentence S; if you want to find whether the sentence S derives its meaning from paragraph P, you can go with Semantic Similarity. Paraphrasing: paraphrasing is a task where you rephrase or rewrite a sentence into another sentence that has the same meaning.
1. There are many resources used for semantic similarity; one of them is WordNet, a semantic dictionary of English words interlinked by semantic relations.
2. It also includes rich linguistic information such as parts of speech, word senses (i.e., the different meanings of a word), hypernyms, hyponyms, etc.
3. WordNet is machine readable and freely available, hence it is widely used for natural language processing tasks.
Here are a few different algorithms used to measure semantic similarity between words; they are important to know in the context of new-age enterprise search platforms like 3RDi Search.
1] Path Length
Path Length is a score based on the number of edges on the shortest path connecting two words/senses. In a thesaurus hierarchy graph, the shorter the path between two words/senses, the more related they are.
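A brief sketch of the path-length measure in NLTK, applied to the deer question above (the synset names such as 'deer.n.01' simply pick the first noun sense of each word and are chosen here only for illustration):
from nltk.corpus import wordnet as wn

deer = wn.synset('deer.n.01')
for name in ['elk.n.01', 'giraffe.n.01', 'horse.n.01', 'mouse.n.01']:
    other = wn.synset(name)
    # path_similarity = 1 / (shortest path length + 1): higher means more related
    print(name, deer.path_similarity(other))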
2] Leacock-Chodorow Score
This score is similar to path length but includes log smoothing: it is based on the number of edges between two words/senses and, because of the log smoothing, it is continuous in nature.
3] Wu & Palmer Score
This is a score that considers the positions of the concepts c1 and c2 in the taxonomy in relation to their Least Common Subsumer, LCS(c1, c2). Among path-based measurements, it treats the similarity between two concepts as a function of both path length and depth.
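Both of these scores are also available on WordNet synsets in NLTK; a brief sketch continuing the deer/elk example (assuming the wordnet corpus is installed):
from nltk.corpus import wordnet as wn

deer = wn.synset('deer.n.01')
elk = wn.synset('elk.n.01')

# Leacock-Chodorow: path length with log smoothing (both synsets must share a POS)
print("LCH :", deer.lch_similarity(elk))

# Wu & Palmer: based on the depths of the two synsets and of their least common subsumer
print("WUP :", deer.wup_similarity(elk))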
Information content-based scores use the frequency counts of concepts in a corpus of text as information content. Each time a concept is observed, the frequency associated with that concept is updated in WordNet, as are the counts of its ancestor concepts in the WordNet hierarchy (for nouns and verbs).
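These information-content measures are exposed in NLTK once an information-content file is loaded; a brief sketch (the Brown-corpus IC file used here is one of the files shipped in the wordnet_ic data package):
import nltk
nltk.download('wordnet_ic')

from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# Information content precomputed from the Brown corpus
brown_ic = wordnet_ic.ic('ic-brown.dat')

deer = wn.synset('deer.n.01')
elk = wn.synset('elk.n.01')
# Resnik similarity: information content of the least common subsumer
print("Resnik:", deer.res_similarity(elk, brown_ic))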
6] Cosine Similarity
Measuring semantic similarity does not rely on this type on its own; it is combined with the other types and measures the distance between non-zero vectors of features. The most important algorithms of this type are Manhattan Distance, Euclidean Distance, Cosine Similarity, Jaccard Index, and Sorensen-Dice Index, as illustrated in the sketch below.
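As a minimal sketch of the vector-based idea, here is cosine similarity over simple bag-of-words count vectors, applied to the two "age" questions from earlier (the toy vectorization is an assumption for illustration; it also shows why purely lexical measures are usually combined with semantic ones):
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    # Bag-of-words count vectors
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in vocab)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

print(cosine_similarity("How old are you", "What is your age"))  # no shared words -> 0.0
print(cosine_similarity("How old are you", "How old is she"))    # partial overlap -> 0.5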