
DELHI TECHNOLOGICAL UNIVERSITY
SE-316
NATURAL LANGUAGE PROCESSING

Department of Software Engineering
Delhi Technological University
Bawana Road, Delhi-110042

Submitted by:
Prashant Tiwari
Roll Number: 2K20/IT/103
Batch: IT-B

Submitted to:
Dr. Divyashikha Sethia

Department of Software Engineering
Delhi Technological University
INDEX

S. No.  Experiment                                                      Date

1.      Import nltk and download the 'stopwords' and 'punkt'            13-01-2023
        packages.
2.      Import spacy and load the language model.                       13-01-2023
3.      WAP in Python to tokenize a given text.                         20-01-2023
4.      WAP in Python to get the sentences of a text document.          03-03-2023
5.      WAP in Python to tokenize text with stopwords as delimiters.    03-02-2023
6.      WAP in Python to add custom stop words in spaCy.                03-02-2023
7.      WAP to remove punctuation, perform stemming, lemmatize a        24-02-2023
        given text and extract usernames from emails.
8.      WAP to do spell correction, extract all nouns, pronouns and     07-03-2023
        verbs in a given text.
9.      WAP to find similarity between two words and classify a text    31-03-2023
        as positive/negative sentiment.
EXPERIMENT - 1
AIM : Import nltk and download the ‘stopwords’ and ‘punkt’
packages

CODE :
import nltk

# Download the stopword lists and the 'punkt' tokenizer models
# used by word_tokenize and sent_tokenize in later experiments.
nltk.download('stopwords')
nltk.download('punkt')
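A quick sanity check (a minimal sketch, not part of the original lab) that both resources load correctly:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# 'stopwords' provides the word lists; 'punkt' backs word_tokenize.
print(stopwords.words('english')[:10])
print(word_tokenize("NLTK is ready."))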

OUTPUT :
EXPERIMENT - 2
AIM : Import spacy and load the language model

CODE :
import spacy

# Small English pipeline and multilingual NER model, respectively.
nlp_eng = spacy.load('en_core_web_sm')
nlp_multi = spacy.load('xx_ent_wiki_sm')
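Both models must be installed before spacy.load() can find them; a minimal setup sketch using spaCy's own download helper:

import spacy.cli

# One-time downloads; equivalent to `python -m spacy download <model>`.
spacy.cli.download('en_core_web_sm')
spacy.cli.download('xx_ent_wiki_sm')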

OUTPUT :
EXPERIMENT - 3
AIM : WAP in Python to tokenize a given text

CODE :
from nltk import word_tokenize

text = ("Last week, the University of Cambridge shared its own research "
        "that shows if everyone wears a mask outside home, dreaded "
        "‘second wave’ of the pandemic can be avoided.")

# Break the text into word-level tokens and print one per line.
tokens = word_tokenize(text)
for t in tokens:
    print(t)
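For comparison, a minimal sketch of the same tokenization with spaCy (assuming the en_core_web_sm model from Experiment 2 is installed):

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp("Last week, the University of Cambridge shared its own research.")
# Tokenization runs as the first step of the pipeline; each Token
# keeps its text and position in the document.
for token in doc:
    print(token.text)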

OUTPUT :
EXPERIMENT - 4
AIM : WAP in Python to get the sentences of a text document.

CODE :
# Read the document and split it into sentences on full stops.
with open('04.txt') as file:
    input_text = file.read()

sentences = input_text.split('.')
for sentence in sentences:
    print(sentence, '\n')
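Splitting on '.' breaks on abbreviations such as "Dr." and discards the full stops themselves; a more robust sketch using NLTK's sentence tokenizer (requires the 'punkt' package from Experiment 1; '04.txt' is the same assumed input file):

from nltk.tokenize import sent_tokenize

with open('04.txt') as file:
    input_text = file.read()

# punkt handles abbreviations and keeps terminal punctuation.
for sentence in sent_tokenize(input_text):
    print(sentence, '\n')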

OUTPUT :
EXPERIMENT - 5
AIM : WAP in Python to tokenize text with stopwords as
delimiters.

CODE :
text = ("Walter was feeling anxious. He was diagnosed today. He probably "
        "is the best person I know.")

stop_words_and_delims = ['was', 'is', 'the', '.', ',', '-', '!', '?']

# Replace every stopword/delimiter with a sentinel token, then split on it.
for r in stop_words_and_delims:
    text = text.replace(r, 'DELIM')

words = [t.strip() for t in text.split('DELIM')]

# Drop the empty strings left by adjacent delimiters.
words_filtered = [w for w in words if w]
for word in words_filtered:
    print(word)
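Note that str.replace matches substrings, so a stopword like 'is' would also be cut out of a word such as 'island'. A hedged regex alternative that splits only on whole-word matches:

import re

text = ("Walter was feeling anxious. He was diagnosed today. He probably "
        "is the best person I know.")
# \b restricts the stopwords to whole-word matches; the character class
# covers the punctuation delimiters.
pattern = r'\b(?:was|is|the)\b|[.,\-!?]'
words = [w.strip() for w in re.split(pattern, text) if w.strip()]
print(words)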

OUTPUT :
EXPERIMENT - 6
AIM : WAP in Python to add custom stop words in spaCy.

CODE :
import spacy

nlp = spacy.load('en_core_web_sm')

custom_stop_words = ['was', 'is', 'the', 'JUNK', 'NIL', 'of', 'more',
                     '.', ',', '-', '!', '?', 'a']

# Mark each custom word as a stopword in the pipeline's vocabulary.
for word in custom_stop_words:
    nlp.vocab[word].is_stop = True

doc = nlp("Jonas was a JUNK great guy NIL Adam was evil NIL Martha JUNK "
          "was more of a fool")
for token in doc:
    if not token.is_stop:
        print(token.text, end=" ")
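Setting is_stop on vocabulary entries affects only the loaded pipeline; an alternative sketch (assuming spaCy v3) registers the words on the language defaults instead, so they also apply to pipelines created afterwards:

# Hedged alternative: extend the default stopword set for the language.
for word in custom_stop_words:
    nlp.Defaults.stop_words.add(word)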

OUTPUT :
EXPERIMENT - 7
AIM : WAP to remove punctuation, perform stemming,
lemmatize a given text and extract usernames from emails

CODE :
# Characters to strip from the input string (note the escaped backslash).
punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''

string = "Jonas!!! great \\guy <> Adam --evil [Martha] ;;fool() ."

# Keep only the characters that are not punctuation.
ans = ""
for char in string:
    if char not in punctuations:
        ans += char

print(ans)
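An equivalent sketch using str.translate, which avoids the character-by-character loop (an alternative, not the lab's prescribed method):

# str.maketrans with a third argument builds a deletion table.
print(string.translate(str.maketrans('', '', punctuations)))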

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

text = ("Dancing is an art. Students should be taught dance as a subject "
        "in schools. I danced in many of my school functions. Some people "
        "are always hesitating to dance.")

# Stem every token with the Porter stemmer and rebuild the string.
stemmer = PorterStemmer()
tokens = word_tokenize(text)
ans = ""
for token in tokens:
    ans += stemmer.stem(token) + " "
print(ans)

from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer

# Requires nltk.download('wordnet') in addition to 'punkt'.
lemmatizer = WordNetLemmatizer()
text = ("Dancing is an art. Students should be taught dance as a subject "
        "in schools. I danced in many of my school functions. Some people "
        "are always hesitating to dance.")

# Lemmatize each token as a verb (wordnet.VERB) and rebuild the string.
ans = ""
tokens = word_tokenize(text)
for token in tokens:
    ans += lemmatizer.lemmatize(token, wordnet.VERB) + " "
print(ans)

from nltk.tokenize import word_tokenize

text = ("The new registrations are [email protected], [email protected]. "
        "If you find any disruptions, kindly contact [email protected] "
        "or [email protected]")

# word_tokenize separates '@' into its own token, so the token just
# before each '@' is the username.
text_list = word_tokenize(text)
usernames = []
for i in range(len(text_list)):
    if text_list[i] == "@":
        usernames.append(text_list[i - 1])
print(usernames)
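The token-based approach depends on how word_tokenize happens to split addresses; a hedged regex alternative that captures the username directly:

import re

# [\w.+-]+ before the '@' is an assumed username pattern, not a full
# RFC 5322 address parser.
usernames = re.findall(r'([\w.+-]+)@[\w.-]+', text)
print(usernames)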

OUTPUT :
EXPERIMENT - 8
AIM : WAP to do spell correction, extract all nouns, pronouns
and verbs in a given text

CODE :
from textblob import TextBlob

# Misspelled input; TextBlob.correct() returns a spell-corrected blob.
text = "He is a gret person. He beleives in bod"
textb = TextBlob(text)
correct_text = textb.correct()
print(correct_text)

from nltk import word_tokenize, pos_tag

text = ("James works at Microsoft. She lives in manchester and likes to "
        "play the flute")
tokens = word_tokenize(text)
parts_of_speech = pos_tag(tokens)

# Keep tokens tagged as common (NN) or proper (NNP) nouns.
nouns = [word for word, tag in parts_of_speech if tag in ("NN", "NNP")]
for noun in nouns:
    print(noun)
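The aim also asks for pronouns, which the filter above skips; a minimal sketch reusing the same tags and selecting PRP (personal) and PRP$ (possessive) pronouns:

pronouns = [word for word, tag in parts_of_speech if tag in ("PRP", "PRP$")]
for pronoun in pronouns:
    print(pronoun)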

from nltk import pos_tag, word_tokenize

text = ("I may bake a cake for my birthday. The talk will introduce "
        "readers to the use of baking")

words = word_tokenize(text)
# Tag the whole sentence once instead of re-tagging inside the loop.
tags = pos_tag(words)

# Pair each base-form verb (VB) with the word before it, e.g. "may bake".
verb_phrases = []
for i in range(1, len(words)):
    if tags[i][1] == 'VB':
        verb_phrases.append(words[i - 1] + ' ' + words[i])

for phrase in verb_phrases:
    print(phrase)

OUTPUT :
EXPERIMENT - 9
AIM : WAP to find similarity between two words and classify a
text as positive/negative sentiment

CODE :
import spacy

# en_core_web_md ships word vectors, which token.similarity() needs;
# install it first with: python -m spacy download en_core_web_md
nlp = spacy.load('en_core_web_md')

words = "amazing terrible excellent"
tokens = nlp(words)
token1, token2, token3 = tokens[0], tokens[1], tokens[2]

print(f"Similarity between {token1} and {token2}:",
      token1.similarity(token2))
print(f"Similarity between {token1} and {token3}:",
      token1.similarity(token3))

from textblob import TextBlob

# sentiment returns (polarity, subjectivity); polarity > 0 leans positive.
text = "It was a very pleasant day"
print(TextBlob(text).sentiment)
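TextBlob only reports the raw scores; a minimal classification sketch on top of it (the 0.0 cut-off is an assumption, not part of TextBlob):

polarity = TextBlob(text).sentiment.polarity
# Polarity ranges from -1.0 (most negative) to +1.0 (most positive).
label = "positive" if polarity >= 0 else "negative"
print(label)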

OUTPUT :
