NLP Lab Complete
R POLYTECHNIC, GUDLAVALLERU
Department of ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
CERTIFICATE
Certified that this is the Bonafide Record Work of NATURAL LANGUAGE PROCESSING carried out by
Mr. / Ms.
A student of
Marks Awarded 40
Experiment-1 DATE:
AIM: INSTALL ANACONDA AND NLTK IN PYTHON.
2. Click Download; the site checks compatibility with your PC, and then the download starts.
5. Select the “Just Me” install unless you’re installing for all users (which requires Windows Administrator privileges) and click Next.
6. Select a destination folder to install Anaconda and click the Next button.
7. Click the Install button; after installation completes, click the Next button.
8. Finally, click the Finish button.
To install NLTK using the Anaconda prompt, you can follow these steps:
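A minimal sketch of the usual commands (assuming NLTK is available from your configured conda channels; pip install nltk is an alternative):
1. Open the Anaconda Prompt and run: conda install nltk
2. Start Python and run:
import nltk
nltk.download()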
After this you will get a GUI where you can download all the data.
RESULT:
Experiment-2 DATE:
AIM: EXECUTE TOKENIZATION BY WORD USING NLTK IN PYTHON.
Introduction
Tokenization is a fundamental step in natural language processing (NLP) that involves breaking down
text into smaller units called tokens. These tokens can be words, phrases, or symbols.
PROGRAM 1:
import nltk
nltk.download('punkt')  # tokenizer models needed by word_tokenize
from nltk.tokenize import word_tokenize
print(word_tokenize("this is nirmal kollipara"))
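word_tokenize returns a list of word tokens; the expected output here is:
['this', 'is', 'nirmal', 'kollipara']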
PROGRAM 2:
import nltk
from nltk.tokenize import word_tokenize

def token(file_path):
    # Read the file and tokenize its contents into words
    with open(file_path, 'r') as file:
        text = file.read()
    tokens = word_tokenize(text)
    return tokens

file_path = r'C:\Users\DELL\Desktop\abc.txt'
tokens = token(file_path)
print(tokens)
RESULT:
Experiment-3 DATE:
AIM: EXECUTE TOKENIZATION BY SENTENCE USING NLTK IN PYTHON.
Program 1
import nltk
nltk.download('punkt')  # sentence tokenizer models needed by sent_tokenize
from nltk.tokenize import sent_tokenize
t = "hi how are you. this nirmal kollipara"
print(sent_tokenize(t))
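sent_tokenize splits the text at sentence boundaries; the expected output here is:
['hi how are you.', 'this nirmal kollipara']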
Program 2
import nltk
from nltk.tokenize import sent_tokenize
file = open("C:/Users/DELL/Desktop/abc.txt", "r")
t = file.read()
file.close()
print(sent_tokenize(t))
RESULT:
Experiment-4 DATE:
AIM: EXERCISE TO FIND THE MINIMUM NUMBER OF EDITS REQUIRED TO CONVERT STR1 INTO STR2 USING PYTHON.
Introduction
The minimum edit distance is the lowest number of operations (insertions, deletions, and substitutions) needed to
transform one string into the other. It has many applications; in NLP, for example, it is used in spelling correction,
document similarity, and machine translation.
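For example, converting 'kitten' into 'sitting' takes a minimum of 3 edits: substitute 'k' with 's', substitute 'e' with 'i', and insert 'g'.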
Program 1:
def med(str1, str2, m, n):
    # Base cases: if one string is empty, the distance is the other's length
    if m == 0:
        return n
    if n == 0:
        return m
    # Last characters match: no edit needed for them
    if str1[m-1] == str2[n-1]:
        return med(str1, str2, m-1, n-1)
    # Otherwise, 1 edit plus the best of insert, delete, replace
    return 1 + min(med(str1, str2, m, n-1),
                   med(str1, str2, m-1, n),
                   med(str1, str2, m-1, n-1))

str1 = "GEEKSFORGEEKS"
str2 = "GEEXSFRGEEKKS"
print(med(str1, str2, len(str1), len(str2)))
Program 2:
def editDistance(str1, str2, m, n):
    if m == 0:
        return n
    if n == 0:
        return m
    if str1[m-1] == str2[n-1]:
        return editDistance(str1, str2, m-1, n-1)
    return 1 + min(editDistance(str1, str2, m, n-1),
                   editDistance(str1, str2, m-1, n),
                   editDistance(str1, str2, m-1, n-1))

str1 = "NLPPROGRAMM"
str2 = "DLPPROGRAMM"
print(editDistance(str1, str2, len(str1), len(str2)))
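The recursive version recomputes the same subproblems many times and runs in exponential time. A dynamic-programming table avoids this; the sketch below is illustrative (the function name edit_distance_dp is an assumption, not part of the original programs):

def edit_distance_dp(str1, str2):
    m, n = len(str1), len(str2)
    # dp[i][j] = minimum edits to convert str1[:i] into str2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i-1] == str2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i][j-1],    # insert
                                   dp[i-1][j],    # delete
                                   dp[i-1][j-1])  # replace
    return dp[m][n]

print(edit_distance_dp("GEEKSFORGEEKS", "GEEXSFRGEEKKS"))  # 3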
RESULT:
Experiment-5 DATE:
AIM: PRACTICE PART-OF-SPEECH TAGGING WITH STOP WORDS USING NLTK IN PYTHON.
INTRODUCTION
Stop words are a set of commonly used words in a language. Examples of stop words in English are “a,” “the,” “is,” “are,”
etc. Stop words are commonly used in Text Mining and Natural Language Processing (NLP) to eliminate words that are so
widely used that they carry very little useful information.
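For example, in the sentence "this is a sample sentence", the tokens "this", "is", and "a" are stop words.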
PROGRAM 1:
import nltk
from nltk.corpus import stopwords
# Requires the 'punkt', 'averaged_perceptron_tagger' and 'stopwords' NLTK data packages
text = "This is a sample sentence for practicing POS tagging."  # sample text assumed
words = nltk.word_tokenize(text)
stop_words = set(stopwords.words('english'))
pos_tags = nltk.pos_tag(words)
# Print tag:word for each word that is not a stop word
for word, tag in pos_tags:
    if word.lower() not in stop_words:
        print(f"{tag}:{word}")
PROGRAM 2:
import nltk
from nltk.corpus import stopwords
text = "This is a sample sentence for practicing POS tagging."  # sample text assumed
words = nltk.word_tokenize(text)
stop_words = set(stopwords.words('english'))
pos_tags = nltk.pos_tag(words)
# POS-tag only the stop words that occur in the text
stop = [word for word in words if word.lower() in stop_words]
stop = nltk.pos_tag(stop)
for word, tag in stop:
    print(f"{tag}:{word}")
print(stop_words)
RESULT:
Experiment-6 DATE:
The binning method is used to smooth data or to handle noisy data. In this method, the data is first sorted and the
sorted values are then distributed into a number of buckets or bins. Because binning methods consult the
neighbourhood of values, they perform local smoothing.
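For example, a sorted bin holding the values 4, 8, 9, 15 becomes 9, 9, 9, 9 under smoothing by bin means (the mean of the four values is 9) and 4, 4, 4, 15 under smoothing by bin boundaries (each value moves to the nearer boundary).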
Program
import numpy as np
from sklearn.datasets import load_iris

dataset = load_iris()
a = dataset.data
b = np.zeros(150)
# Take the second feature (sepal width) and sort it
for i in range(150):
    b[i] = a[i, 1]
b = np.sort(b)
# Create 30 bins of 5 values each
bin1 = np.zeros((30, 5))  # smoothing by bin means
bin2 = np.zeros((30, 5))  # smoothing by bin boundaries
bin3 = np.zeros((30, 5))  # smoothing by bin medians
# Bin mean
for i in range(0, 150, 5):
    k = int(i / 5)
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean
# Bin boundaries: replace each value with the nearer bin boundary
for i in range(0, 150, 5):
    k = int(i / 5)
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]
# Bin median: the middle of 5 sorted values
for i in range(0, 150, 5):
    k = int(i / 5)
    for j in range(5):
        bin3[k, j] = b[i+2]
print("Bin means:\n", bin1)
print("Bin boundaries:\n", bin2)
print("Bin medians:\n", bin3)
RESULT:
Experiment-7 DATE:
Program
import nltk
# parse_tree is assumed to hold an nltk.Tree; a small example tree is used here
parse_tree = nltk.Tree.fromstring("(S (NP (DT The) (NN dog)) (VP (VBZ barks)))")
print(parse_tree)
parse_tree.pretty_print()
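A parse tree is usually produced by parsing a sentence against a grammar. A minimal sketch with NLTK's CFG and ChartParser (the grammar and sentence are assumptions, not the original program):

import nltk
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DT NN
VP -> VBZ
DT -> 'The'
NN -> 'dog'
VBZ -> 'barks'
""")
parser = nltk.ChartParser(grammar)
# Enumerate every parse of the tokenized sentence
for parse_tree in parser.parse(['The', 'dog', 'barks']):
    parse_tree.pretty_print()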
RESULT:
Experiment-8 DATE:
Shallow parsing, also known as chunking, is a natural language processing (NLP) technique that aims to identify
and extract meaningful phrases, or chunks, from a sentence.
Unlike full parsing, which analyzes the complete grammatical structure of a sentence, shallow parsing focuses on
identifying individual phrases or constituents, such as noun phrases, verb phrases, and prepositional phrases.
Shallow parsing is an essential component of many NLP tasks, including information extraction, text classification,
and sentiment analysis.
Program
import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.chunk import RegexpParser
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
def shallow_parse(sentence):
    tokens = word_tokenize(sentence)
    pos_tags = pos_tag(tokens)
    print("POS Tags:", pos_tags)
    chunk_grammar = r"""
    NP: {<DT>?<JJ>*<NN>}         # Noun Phrase
    VP: {<VB.*><NP|PP|CLAUSE>+$} # Verb Phrase
    PP: {<IN><NP>}               # Prepositional Phrase
    CLAUSE: {<NP><VP>}           # Clause
    """
    chunk_parser = RegexpParser(chunk_grammar)
    tree = chunk_parser.parse(pos_tags)
    print("\nShallow Parse Tree:")
    print(tree)
    tree.draw()

if __name__ == "__main__":
    sentence = "The quick brown fox jumps over the lazy dog."
    shallow_parse(sentence)
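With this grammar, determiner-adjective-noun sequences such as "the lazy dog" are grouped into NP chunks; the exact chunks depend on the tags the POS tagger assigns.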
RESULT:
Experiment-9 DATE:
The Fibonacci numbers are the numbers in the following integer sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
In mathematical terms, the sequence F(n) of Fibonacci numbers is defined by the recurrence relation
F(n) = F(n-1) + F(n-2)
with seed values F(0) = 0 and F(1) = 1.
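For example, F(6) = F(5) + F(4) = 5 + 3 = 8.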
Program
def fibonacci_dp(n):
    # Edge cases for n = 0 and n = 1
    if n == 0:
        return 0
    elif n == 1:
        return 1
    # Bottom-up dynamic programming (completion assumed): keep only the last two values
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr

print(fibonacci_dp(10))  # 55
RESULT:
Experiment-10 DATE:
Syntax: TextBlob.correct()
Returns: the corrected sentence without spelling mistakes.
Program
from textblob import TextBlob

def correct_spelling_textblob(text):
    corrected_text = TextBlob(text).correct()  # correct() returns a TextBlob with spelling fixed
    return str(corrected_text)

text = "i havv a dreem that one day thea nation will rise up."
corrected_text = correct_spelling_textblob(text)
corrected_text1 = correct_spelling_textblob("helloo these is mee")
print(corrected_text)
print(corrected_text1)
RESULT:
Experiment-11 DATE:
Introduction
Chunk extraction, or partial parsing, is the process of extracting meaningful short phrases from a sentence tagged
with part-of-speech (POS) tags. Chunks are made up of words, and the kinds of words allowed are defined using
POS tags. One can even define a pattern of words that cannot be part of a chunk; such words are known as chinks.
Program
import nltk
from nltk import pos_tag, word_tokenize, RegexpParser
# Sample sentence
sentence = "The quick brown fox jumps over the lazy dog."
# Completion assumed: tag the tokens, then chunk noun phrases with a simple grammar
tagged = pos_tag(word_tokenize(sentence))
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
print(chunker.parse(tagged))
RESULT:
Experiment-12 DATE:
INTRODUCTION
Chinking is the process of removing a sequence of tokens from a chunk; the removed sequence is called a chink.
Chink patterns are ordinary regular expressions over POS (Part-of-Speech) tags, designed to match the
sequences of tags that should be excluded from a chunk.
PROGRAM
import nltk
from nltk.chunk import RegexpParser
from nltk import pos_tag
from nltk.tokenize import word_tokenize
# Example sentence
sentence = "The quick brown fox jumps over the lazy dog."
# Completion assumed: chunk everything, then chink (remove) verbs and prepositions
grammar = r"""
Chunk: {<.*>+}
       }<VB.*|IN>{
"""
tagged = pos_tag(word_tokenize(sentence))
print(RegexpParser(grammar).parse(tagged))
RESULT:
Experiment-13 DATE:
INTRODUCTION
Lemmatization is the process of grouping together the different inflected forms of a word so they can
be analyzed as a single item. Lemmatization is similar to stemming, but it brings context to the
words: it links words with similar meanings to one word.
PROGRAM
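A minimal sketch using NLTK's WordNetLemmatizer (the sample words are illustrative assumptions):

import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')  # data required by the lemmatizer

lemmatizer = WordNetLemmatizer()
# The pos argument gives the lemmatizer context: 'v' = verb, 'a' = adjective
print(lemmatizer.lemmatize("running", pos="v"))  # run
print(lemmatizer.lemmatize("better", pos="a"))   # good
print(lemmatizer.lemmatize("rocks"))             # rock (default pos is noun)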
RESULT:
Experiment-14 DATE:
INTRODUCTION
Stemming is a method in text processing that eliminates prefixes and suffixes from words,
transforming them into their fundamental or root form. The main objective of stemming is to
streamline and standardize words, enhancing the effectiveness of natural language
processing tasks.
PROGRAM
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()
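A minimal usage sketch (the sample words are assumptions; the original listing ends at the constructor):

# Stem a few inflected forms down to their root
for word in ["running", "flies", "happily", "studies"]:
    print(word, "->", porter_stemmer.stem(word))
# running -> run, flies -> fli, happily -> happili, studies -> studi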
RESULT:
Experiment-15 DATE:
PROGRAM
import nltk
from nltk import FreqDist
from nltk.tokenize import word_tokenize
import matplotlib.pyplot as plt
# Sample text
text = "This is a sample text. This text is for testing the frequency distribution."
# Completion assumed: tokenize, build the frequency distribution, and plot it
tokens = word_tokenize(text.lower())
fdist = FreqDist(tokens)
print(fdist.most_common(5))  # the five most frequent tokens
fdist.plot(10)               # plot of the top 10 tokens
RESULT: