
A.A.N.M & V.V.R.S.R POLYTECHNIC, GUDLAVALLERU
Department of ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

CERTIFICATE

Certified that this is the Bonafide Record Work of NATURAL LANGUAGE PROCESSING USING PYTHON LAB (AIM-506) carried out by Mr. / Ms. ______________________, PIN No. ______________, a student of A.A.N.M & V.V.R.S.R Polytechnic, during the Academic Year ______________.

Marks Awarded: 40

Head of Section                                             Staff Member
INDEX

S. No.   NAME OF THE EXPERIMENT                       Page No.   Marks   Remarks

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
Experiment-1 DATE:

AIM: INSTALLATION OF NLTK IN PYTHON.

Installation of Anaconda IDE

1. Open the link below: https://www.anaconda.com/download/#windows

2. Click Download and check that the installer is compatible with your PC; the download will then start.

3. Double-click the installer to launch it, then click Next.

4. Read the licensing terms and click "I Agree".

5. Select an install for "Just Me" unless you are installing for all users (which requires Windows Administrator privileges) and click Next.

6. Select a destination folder to install Anaconda and click the Next button.

7. Click the Install button; after the installation completes, click the Next button.

8. Finally, click the Finish button.

• Installation of NLTK using the Anaconda Prompt:

To install NLTK using the Anaconda Prompt, follow these steps:

1. Open the Anaconda Prompt.

2. Start the Python interpreter (type python and press Enter), then enter the following commands:

import nltk
nltk.download('all')

This downloads all of the NLTK data packages. (Calling nltk.download() with no argument instead opens the NLTK Downloader GUI, where the packages can be selected and downloaded manually.)

3. Close the NLTK Downloader once the download has finished.
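
To verify the setup, a minimal check such as the following can be run in the same Python session (it only needs the tokenizer data installed above):

import nltk
from nltk.tokenize import word_tokenize

# If NLTK and its data installed correctly, this prints a list of word tokens
print(word_tokenize("NLTK installation looks fine."))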

RESULT:

Experiment-2 DATE:
AIM:EXECUTE TOKENISE BY WORD USING NLTK IN PYTHON.
Introduction
Tokenization is a fundamental step in natural language processing (NLP) that involves breaking down
text into smaller units called tokens. These tokens can be words, phrases, or symbols.

PROGRAM1
import nltk
from nltk.tokenize import word_tokenize
print(word_tokenize("this is nirmal kollipara"))

PROGRAM2
import nltk
from nltk.tokenize import word_tokenize

def token(file_path):
    # Read the file and tokenize its contents into words
    with open(file_path, 'r') as file:
        text = file.read()
    tokens = word_tokenize(text)
    return tokens

file_path = r'C:\Users\DELL\Desktop\abc.txt'
tokens = token(file_path)
print(tokens)
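
As a quick illustration of how word_tokenize behaves, note that punctuation is split into separate tokens; a minimal example:

from nltk.tokenize import word_tokenize

print(word_tokenize("Hello, world! NLP is fun."))
# Expected output: ['Hello', ',', 'world', '!', 'NLP', 'is', 'fun', '.']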

RESULT:

Experiment-3 DATE:

AIM:EXECUTE TOKENISE BY SENTENCE USING NLTK IN PYTHON.


Introduction
Tokenization is a fundamental step in natural language processing (NLP) that involves breaking down
text into smaller units called tokens. These tokens can be words, phrases, or symbols.

Program 1
import nltk
from nltk.tokenize import sent_tokenize
t="hi how are you. this nirmal kollipara"
print(sent_tokenize(t))

Program 2: reading from a file

import nltk
from nltk.tokenize import sent_tokenize
file = open("C:/Users/DELL/Desktop/abc.txt", "r")
t = file.read()
file.close()
print(sent_tokenize(t))
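
A minimal illustration of how sent_tokenize splits text at sentence boundaries:

from nltk.tokenize import sent_tokenize

print(sent_tokenize("Hello there. How are you? I am fine."))
# Expected output: ['Hello there.', 'How are you?', 'I am fine.']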

RESULT

Experiment-4 DATE:

AIM: EXERCISE TO FIND THE MINIMUM NUMBER OF EDITS REQUIRED TO CONVERT STR1 INTO STR2 USING PYTHON.

Introduction
The minimum edit distance is the lowest number of operations (insertions, deletions and substitutions) needed to transform one string into another. It has many applications; in NLP, for example, it is used in spelling correction, document similarity and machine translation.

Program1:

str1="GEEKSFORGEEKS"

str2="GEEXSFRGEEKKS"

def med (str1,str2,m,n):

if m==0:

return n

if n==0:

return m

if str1[m-1]==str2[n-1]:

return med (str1,str2,m-1,n-1)

return 1+min(med(str1,str2,m,n-1),

med(str1,str2,m-1,n),

med(str1,str2,m-1,n-1))

print(med(str1,str2,len(str1),len(str2)))

Program2:

def editDistance(str1, str2, m, n):
    # If one string is empty, the distance is the length of the other
    if m == 0:
        return n
    if n == 0:
        return m
    # Matching last characters cost nothing
    if str1[m-1] == str2[n-1]:
        return editDistance(str1, str2, m-1, n-1)
    # Otherwise consider insert, delete and replace
    return 1 + min(editDistance(str1, str2, m, n-1),
                   editDistance(str1, str2, m-1, n),
                   editDistance(str1, str2, m-1, n-1))

str1 = "NLPPROGRAMM"
str2 = "DLPPROGRAMM"
print(editDistance(str1, str2, len(str1), len(str2)))  # prints 1 (substitute 'N' with 'D')
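
Both programs above recompute the same subproblems many times, which becomes slow for longer strings. A dynamic-programming sketch (the helper name edit_distance_dp is illustrative) fills a table bottom-up in O(m x n) time:

def edit_distance_dp(str1, str2):
    m, n = len(str1), len(str2)
    # dp[i][j] holds the edit distance between str1[:i] and str2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j                      # insert all of str2[:j]
            elif j == 0:
                dp[i][j] = i                      # delete all of str1[:i]
            elif str1[i-1] == str2[j-1]:
                dp[i][j] = dp[i-1][j-1]           # last characters match
            else:
                dp[i][j] = 1 + min(dp[i][j-1],    # insert
                                   dp[i-1][j],    # delete
                                   dp[i-1][j-1])  # replace
    return dp[m][n]

print(edit_distance_dp("GEEKSFORGEEKS", "GEEXSFRGEEKKS"))  # expected output: 3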

RESULT:

Experiment-5 DATE:

AIM:PRACTICE PART OF SPEECH TAGGING WITH STOP WORDS USING NLTK IN PYTHON.

INTRODUCTION
Stop words are a set of commonly used words in a language. Examples of stop words in English are "a," "the," "is," "are," etc. Stop-word lists are commonly used in Text Mining and Natural Language Processing (NLP) to eliminate words that are so widely used that they carry very little useful information.

PROGRAM1:

import nltk
from nltk.corpus import stopwords

text = "the quick brown fox jumps over the lazy dog"
words = nltk.word_tokenize(text)
stop_words = set(stopwords.words('english'))
pos_tags = nltk.pos_tag(words)

print("POS tags of all words (including stop words):")
for word, tag in pos_tags:
    print(f"{tag}: {word}")

PROGRAM2:
import nltk
from nltk.corpus import stopwords

text = "the quick brown fox jumps over the lazy dog"
words = nltk.word_tokenize(text)
stop_words = set(stopwords.words('english'))

# Keep only the tokens that are stop words
stop = [word for word in words if word.lower() in stop_words]

pos_tags = nltk.pos_tag(words)
stop = nltk.pos_tag(stop)

print("POS tags of all words (including stop words):")
for word, tag in pos_tags:
    print(f"{tag}: {word}")

print("POS tags of the stop words only:")
for word, tag in stop:
    print(f"{tag}: {word}")

print(".......... the stop words are below ..........")
print(stop_words)
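
For completeness, a small sketch of the more common direction, removing the stop words before tagging (using the same sample sentence):

import nltk
from nltk.corpus import stopwords

text = "the quick brown fox jumps over the lazy dog"
stop_words = set(stopwords.words('english'))

# Drop the stop words, then tag only the remaining content words
content = [w for w in nltk.word_tokenize(text) if w.lower() not in stop_words]
print(nltk.pos_tag(content))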

RESULT:

Experiment-6 DATE:

AIM:EXERCISE ON BINNING METHODS FOR DATA SMOOTHING USING PYTHON.


Introduction

The binning method is used to smooth data or to handle noisy data. In this method, the data is first sorted and then the sorted values are distributed into a number of buckets or bins. As binning methods consult the neighbourhood of values, they perform local smoothing.

Program

import numpy as np
from sklearn.datasets import load_iris

# Load the iris data set
dataset = load_iris()
a = dataset.data
b = np.zeros(150)

# Take the second column (index 1) of the four-column data set
for i in range(150):
    b[i] = a[i, 1]

b = np.sort(b)  # sort the array

# Create 30 bins of 5 values each
bin1 = np.zeros((30, 5))
bin2 = np.zeros((30, 5))
bin3 = np.zeros((30, 5))

# Smoothing by bin means
for i in range(0, 150, 5):
    k = int(i / 5)
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean

print("Bin Mean: \n", bin1)

# Smoothing by bin boundaries: replace each value with the closer bin boundary
for i in range(0, 150, 5):
    k = int(i / 5)
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]

print("Bin Boundaries: \n", bin2)

# Smoothing by bin medians: the middle of 5 sorted values is the median
for i in range(0, 150, 5):
    k = int(i / 5)
    for j in range(5):
        bin3[k, j] = b[i+2]

print("Bin Median: \n", bin3)

RESULT:

Experiment-7 DATE:

AIM:PRACTICE BASIC TREE BANK STRUCTURE IMPLEMENTATION IN PYTHON.

Program

import nltk

from nltk.tree import Tree

parse_tree=Tree.fromstring('(S (NP (DT THE)(NN CAT))(VP(VBD SAT)(PP(IN ON)(NP(DT THE)(NN MAT)))))')

print(parse_tree)

parse_tree.pretty_print()
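
The Tree object can also be inspected programmatically; a small sketch using the same parse_tree:

# Inspect the parse tree built above
print(parse_tree.label())    # 'S' - the root label
print(parse_tree.leaves())   # ['THE', 'CAT', 'SAT', 'ON', 'THE', 'MAT']
for subtree in parse_tree.subtrees(lambda t: t.label() == 'NP'):
    print("NP:", " ".join(subtree.leaves()))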

RESULT:

Experiment-8 DATE:

AIM:EXERCISE ON CREATING SHALLOW TREE USING PYTHON.


Introduction

Shallow parsing, also known as chunking, is a type of natural language processing (NLP) technique that aims to identify
and extract meaningful phrases or chunks from a sentence.
Unlike full parsing, which involves analyzing the grammatical structure of a sentence, shallow parsing focuses on
identifying individual phrases or constituents, such as noun phrases, verb phrases, and prepositional phrases.
Shallow parsing is an essential component of many NLP tasks, including information extraction, text classification,
and sentiment analysis.

Program

import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.chunk import RegexpParser

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

def shallow_parse(sentence):
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
print("POS Tags:", pos_tags)

chunk_grammar = r"""
NP: {<DT>?<JJ>*<NN>} # Noun Phrase
VP: {<VB.*><NP|PP|CLAUSE>+$} # Verb Phrase
PP: {<IN><NP>} # Prepositional Phrase
CLAUSE: {<NP><VP>} # Clause
"""

chunk_parser = RegexpParser(chunk_grammar)
tree = chunk_parser.parse(pos_tags)
print("\nShallow Parse Tree:")
print(tree)
tree.draw()

if __name__ == "__main__":
sentence = "The quick brown fox jumps over the lazy dog."
shallow_parse(sentence)
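
Note that tree.draw() opens a separate Tkinter window, which is not available in every environment. A console-only alternative is to replace that call inside shallow_parse with a text rendering:

    # inside shallow_parse(), instead of tree.draw():
    tree.pretty_print()   # ASCII rendering of the shallow parse tree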

RESULT:

Experiment-9 DATE:

AIM:PRACTICE FIBONACCI NUMBER USING DYNAMIC PROGRAMMING PYTHON.


Introduction

The Fibonacci numbers are the numbers in the following integer sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
In mathematical terms, the sequence F(n) of Fibonacci numbers is defined by the recurrence relation
F(n) = F(n-1) + F(n-2)
with seed values F(0) = 0 and F(1) = 1.

Program

def fibonacci_dp(n):
# Edge cases for n = 0 and n = 1
if n == 0:
return 0
elif n == 1:
return 1

# Initialize the base values for F(0) and F(1)


fib = [0] * (n + 1)
fib[0] = 0
fib[1] = 1

# Build the Fibonacci sequence from the bottom-up


for i in range(2, n + 1):
fib[i] = fib[i - 1] + fib[i - 2]

# The nth Fibonacci number


return fib[n]

# Driver function to test the program


if __name__ == "__main__":
n = 10 # Find the 10th Fibonacci number
print(f"Fibonacci number F({n}) = {fibonacci_dp(n)}")

Result:

Experiment-10 DATE:

AIM:EXECUTE CORRECT() FUNCTION USING NLTK IN PYTHON.


Introduction
With the help of the TextBlob.correct() method, we can correct the spelling mistakes in a sentence.

Syntax : TextBlob.correct()
Return : Returns the corrected sentence without spelling mistakes.

Program

pip install textblob      # run this in the Anaconda Prompt, not inside Python

from textblob import TextBlob

def correct_spelling_textblob(text):
    blob = TextBlob(text)
    corrected_text = blob.correct()
    return str(corrected_text)

text = "i havv a dreem that one day thea nation will rise up."
corrected_text = correct_spelling_textblob(text)
corrected_text1 = correct_spelling_textblob("helloo these is mee")
print(corrected_text)
print(corrected_text1)
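
TextBlob can also score candidate corrections for a single word through its Word class; a small sketch:

from textblob import Word

# spellcheck() returns (candidate, confidence) pairs for a single word
print(Word("havv").spellcheck())
print(Word("dreem").spellcheck())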

RESULT:

Experiment-11 DATE:

AIM: EXERCISE ON CHUNKING USING NLTK IN PYTHON

Introduction

Chunk extraction, or partial parsing, is the process of extracting meaningful short phrases from a sentence (tagged with
Part-of-Speech tags). Chunks are made up of words, and the kinds of words are defined using the part-of-speech tags. One can
even define a pattern of words that cannot be a part of a chunk; such words are known as chinks.

Program

import nltk
from nltk import pos_tag, word_tokenize, RegexpParser

# Download necessary resources


nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Sample sentence
sentence = "The quick brown fox jumps over the lazy dog."

# Tokenize and tag the sentence


tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)

# Define a grammar for chunking


grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunk_parser = RegexpParser(grammar)

# Chunk the sentence


chunks = chunk_parser.parse(tagged)

# Print the chunked output


print(chunks)
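
To use the result programmatically rather than just printing it, the NP subtrees can be pulled out of the chunk tree; a short sketch reusing chunks from above:

# Collect just the noun-phrase chunks as plain text
for subtree in chunks.subtrees(lambda t: t.label() == 'NP'):
    print(" ".join(word for word, tag in subtree.leaves()))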

Result

Experiment-12 DATE:

AIM: EXERCISE ON CHINKING USING NLTK IN PYTHON.

INTRODUCTION

Chinking is the process of removing a sequence of tokens from a chunk; the removed sequence is called a chink. Chink
patterns are ordinary regular expressions over POS (Part-of-Speech) tags, designed to match the sequences of tags that
should be excluded from a chunk.

PROGRAM

import nltk
from nltk.chunk import RegexpParser
from nltk import pos_tag
from nltk.tokenize import word_tokenize

# Example sentence
sentence = "The quick brown fox jumps over the lazy dog."

# Tokenizing and part-of-speech tagging


tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)

# Chunking pattern (Example: Noun Phrase Chunk - NP)


chunk_grammar = r"""
NP: {<DT>?<JJ>*<NN>} # Chunk determiners, adjectives, and nouns
}<VB|IN>{ # Chink verbs (VB) or prepositions (IN)
"""

# Create a chunk parser using the grammar


chunk_parser = RegexpParser(chunk_grammar)

# Parse the tagged sentence


chunked_sentence = chunk_parser.parse(tagged)

# Display the result


print(chunked_sentence)
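
Another common way to see chinking at work is to chunk the whole sentence first and then chink out the unwanted tags; a sketch reusing the tagged tokens from above:

# First chunk everything into one NP, then carve the verbs and prepositions back out
chink_grammar = r"""
NP:
    {<.*>+}          # chunk everything into one NP
    }<VB.*|IN>+{     # then chink out verbs and prepositions
"""
print(RegexpParser(chink_grammar).parse(tagged))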

RESULT:

Experiment-13 DATE:

AIM:PRACTICE LEMMATIZING USING NLTK PYTHON.

INTRODUCTION

Lemmatization is the process of grouping together the different inflected forms of a word so they can
be analyzed as a single item. Lemmatization is similar to stemming but it brings context to the
words. So, it links words with similar meanings to one word.

PROGRAM

from nltk.stem import WordNetLemmatizer


lemmatizer = WordNetLemmatizer()
print("rocks :", lemmatizer.lemmatize("rocks"))
print("corpora :", lemmatizer.lemmatize("corpora"))
# a denotes adjective in "pos"
print("better :", lemmatizer.lemmatize("better", pos="a"))

Result:

Experiment-14 DATE:

AIM:PRACTICE STEMMING USING NLTK IN PYTHON.

INTRODUCTION

Stemming is a method in text processing that eliminates prefixes and suffixes from words, transforming them into their
fundamental or root form. The main objective of stemming is to streamline and standardize words, enhancing the
effectiveness of natural language processing tasks.

PROGRAM

from nltk.stem import PorterStemmer

# Create a Porter Stemmer instance

porter_stemmer = PorterStemmer()

# Example words for stemming

words = ["running", "jumps", "happily", "running", "happily"]

# Apply stemming to each word

stemmed_words = [porter_stemmer.stem(word) for word in words]

# Print the results

print("Original words:", words)

print("Stemmed words:", stemmed_words)

RESULT:

Experiment-15 DATE:

AIM:EXERCISE ON MAKING A FREQUENCY DISTRIBUTION USING NLTK IN PYTHON.

PROGRAM

import nltk
from nltk import FreqDist
from nltk.tokenize import word_tokenize
import matplotlib.pyplot as plt

# Make sure to download the punkt tokenizer if you haven't done so


nltk.download('punkt')

# Sample text
text = "This is a sample text. This text is for testing the frequency distribution."

# Tokenize the text


tokens = word_tokenize(text.lower()) # Convert to lower case to standardize

# Create a frequency distribution


freq_dist = FreqDist(tokens)

# Print the frequency distribution


print(freq_dist)

# Plot the frequency distribution


freq_dist.plot(30, cumulative=False)
plt.show()

# You can also access the most common words


print(freq_dist.most_common(5))
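
Punctuation and stop words usually dominate a raw frequency distribution; a filtered sketch reusing the tokens and FreqDist import from above (it also needs the stopwords corpus):

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

# Keep alphabetic tokens that are not stop words, then recount
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]
filtered_dist = FreqDist(filtered)
print(filtered_dist.most_common(5))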

RESULT:
