Date: ……………
Practical No. 4: Perform the following data preprocessing on a text/paragraph using the NLTK
library:
a. Write a Python program to tokenize a text into words, sentence by sentence.
b. Write a Python program that accepts a list of tokenized words and
stems each word to its root form.
c. Write a Python program to identify the part of speech of each word
in the text.
d. Write a Python NLTK program to remove stop words from a given text.
e. Write a Python program to identify and correct misspelled
words in a given text, such as an essay or a letter.
A. Objective: Learn data preprocessing with the NLTK library by writing Python
programs.
B. Expected Program Outcomes (POs): PO1, PO2, PO3, PO4, PO5, PO6, PO7
C. Expected Skills to be developed based on competency:
Ability to apply data preprocessing to a text/paragraph using the NLTK library.
Program logic for a Python program that uses NLTK for the NLP tasks listed above:
1. Import the necessary modules and libraries:
   - nltk for the NLP functionality
   - Specific classes such as PorterStemmer or WordNetLemmatizer for word stemming or lemmatization
   - SpellChecker from the separate pyspellchecker package (it is not part of NLTK) for spelling correction
2. Define a function for each task:
   Tokenization:
   - Use word_tokenize() to split the text into words
   - Use sent_tokenize() to split the text into sentences
   Word Stemming:
   - Initialize a stemmer object (e.g., PorterStemmer())
   - Use the stemmer's stem() method to stem each word
   Part-of-Speech (POS) Tagging:
   - Use pos_tag() to get the POS tag of each word
   Stop Words Removal:
   - Use stopwords.words() to get the list of stop words for a specific language
   - Filter the stop words out of the tokenized words
   Misspelled Words Correction:
   - Initialize a spell checker object (e.g., SpellChecker())
   - Use the spell checker's correction() method to correct misspelled words
3. Get user input or load text from a file.
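Step 3 is not shown in the program that follows, so here is a minimal sketch of one way to do it. The function name load_text and its parameters path and prompt are hypothetical, chosen only for illustration; the file is assumed to be UTF-8 encoded.

```python
from pathlib import Path


def load_text(path=None, prompt="Enter text: "):
    """Return text read from a file, or typed by the user if no path is given.

    `load_text`, `path` and `prompt` are illustrative names, not part of NLTK.
    """
    if path is not None:
        # Assumption: the file is plain text encoded as UTF-8.
        return Path(path).read_text(encoding="utf-8")
    # No file given: fall back to interactive input.
    return input(prompt)
```

The returned string can then be passed to the tokenization functions as the text to process.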
Foundation of AI and ML (4351601)
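import nltk
from nltk import pos_tag, sent_tokenize, word_tokenize
from nltk.corpus import stopwords  # for task d (stop word removal)
from nltk.stem import PorterStemmer
from spellchecker import SpellChecker  # for task e; from the pyspellchecker package

# One-time setup: the tokenizer and tagger need NLTK data, e.g.
# nltk.download("punkt") and nltk.download("averaged_perceptron_tagger").
# Resource names vary by NLTK version (recent releases use "punkt_tab"
# and "averaged_perceptron_tagger_eng").

# Tokenization
def tokenize_words(text):
    return word_tokenize(text)

def tokenize_sentences(text):
    return sent_tokenize(text)

# Word Stemming
def stem_words(words):
    stemmer = PorterStemmer()
    return [stemmer.stem(word) for word in words]

# POS Tagging
def identify_pos(words):
    return pos_tag(words)

# Example usage
text = "This is an example sentence. And here's another one!"

# Tokenization
words = tokenize_words(text)
sentences = tokenize_sentences(text)

# Word Stemming
stemmed_words = stem_words(words)

# POS Tagging
pos_tags = identify_pos(words)

# Display the results
print("Words:", words)
print("Sentences:", sentences)
print("Stemmed words:", stemmed_words)
print("POS tags:", pos_tags)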
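The program above imports stopwords and SpellChecker but does not yet use them, so tasks d and e remain open. The following is a minimal sketch of how they might be filled in, not a definitive solution. The function names remove_stop_words and correct_words, and the optional stop_words and checker parameters, are assumptions introduced here so that the filtering logic can be tried without the NLTK corpus or the pyspellchecker package installed. When they are left out, the sketch falls back to stopwords.words("english") and SpellChecker().

```python
def remove_stop_words(words, stop_words=None):
    """Return the tokens in `words` that are not stop words."""
    if stop_words is None:
        # Loaded lazily; requires nltk.download("stopwords") to have been run once.
        from nltk.corpus import stopwords
        stop_words = set(stopwords.words("english"))
    # Compare in lowercase because the NLTK English stop word list is lowercase.
    return [w for w in words if w.lower() not in stop_words]


def correct_words(words, checker=None):
    """Return `words` with misspelled alphabetic tokens replaced by a suggestion."""
    if checker is None:
        # Loaded lazily; requires the pyspellchecker package.
        from spellchecker import SpellChecker
        checker = SpellChecker()
    # Only alphabetic tokens are checked, so punctuation and numbers pass through.
    candidates = [w for w in words if w.isalpha()]
    misspelled = checker.unknown(candidates)  # set of unrecognised (lowercased) words
    corrected = []
    for w in words:
        if w.isalpha() and w.lower() in misspelled:
            suggestion = checker.correction(w.lower())
            # correction() may have no suggestion; keep the original token then.
            corrected.append(suggestion if suggestion else w)
        else:
            corrected.append(w)
    return corrected
```

With the token list from the example, remove_stop_words(words) and correct_words(words) can be called in the same way as stem_words(words). Note that a corrected word is returned in lowercase in this sketch, so capitalization would need to be restored separately if it matters.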
I. Resources/Equipment Required
Ensure that all the necessary equipment and software are in good working
condition.
Never eat or drink in the lab, as it can cause contamination and create
safety hazards.
If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
Output:
M. References / Suggestions
1. https://www.geeksforgeeks.org/machine-learning/
2. https://www.geeksforgeeks.org/natural-language-processing-nlp-tutorial/
3. https://www.tutorialspoint.com/machine_learning_with_python/index.htm
N. Assessment-Rubrics
Engagement (/5), assessed as one of:
   - Performed practical him/herself
   - Performed practical with others' help
   - Watched other students performing practical but not tried him/herself
   - Present in practical session but not attentively participated in performance