0% found this document useful (0 votes)

14 views

Combine PDF

Uploaded by

rsdhiva22

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views

Combine PDF

Uploaded by

rsdhiva22

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 124

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY

SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 5: SENTIMENT ANALYSIS

AIM: To perform sentiment analysis program using an SVM classifier with TF-IDF
vectorization.

PROCEDURE:
Data Preparation: Downloading the dataset, converting it into a suitable format (words and
sentiments), and structuring it into a DataFrame.
Splitting Data: Dividing the dataset into training and testing sets to train the model on a
portion and evaluate it on another.
TF-IDF Vectorization: Converting text data into numerical vectors using TF-IDF (Term
Frequency-Inverse Document Frequency) representation.
SVM Initialization and Training: Setting up an SVM classifier and training it using the TF-
IDF vectors obtained from the training text data.
Prediction and Evaluation: Transforming test data into TF-IDF vectors, predicting sentiment
labels, and evaluating the model's performance by comparing predicted labels with actual
labels using accuracy and a classification report.
The following algorithm outlines the process of building a sentiment analysis model using an
SVM classifier with TF-IDF vectorization in Python. Adjustments can be made to use
different datasets, vectorization techniques, or machine learning models based on specific
requirements.

ALGORITHM:
1. Library Installation and Import: Install required libraries (scikit-learn and nltk).
Import necessary modules from these libraries.
2. Download NLTK Resources: Download the movie_reviews dataset from NLTK.
3. Load and Prepare Dataset: Load the movie_reviews dataset.
Convert the dataset into a suitable format (list of words and corresponding sentiments)
and create a DataFrame.
1

4. Split Data into Train and Test Sets: Split the dataset into training and testing sets (e.g.,
Page

80% training, 20% testing).

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

5. TF-IDF Vectorization: Initialize a TF-IDF vectorizer.
Fit and transform the training text data to convert it into numerical TF-IDF vectors.
6. Initialize and Train SVM Classifier: Initialize an SVM classifier (using a linear kernel
for this example).
Train the SVM classifier using the TF-IDF vectors and corresponding sentiment
labels.
7. Prediction and Evaluation: Transform the test text data into TF-IDF vectors using the
trained vectorizer.
Predict sentiment labels for the test data using the trained SVM classifier.
Calculate the accuracy score to evaluate the model's performance.
Generate a classification report showing precision, recall, and F1-score for each class.

PROGRAM:
# Install necessary libraries
!pip install scikit-learn
!pip install nltk

# Import required libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from nltk.corpus import movie_reviews # Sample dataset from NLTK

# Download NLTK resources (run only once if not downloaded)

import nltk
nltk.download('movie_reviews')

# Load the movie_reviews dataset

documents = [(list(movie_reviews.words(fileid)), category)
for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]
2
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

# Convert data to DataFrame
df = pd.DataFrame(documents, columns=['text', 'sentiment'])

# Split data into train and test sets

X_train, X_test, y_train, y_test = train_test_split(df['text'], df['sentiment'], test_size=0.2,
random_state=42)

# Initialize TF-IDF vectorizer

tfidf_vectorizer = TfidfVectorizer()

# Fit and transform the training data

X_train_tfidf = tfidf_vectorizer.fit_transform(X_train.apply(' '.join))

# Initialize SVM classifier

svm_classifier = SVC(kernel='linear')

# Train the classifier

svm_classifier.fit(X_train_tfidf, y_train)

# Transform the test data

X_test_tfidf = tfidf_vectorizer.transform(X_test.apply(' '.join))

# Predict on the test data

y_pred = svm_classifier.predict(X_test_tfidf)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Display classification report

3
Page

print(classification_report(y_test, y_pred))

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

OUTPUT:
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.2.2)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/dist-packages
(from scikit-learn) (1.23.5)
Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from
scikit-learn) (1.11.3)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from
scikit-learn) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-
packages (from scikit-learn) (3.2.0)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk)
(8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk)
(1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages
(from nltk) (2023.6.3)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk)
(4.66.1)
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data] Unzipping corpora/movie_reviews.zip.
Accuracy: 0.84
precision recall f1-score support

neg 0.83 0.85 0.84 199

pos 0.85 0.82 0.84 201

accuracy 0.84 400

macro avg 0.84 0.84 0.84 400
weighted avg 0.84 0.84 0.84 400

4
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 6: PARTS OF SPEECH TAGGING

AIM: To perform Parts of Speech (POS) tagging program using NLTK

PROCEDURE:
Library Installation and Import: Ensures the NLTK library is available for use and imports
the necessary modules for text processing.
Download NLTK Resources: Downloads essential resources (punkt for tokenization,
averaged_perceptron_tagger for POS tagging) required by NLTK.
Sample Text: Defines a piece of text to demonstrate POS tagging.
Tokenization: Divides the text into individual words or tokens, making it suitable for further
analysis.
POS Tagging: Assigns each word in the text its respective grammatical category or POS tag
using NLTK's POS tagging functionality.
Display POS Tags: Prints or displays the words along with their associated POS tags obtained
from the tagging process.
The following algorithm outlines the steps involved in performing Parts of Speech (POS)
tagging using NLTK in Python. It demonstrates how to tokenize a text and assign
grammatical categories to individual words, providing insight into the linguistic structure of
the text.

ALGORITHM:
1. Library Installation and Import: Install NLTK library if not already installed.
Import the necessary NLTK library for text processing and POS tagging.
2. Download NLTK Resources: Download NLTK resources required for tokenization
and POS tagging (punkt for tokenization, averaged_perceptron_tagger for POS
tagging).
3. Sample Text: Define a sample text for POS tagging.
1

4. Tokenization: Break down the provided text into individual words (tokens) using
Page

NLTK's word_tokenize() method.

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

5. POS Tagging: Perform POS tagging on the tokens obtained from the text using
NLTK's pos_tag() method.
Assign POS tags to each word in the text based on its grammatical category (noun,
verb, adjective, etc.).
6. Display POS Tags: Print or display the words along with their respective POS tags
generated by the POS tagging process.

PROGRAM:
# Install NLTK (if not already installed)
!pip install nltk

# Import necessary libraries

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Sample text for POS tagging

text = "Parts of speech tagging helps to understand the function of each word in a sentence."

# Tokenize the text into words

tokens = nltk.word_tokenize(text)

# Perform POS tagging

pos_tags = nltk.pos_tag(tokens)

# Display the POS tags

print("POS tags:", pos_tags)

OUTPUT:
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk)
2

(8.1.7)
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk)
(1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages
(from nltk) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk)
(4.66.2)
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
POS tags: [('Parts', 'NNS'), ('of', 'IN'), ('speech', 'NN'), ('tagging', 'VBG'), ('helps', 'NNS'), ('to',
'TO'), ('understand', 'VB'), ('the', 'DT'), ('function', 'NN'), ('of', 'IN'), ('each', 'DT'), ('word',
'NN'), ('in', 'IN'), ('a', 'DT'), ('sentence', 'NN'), ('.', '.')]

3
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 7: CHUNKING

AIM: To perform Noun Phrase chunking

PROCEDURE:
In Natural Language Processing (NLP), chunking is the process of extracting short, meaningful
phrases (chunks) from a sentence based on specific patterns of parts of speech (POS). Python
provides tools like NLTK (Natural Language Toolkit) to perform chunking. This example
demonstrates a basic noun phrase (NP) and verb phrase (VP) chunking using NLTK. You can
adjust the chunk grammar patterns to capture different types of phrases or entities based on
your specific needs.
The chunk_grammar variable contains patterns defined using regular expressions for
identifying noun phrases and verb phrases. Adjusting these patterns can help extract different
types of chunks like prepositional phrases, named entities, etc.
Tokenization: Breaking the sentence into individual tokens or words.
POS Tagging: Assigning part-of-speech tags to each token (identifying whether it's a noun,
verb, adjective, etc.).
Chunking: Grouping tokens into larger structures (noun phrases, verb phrases) based on
defined grammar rules.
Chunk Grammar: Regular expressions defining patterns for identifying specific chunk
structures (like noun phrases).
Chunk Parser: Utilizing the chunk grammar to parse and extract chunks based on the
provided POS-tagged tokens.
The following algorithm outlines the steps involved in the noun phrase chunking process
using NLTK in Python, highlighting the key processes and the role of chunk grammar in
identifying and extracting specific syntactic structures from text data.
1
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

ALGORITHM:

1. Import Necessary Libraries: Import required modules from NLTK for tokenization,
POS tagging, and chunking.

2. Download NLTK Resources (if needed): Ensure NLTK resources like tokenizers and
POS taggers are downloaded (nltk.download('punkt'),
nltk.download('averaged_perceptron_tagger')).

3. Define a Sample Sentence: Set a sample sentence that will be used for chunking.

4. Tokenization: Break the sentence into individual words or tokens using NLTK's
word_tokenize() function.

5. Part-of-Speech (POS) Tagging: Tag each token with its corresponding part-of-speech
using NLTK's pos_tag() function.

6. Chunk Grammar Definition: Define a chunk grammar using regular expressions to

identify noun phrases (NP). For example, NP: {<DT>?<JJ>*<NN>} captures
sequences with optional determiners (DT), adjectives (JJ), and nouns (NN) as noun
phrases.

7. Chunk Parser Creation: Create a chunk parser using RegexpParser() and provide the
defined chunk grammar.

8. Chunking: Parse the tagged sentence using the created chunk parser to extract chunks
based on the defined grammar.

9. Display Chunks: Iterate through the parsed chunks and print the subtrees labeled as
'NP', which represent the identified noun phrases.
2
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

PROGRAM:
!pip install nltk
import nltk
from nltk import RegexpParser
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# Download NLTK resources (run only once if not downloaded)

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Sample sentence
sentence = "The quick brown fox jumps over the lazy dog"

# Tokenize the sentence

tokens = word_tokenize(sentence)

# POS tagging
tagged = pos_tag(tokens)

# Define a chunk grammar using regular expressions

# NP (noun phrase) chunking: "NP: {<DT>?<JJ>*<NN>}"
# This grammar captures optional determiner (DT), adjectives (JJ), and nouns (NN) as a noun
phrase
chunk_grammar = r"""
NP: {<DT>?<JJ>*<NN>}
"""

# Create a chunk parser with the defined grammar

chunk_parser = RegexpParser(chunk_grammar)
3
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

# Parse the tagged sentence to extract chunks
chunks = chunk_parser.parse(tagged)

# Display the chunks

for subtree in chunks.subtrees():
if subtree.label() == 'NP': # Print only noun phrases
print(subtree)

OUTPUT:
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk)
(8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk)
(1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages
(from nltk) (2023.6.3)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk)
(4.66.1)
(NP The/DT quick/JJ brown/NN)
(NP fox/NN)
(NP the/DT lazy/JJ dog/NN)
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
4
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 6: CASE STUDY

AIM: Parts of Speech Tagging

Problem Statement:
An online news aggregator wants to improve its recommendation system by analyzing the
content of news articles. To achieve this, they need to perform parts of speech tagging on the
article text to extract relevant information such as key topics, sentiments, and entities
mentioned.
Objectives :
1. Develop a parts of speech tagging system to analyze the content of news articles.
2. Extract key information such as nouns, verbs, adjectives, and other parts of speech to
understand the structure of the articles.
3. Enhance the recommendation system by incorporating the extracted information to
provide more accurate and personalized recommendations to users.

Dataset:
The dataset consists of a collection of news articles in text format. Each article is labeled with
its category (e.g., politics, sports, entertainment) and contains textual content for analysis.

Approach:
1. Preprocess the dataset by tokenizing the text into words and sentences.
2. Perform parts of speech tagging using a pre-trained model or a custom-trained model.
3. Extract relevant parts of speech such as nouns, verbs, adjectives, and adverbs from the
tagged text.
4. Analyze the distribution of different parts of speech across the articles to understand
their linguistic characteristics.
5. Integrate the extracted information into the recommendation system to improve the
relevance of recommended articles for users.
Program :
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# Download NLTK resources (if not already downloaded)

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

def pos_tagging(text):
sentences = sent_tokenize(text)
tagged_tokens = []
for sentence in sentences:
tokens = word_tokenize(sentence)
tagged_tokens.extend(nltk.pos_tag(tokens))
return tagged_tokens

def main():
# Example news article
article_text = """
Manchester United secured a 3-1 victory over Chelsea in yesterday's
match.
Goals from Rashford, Greenwood, and Fernandes sealed the win for
United.
Chelsea's only goal came from Pulisic in the first half.
The victory boosts United's chances in the Premier League title
race.
"""

tagged_tokens = pos_tagging(article_text)
print("Original Article Text:\n", article_text)
print("\nParts of Speech Tagging:")
for token, pos_tag in tagged_tokens:
print(f"{token}: {pos_tag}")

if __name__ == "__main__":
main()

Output:
Original Article Text:

Manchester United secured a 3-1 victory over Chelsea in yesterday's match.

Goals from Rashford, Greenwood, and Fernandes sealed the win for United.
Chelsea's only goal came from Pulisic in the first half.
The victory boosts United's chances in the Premier League title race.

Parts of Speech Tagging:

Manchester: NNP
United: NNP
secured: VBD
a: DT
3-1: JJ
victory: NN
over: IN
Chelsea: NNP
in: IN
yesterday: NN
's: POS
match: NN
.: .
Goals: NNS
from: IN
Rashford: NNP
,: ,
Greenwood: NNP
,: ,
and: CC
Fernandes: NNP
sealed: VBD
the: DT
win: NN
for: IN
United: NNP
.: .
Chelsea: NN
's: POS
only: JJ
goal: NN
came: VBD
from: IN
Pulisic: NNP
in: IN
the: DT
first: JJ
half: NN
.: .
The: DT
victory: NN
boosts: VBZ
United: NNP
's: POS
chances: NNS
in: IN
the: DT
Premier: NNP
League: NNP
title: NN
race: NN
.: .
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
Result:
This program demonstrates the parts of speech tagging process on a sample news
article. Each word in the article is followed by its corresponding part of speech tag. This
information can be further utilized for analysis and decision-making in the recommendation
system.
SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 7: CASE STUDY

AIM: The aim of this case study is to demonstrate the extraction of noun phrases from a
given text using chunking, a technique in Natural Language Processing (NLP). We will
utilize Python's NLTK library to implement chunking and extract meaningful noun phrases
from the text.
Problem Statement:
Given a sample text, our goal is to identify and extract noun phrases, which are
sequences of words containing a noun and optionally other words like adjectives or
determiners. The problem involves implementing a program that tokenizes the text, performs
part-of-speech tagging, applies chunking to identify noun phrases, and finally outputs the
extracted noun phrases.
Objectives :
1. Tokenize the input text into words.
2. Perform part-of-speech tagging to assign grammatical tags to each word.
3. Define a chunk grammar to identify noun phrases.
4. Apply chunking to extract noun phrases from the text.
5. Display the extracted noun phrases.
Dataset:
For this case study, we will use a sample text: "The quick brown fox jumps over the lazy
dog."
Approach:
The approach involves several steps to extract noun phrases from the given text using
chunking in Natural Language Processing (NLP). Firstly, the input text is tokenized into
individual words to prepare it for further processing. Following tokenization, each word is
tagged with its part-of-speech using NLTK's pos_tag function, which assigns grammatical
tags to each word based on its context. Next, a chunk grammar is defined to specify the
patterns that identify noun phrases. This grammar is then utilized to apply chunking, which
groups consecutive words that match the defined patterns into noun phrases. Finally, the
extracted noun phrases are outputted, providing meaningful insights into the structure and
content of the text. This approach allows for the identification and extraction of important
linguistic units, facilitating various NLP tasks such as information extraction, text
summarization, and sentiment analysis.

Program :
import nltk
import os

# Set NLTK data path

nltk.data.path.append("/usr/local/share/nltk_data")

# Download the 'punkt' tokenizer model

nltk.download('punkt')

# Download the 'averaged_perceptron_tagger' model

nltk.download('averaged_perceptron_tagger')

# Sample text
text = "The quick brown fox jumps over the lazy dog."

# Tokenize the text into words

words = nltk.word_tokenize(text)

# Perform part-of-speech tagging

pos_tags = nltk.pos_tag(words)

# Define chunk grammar

chunk_grammar = r"""
NP: {<DT>?<JJ>*<NN>} # Chunk sequences of DT, JJ, NN
"""

# Create chunk parser

chunk_parser = nltk.RegexpParser(chunk_grammar)

# Apply chunking
chunked_text = chunk_parser.parse(pos_tags)

# Extract noun phrases

noun_phrases = []
for subtree in chunked_text.subtrees(filter=lambda t: t.label() ==
'NP'):
noun_phrases.append(' '.join(word for word, tag in
subtree.leaves()))

# Output
print("Original Text:", text)
print("Noun Phrases:")
for phrase in noun_phrases:
print("-", phrase)

Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
Original Text: The quick brown fox jumps over the lazy dog.
Noun Phrases:
- The quick brown
- fox
- the lazy dog

Result:
Chunking is a valuable technique in NLP for identifying and extracting meaningful
phrases from text. In this case study, we successfully implemented chunking using Python's
NLTK library to extract noun phrases from a given text. By identifying and extracting noun
phrases, we gained insights into the structure and semantics of the text, which can be
beneficial for various NLP applications such as information extraction, sentiment analysis,
and text summarization.
Lab 2 – Case Study

Aim: Autocomplete System for Email Composition

Problem Statement:
A software company wants to develop an intelligent autocomplete system for email composition.
The goal is to assist users in generating coherent and contextually appropriate sentences while
composing emails. The system should predict the next word or phrase based on the user's input
and the context of the email.

Objectives:

The objectives of the provided program are to implement a simple email autocomplete system using the
GPT-2 language model. The program aims to facilitate user interaction by suggesting autocompletions
based on the context provided and the user's input. Key objectives include initializing and integrating
the GPT-2 model and tokenizer from the Hugging Face Transformers library, defining a class structure
(EmailAutocompleteSystem) to encapsulate the autocomplete system, and creating a method
(generate_suggestions) to generate context-aware suggestions. The program encourages user
engagement by incorporating a user input loop, allowing continuous interaction until the user chooses
to exit. The ultimate goal is to demonstrate the practical use of a pre-trained language model for
generating relevant suggestions in the context of email composition, showcasing the capabilities of the
GPT-2 model for natural language processing tasks.

Approach:
1. Data Collection:
 Collect a diverse dataset of emails, including different writing styles, topics, and
formality levels.
 Annotate the dataset with proper context information, such as sender, recipient,
subject, and the body of the email.
2. Data Preprocessing:
 Clean and tokenize the text data.
 Handle issues like punctuation, capitalization, and special characters.
 Split the dataset into training and testing sets.
3. Model Selection:
 Choose a suitable NLP model for word generation. Options may include recurrent
neural networks (RNNs), long short-term memory networks (LSTMs), or
transformer models like GPT-3.
 Fine-tune or train the model on the email dataset to understand the specific
language patterns used in emails.
4. Context Integration:
 Design a mechanism to incorporate contextual information from the email, such
as the subject, previous sentences, and the relationship between the sender and
recipient.
 Implement a way for the model to understand the context shift within the email
body.
5. User Interface:
 Develop a user-friendly interface that integrates with popular email clients or
standalone applications.
 Allow users to enable or disable the autocomplete feature as needed.
 Provide visual cues to indicate suggested words or phrases.
6. Model Evaluation:
 Evaluate the model's performance on the test dataset using metrics like perplexity,
accuracy, and precision.
 Gather user feedback on the effectiveness and usability of the autocomplete
system.
7. Fine-Tuning and Iteration:
 Analyze user feedback and performance metrics to identify areas for
improvement.
 Consider refining the model based on user suggestions and addressing any
limitations.
8. Deployment:
 Deploy the trained model as a service that can be accessed by the email
application.
 Ensure scalability and reliability of the autocomplete system.
Potential Challenges:
 Context Understanding: Ensuring the model effectively understands and incorporates
the context of the email.
 Ambiguity Handling: Dealing with ambiguous phrases and understanding the user's
intended meaning.
 Personalization: Tailoring the system to individual writing styles and preferences.
Success Criteria:
 Improved email composition efficiency and speed.
 Positive user feedback on the accuracy and relevance of autocomplete suggestions.
 Reduction in typing errors and improved overall user experience.
By successfully developing and implementing this word generation program, the company aims
to enhance the productivity and user experience of individuals engaged in email communication.

Program :
!pip install transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class EmailAutocompleteSystem:
def __init__(self):
self.model_name = "gpt2"
self.tokenizer = GPT2Tokenizer.from_pretrained(self.model_name)
self.model = GPT2LMHeadModel.from_pretrained(self.model_name)

def generate_suggestions(self, user_input, context):

input_text = f"{context} {user_input}"
input_ids = self.tokenizer.encode(input_text, return_tensors="pt")

with torch.no_grad():
output = self.model.generate(input_ids, max_length=50, num_return_sequences=1,
no_repeat_ngram_size=2)

generated_text = self.tokenizer.decode(output[0], skip_special_tokens=True)

suggestions = generated_text.split()[len(user_input.split()):]
return suggestions

# Example usage
if __name__ == "__main__":
autocomplete_system = EmailAutocompleteSystem()

# Assume user is composing an email with some context

email_context = "Subject: Discussing Project Proposal\nHi [Recipient],"

while True:
user_input = input("Enter your sentence (type 'exit' to end): ")

if user_input.lower() == 'exit':
break

suggestions = autocomplete_system.generate_suggestions(user_input, email_context)

if suggestions:
print("Autocomplete Suggestions:", suggestions)
else:
print("No suggestions available.")

Output:
Enter your sentence (type 'exit' to end): hello, how are you ? How's
everything going on !
The attention mask and the pad token id were not set. As a consequence, you
may observe unexpected behavior. Please pass your input's `attention_mask` to
obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Autocomplete Suggestions: ["How's", 'everything', 'going', 'on!', "I'm", 'a',
'programmer', 'and', "I've", 'been', 'working', 'on', 'a', 'project', 'for',
'a', 'while', 'now.', 'I', 'have', 'a', 'lot', 'of', 'ideas', 'for', 'the']
Enter your sentence (type 'exit' to end): exit

Result:
The result demonstrates the integration of a powerful language model for enhancing user experience in
composing emails through intelligent autocomplete suggestions.
SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 3: TEXT CLASSIFICATION

AIM: To perform Text classification using python and scikit-learn

PROCEDURE:
This algorithm outlines the steps involved in the text classification task using
LinearSVC on the 20 Newsgroups dataset. It provides a structured approach to implementing
the program and understanding the workflow.
ALGORITHM:
Algorithm: Text Classification using LinearSVC

1. Load the 20 Newsgroups dataset with specified categories.

- Import the necessary libraries: fetch_20newsgroups from sklearn.datasets.
- Specify the categories of interest for classification.
- Use fetch_20newsgroups to load the dataset for both training and testing sets.

2. Split the dataset into training and testing sets.

- Import train_test_split from sklearn.model_selection.
- Split the dataset into X_train, X_test, y_train, and y_test.

3. Create a pipeline for text classification.

- Import make_pipeline from sklearn.pipeline.
- Create a pipeline with TF-IDF Vectorizer and LinearSVC classifier.

4. Train the model on the training data.

- Call the fit method on the pipeline with X_train and y_train as input.

5. Predict labels for the testing data.

- Use the trained model to predict labels for X_test.

6. Evaluate the model's performance.

- Calculate accuracy_score to measure the accuracy of the model.
- Print classification_report to see precision, recall, and F1-score for each class.

End Algorithm
PROGRAM:
# Install scikit-learn if not already installed
!pip install scikit-learn

# Import necessary libraries

import pandas as pd

from sklearn.datasets import fetch_20newsgroups

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load the 20 Newsgroups dataset

categories = ['sci.med', 'sci.space', 'comp.graphics', 'talk.politics.mideast']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)

# Split the data into training and testing sets

X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

# Create a pipeline with TF-IDF vectorizer and LinearSVC classifier

model = make_pipeline(
TfidfVectorizer(),
LinearSVC()
)

# Train the model

model.fit(X_train, y_train)

# Predict labels for the test set

predictions = model.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, predictions))

Accuracy: 0.9504823151125402
Classification Report:
precision recall f1-score support

0 0.89 0.97 0.93 389

1 0.96 0.91 0.94 396
2 0.98 0.94 0.96 394
3 0.98 0.98 0.98 376

accuracy 0.95 1555

macro avg 0.95 0.95 0.95 1555
weighted avg 0.95 0.95 0.95 1555
SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 3: CASE STUDY

AIM: Customer Support Email Classification

Problem Statement:
A customer support company receives a large volume of incoming emails from customers
with various inquiries, complaints, and feedback. Manually categorizing and prioritizing
these emails is time-consuming and inefficient. The company wants to develop a text
classification system to automatically classify incoming emails into predefined categories,
allowing for faster response times and better customer service.

Objectives :
 The text classification system successfully categorizes incoming customer emails into
predefined categories.
 It improves the efficiency of the customer support team by automating email
classification and prioritization.
 The company can respond to customer inquiries and issues more promptly, leading to
higher customer satisfaction and retention.

Dataset:
The company has a dataset of past customer emails along with their corresponding
categories. Each email is labeled with one or more categories, indicating the type of inquiry
or issue raised by the customer. For demonstration purposes, we will use the
fetch_20newsgroups dataset from scikit-learn, which contains a collection of newsgroup
documents, spanning 20 different newsgroups. We'll simulate this dataset as if it were
customer support emails categorized into predefined categories.

Approach:
Data Preparation:
 Load the 20 Newsgroups dataset as a proxy for customer support emails.
 Select a subset of categories that represent different types of customer inquiries,
complaints, and feedback.
 Prepare the data and target labels from the dataset.
Data Preprocessing:
 Clean the email text data by removing unnecessary information such as email headers,
signatures, and HTML tags.
 Tokenize the text and convert it to lowercase.
 Remove stopwords and apply techniques like stemming or lemmatization to reduce
words to their base forms.
Feature Extraction:
Use TF-IDF Vectorizer to convert text data into numerical features, limiting the maximum
number of features to 10,000 and removing English stopwords.
Model Selection:
 Choose a suitable classification algorithm such as Linear Support Vector Classifier
(LinearSVC) for text classification.
 Train the chosen model on the training data.
Model Evaluation:
 Predict labels for the test set using the trained model.
 Evaluate the classifier's performance using accuracy and a classification report, which
includes precision, recall, and F1-score for each category.
Future Enhancements:
 Continuous monitoring and updating of the model to adapt to evolving customer
inquiries and language patterns.
 Integration of sentiment analysis to assess the sentiment of customer emails and
prioritize urgent or critical issues.
 Expansion of the model to handle multiclass classification and a wider range of
customer inquiry categories.
Program :
# Import necessary libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report
# Load the 20 Newsgroups dataset as a proxy for customer support emails
newsgroups = fetch_20newsgroups(subset='all', categories=['comp.sys.ibm.pc.hardware',
'comp.sys.mac.hardware', 'rec.autos', 'rec.motorcycles', 'sci.electronics'])

# Prepare data and target labels

X = newsgroups.data
y = newsgroups.target

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create TF-IDF vectorizer

vectorizer = TfidfVectorizer(stop_words='english', max_features=10000)
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

# Train the LinearSVC classifier

classifier = LinearSVC()
classifier.fit(X_train, y_train)

# Predict labels for the test set

predictions = classifier.predict(X_test)

# Evaluate the classifier

accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, predictions, target_names=newsgroups.target_names))
Output:
Accuracy: 0.9389623601220752

Classification Report:
precision recall f1-score support

comp.sys.ibm.pc.hardware 0.92 0.91 0.91 212

comp.sys.mac.hardware 0.94 0.93 0.94 198
rec.autos 0.97 0.93 0.95 179
rec.motorcycles 0.96 0.99 0.97 205
sci.electronics 0.92 0.93 0.92 189

accuracy 0.94 983

macro avg 0.94 0.94 0.94 983
weighted avg 0.94 0.94 0.94 983

Result:
This case study outlines the problem statement, dataset, approach, expected outcome,
and future enhancements for developing a text classification system for customer support
email classification. It demonstrates the application of machine learning techniques to
automate and improve customer service processes.
SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 4: SEMANTIC ANALYSIS

AIM: To perform Semantic Analysis using Gensim

PROCEDURE:
Semantic analysis is a broad area in NLP. This program demonstrates semantic analysis by
leveraging pre-trained word vectors using Word2Vec from Gensim. It utilizes word
embeddings to find words similar to each word in the provided sentences.
Library Installation: Ensure the necessary libraries (Gensim and NLTK) are installed.
Library Import: Import the required libraries (gensim for word vectors and nltk for
tokenization).
Pre-trained Word Vectors: Load pre-trained word vectors (Word2Vec) using Gensim's
api.load() method.
Sample Sentences: Define sample sentences for semantic analysis.
Tokenization: Break down the sentences into individual words using NLTK's word_tokenize()
method.
Semantic Analysis: Iterate through each word in the tokenized sentences and:
Check if the word exists in the pre-trained Word2Vec model.
If the word exists, find similar words using the most_similar() method from the
word vectors model.
Display or store the similar words for each word in the sentence.
If the word doesn't exist in the pre-trained model, indicate that it's not present.
The following algorithm outlines the steps involved in performing semantic analysis using pre-
trained word vectors (Word2Vec) in Python, demonstrating how to find similar words for each
word in the provided sentences based on the loaded word vectors.
1
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

ALGORITHM:
1. Install Necessary Libraries: Install Gensim and NLTK libraries (!pip install gensim,
!pip install nltk).
2. Import Libraries: Import required libraries: gensim for word vectors and nltk for
tokenization.
3. Download Pre-trained Word Vectors: Download pre-trained word vectors (Word2Vec)
using Gensim's api.load() method.
4. Define Sample Sentences: Create sample sentences for semantic analysis.
5. Tokenization: Tokenize the sentences into words using NLTK's word_tokenize()
method.
6. Semantic Analysis with Word Vectors: Iterate through each tokenized sentence.
For each word in the sentence:
Check if the word exists in the pre-trained Word2Vec model.
If the word exists:
Find words similar to the current word using word_vectors.most_similar(word).
Display or store the similar words.
If the word doesn't exist in the model:
Print a message indicating that the word is not in the pre-trained model.

PROGRAM:
# Install necessary libraries
!pip install gensim
!pip install nltk

# Import required libraries

import gensim.downloader as api
from nltk.tokenize import word_tokenize

# Download pre-trained word vectors (Word2Vec)

word_vectors = api.load("word2vec-google-news-300")

# Sample sentences
sentences = [
"Natural language processing is a challenging but fascinating field.",
"Word embeddings capture semantic meanings of words in a vector space."
2
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

# Tokenize sentences
tokenized_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]

# Perform semantic analysis using pre-trained word vectors

for tokenized_sentence in tokenized_sentences:
for word in tokenized_sentence:
if word in word_vectors:
similar_words = word_vectors.most_similar(word)
print(f"Words similar to '{word}': {similar_words}")
else:
print(f"'{word}' is not in the pre-trained Word2Vec model.")

OUTPUT:
Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (4.3.2)
Requirement already satisfied: numpy>=1.18.5 in /usr/local/lib/python3.10/dist-packages
(from gensim) (1.23.5)
Requirement already satisfied: scipy>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from
gensim) (1.11.3)
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.10/dist-packages
(from gensim) (6.4.0)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk)
(8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk)
(1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages
(from nltk) (2023.6.3)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk)
(4.66.1)
[==================================================] 100.0%
1662.8/1662.8MB downloaded
3

Words similar to 'natural': [('Splittorff_lacked', 0.636509358882904), ('Natural',

Page

0.58078932762146), ('Mike_Taugher_covers', 0.577259361743927), ('manmade',

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

0.5276211500167847), ('shell_salted_pistachios', 0.5084421634674072), ('unnatural',
0.5030758380889893), ('naturally', 0.49992606043815613), ('Intraparty_squabbles',
0.4988228678703308), ('Burt_Bees_®', 0.49539363384246826), ('causes_Buxeda',
0.4935200810432434)]
Words similar to 'language': [('langauge', 0.7476695775985718), ('Language',
0.6695356369018555), ('languages', 0.6341332197189331), ('English',
0.6120712757110596), ('CMPB_Spanish', 0.6083104610443115), ('nonnative_speakers',
0.6063109636306763), ('idiomatic_expressions', 0.5889801979064941), ('verb_tenses',
0.58415687084198), ('Kumeyaay_Diegueno', 0.5798824429512024), ('dialect',
0.5724600553512573)]
Words similar to 'processing': [('Processing', 0.7285515666007996), ('processed',
0.6519132852554321), ('processor', 0.636760413646698), ('warden_Dominick_DeRose',
0.6166526675224304), ('processors', 0.5953895449638367),
('Discoverer_Enterprise_resumed', 0.5376213192939758), ('LSI_Tarari',
0.520267903804779), ('processer', 0.5166687369346619), ('remittance_processing',
0.5144169926643372), ('Farmland_Foods_pork', 0.5071728825569153)]
Words similar to 'is': [('was', 0.6549733281135559), ("isn'ta", 0.6439523100852966), ('seems',
0.634029746055603), ('Is', 0.6085968613624573), ('becomes', 0.5841935276985168),
('appears', 0.5822900533676147), ('remains', 0.5796942114830017), ('іѕ',
0.5695518255233765), ('makes', 0.5567088723182678), ('isn_`_t', 0.5513144135475159)]
'a' is not in the pre-trained Word2Vec model.
Words similar to 'challenging': [('difficult', 0.6388775110244751), ('challenge',
0.5953003764152527), ('daunting', 0.569800615310669), ('tough', 0.5689979791641235),
('challenges', 0.5471934676170349), ('challenged', 0.5449535846710205), ('Challenging',
0.5242965817451477), ('tricky', 0.5236554741859436), ('toughest', 0.5169045329093933),
('diffi_cult', 0.5010539889335632)]
Words similar to 'but': [('although', 0.8104525804519653), ('though', 0.7285684943199158),
('because', 0.7225914597511292), ('so', 0.6865807771682739), ('But', 0.6826984882354736),
('Although', 0.6188263297080994), ('Though', 0.6153667569160461), ('Unfortunately',
0.6031029224395752), ('Of_course', 0.593142032623291), ('anyway', 0.5869061350822449)]
Words similar to 'fascinating': [('interesting', 0.7623067498207092), ('intriguing',
0.7245113253593445), ('enlightening', 0.6644250154495239), ('captivating',
0.6459898352622986), ('facinating', 0.6416683793067932), ('riveting',
0.6324825286865234), ('instructive', 0.6210989356040955), ('endlessly_fascinating',
0.6188612580299377), ('revelatory', 0.6170244216918945), ('engrossing',
0.6126049160957336)]
Words similar to 'field': [('fields', 0.5582526326179504), ('fi_eld', 0.5188260078430176),
('Keith_Toogood', 0.49749255180358887), ('Mackenzie_Hoambrecker',
0.49514278769493103), ('Josh_Arauco_kicked', 0.48817265033721924), ('Nick_Cattoi',
0.4863145053386688), ('Armando_Cuko', 0.4853871166706085), ('Jon_Striefsky',
0.48322004079818726), ('kicker_Nico_Grasu', 0.47572532296180725),
4
Page

('Chris_Manfredini_kicked', 0.47327715158462524)]

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

'.' is not in the pre-trained Word2Vec model.
Words similar to 'word': [('phrase', 0.6777030825614929), ('words', 0.5864380598068237),
('verb', 0.5517287254333496), ('Word', 0.54575115442276), ('adjective',
0.5290762186050415), ('cuss_word', 0.5272089242935181), ('colloquialism',
0.5160348415374756), ('noun', 0.5129537582397461), ('astrology_#/##/##',
0.5039082765579224), ('synonym', 0.49379870295524597)]
'embeddings' is not in the pre-trained Word2Vec model.
Words similar to 'capture': [('capturing', 0.7563897371292114), ('captured',
0.7155306935310364), ('captures', 0.6099075078964233), ('Capturing',
0.6023245453834534), ('recapture', 0.5498639941215515), ('Capture', 0.5493018627166748),
('nab', 0.4941576421260834), ('Captured', 0.45745959877967834), ('apprehend',
0.4357919692993164), ('seize', 0.4338296055793762)]
Words similar to 'semantic': [('semantics', 0.6644964814186096), ('Semantic',
0.6464474201202393), ('contextual', 0.5909127593040466), ('meta', 0.5905876755714417),
('ontology', 0.5880525708198547), ('Semantic_Web', 0.5612248778343201), ('semantically',
0.5600483417510986), ('microformat', 0.5582399368286133), ('inferencing',
0.5541478991508484), ('terminological', 0.5533202290534973)]
Words similar to 'meanings': [('grammatical_constructions', 0.594986081123352), ('idioms',
0.5938195586204529), ('connotations', 0.5836683511734009), ('symbolic_meanings',
0.5806494951248169), ('meaning', 0.5785343647003174), ('literal_meanings',
0.5743482112884521), ('denotative', 0.5730364918708801), ('phrasal_verbs',
0.5697917342185974), ('contexts', 0.5609514713287354), ('adjectives_adverbs',
0.5569407343864441)]
'of' is not in the pre-trained Word2Vec model.
Words similar to 'words': [('phrases', 0.7100036144256592), ('phrase', 0.6408688426017761),
('Words', 0.6160537600517273), ('word', 0.5864380598068237), ('adjectives',
0.5812757015228271), ('uttered', 0.5724518299102783), ('plate_umpire_Tony_Randozzo',
0.5642045140266418), ('expletives', 0.5539036989212036), ('Mayor_Cirilo_Pena',
0.553884744644165), ('Tele_prompter', 0.5441114902496338)]
Words similar to 'in': [('inthe', 0.5891957879066467), ('where', 0.5662435293197632), ('the',
0.5429296493530273), ('In', 0.5415117144584656), ('during', 0.5188906192779541), ('iin',
0.48737412691116333), ('at', 0.484235554933548), ('from', 0.48268404603004456),
('outside', 0.47092658281326294), ('for', 0.4566476047039032)]
'a' is not in the pre-trained Word2Vec model.
Words similar to 'vector': [('vectors', 0.750322163105011), ('adeno_associated_viral_AAV',
0.5999537110328674), ('bitmap_graphics', 0.5428463220596313), ('Sindbis',
0.5353653430938721), ('bitmap_images', 0.5318013429641724), ('signal_analyzer_VSA',
0.5276671051979065), ('analyzer_VNA', 0.5184376239776611), ('vectorial',
0.5084835886955261), ('nonviral_gene_therapy', 0.5036363005638123), ('shellcode',
5

0.5015827417373657)]
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Words similar to 'space': [('spaces', 0.6570690870285034), ('music_concept_ShockHound',
0.5850345492362976), ('Shuttle_docks', 0.5566749572753906), ('Space',
0.5478203296661377), ('Soviet_Union_Yuri_Gagarin', 0.5417766571044922),
('Shuttle_Discovery_blasts', 0.5352603197097778), ('Shuttle_Discovery_docks',
0.534925103187561), ('Shuttle_Endeavour_undocks', 0.532420814037323),
('Shuttle_Discovery_arrives', 0.5323426723480225), ('Shuttle_undocks',
0.523307740688324)]
'.' is not in the pre-trained Word2Vec model.

6
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 4: CASE STUDY

AIM: Enhancing Customer Service with Semantic Analysis

Problem Statement:
A multinational e-commerce company, "E-Shop Inc.," is looking to improve its customer
service operations by leveraging advanced natural language processing techniques. They have
a vast repository of customer interactions, including emails, chat transcripts, and social media
messages. E-Shop Inc. aims to implement semantic analysis to better understand customer
queries and sentiment, ultimately enhancing the overall customer experience..
Objectives :
 Semantic Analysis: The primary objective is to perform semantic analysis on customer
queries to understand the underlying meaning and extract relevant information. By
identifying synonyms and related terms, the program aims to capture the semantic
nuances of the input text.
 Improving Customer Service: The program aims to enhance customer service
operations by providing insights into customer queries. By analyzing the semantics of
the queries, the program can help identify common issues, extract key information, and
facilitate more effective responses.
Dataset:
In the provided program, there isn't a specific dataset used for semantic analysis. Instead, the
program demonstrates a basic approach to perform semantic analysis on a set of example
customer queries. However, in a real-world scenario, the dataset used for semantic analysis
could consist of a collection of text data relevant to the domain of interest, such as customer
support tickets, product reviews, social media interactions, or any other type of textual data
where semantic analysis is applicable.
Approach:
 Tokenization: The program starts by tokenizing the input text into individual words or
tokens. Tokenization is a fundamental step in natural language processing (NLP) for
breaking down text into its constituent parts.
 Stopword Removal: Stopwords, such as "is", "the", "and", etc., are removed from the
tokens to filter out irrelevant words that do not carry much semantic meaning.
 Lemmatization: The program lemmatizes the remaining tokens to reduce them to their
base or dictionary form. Lemmatization helps in normalizing words and reducing
inflectional forms to a common base, improving the accuracy of semantic analysis.
 Synonym Generation: Using the WordNet database, the program retrieves synonyms
for each lemmatized token. WordNet is a lexical database of the English language that
provides semantic relationships between words, including synonyms, hypernyms,
hyponyms, etc.
 Output Generation: Finally, the program outputs the synonyms generated for each
customer query, providing insights into the semantic content of the queries.
Program :
import nltk
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Initialize NLTK resources

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Function to perform semantic analysis

def semantic_analysis(text):
# Tokenize text
tokens = word_tokenize(text)

# Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in
stop_words]

# Lemmatization
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(token) for token in
filtered_tokens]

# Synonyms generation
synonyms = set()
for token in lemmatized_tokens:
for syn in wordnet.synsets(token):
for lemma in syn.lemmas():
synonyms.add(lemma.name())

return list(synonyms)

# Example customer queries

customer_queries = [
"I received a damaged product. Can I get a refund?",
"I'm having trouble accessing my account.",
"How can I track my order status?",
"The item I received doesn't match the description.",
"Is there a discount available for bulk orders?"
]

# Semantic analysis for each query

for query in customer_queries:
print("Customer Query:", query)
synonyms = semantic_analysis(query)
print("Semantic Analysis (Synonyms):", synonyms)
print("\n")

Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
Customer Query: I received a damaged product. Can I get a refund?
Semantic Analysis (Synonyms): ['refund', 'grow', 'baffle', 'pay_off', 'Cartesian_product',
'arrive', 'engender', 'standard', 'have', 'damaged', 'experience', 'develop', 'sustain', 'product',
'acquire', 'encounter', 'take_in', 'find', 'stupefy', 'bugger_off', 'draw', 'pose', 'aim', 'nonplus',
'induce', 'mother', 'stimulate', 'make', 'repayment', 'convey', 'cause', 'mathematical_product',
'get', 'damage', 'produce', 'set_out', 'merchandise', 'buzz_off', 'beat', 'meet', 'start', 'commence',
'return', 'pick_up', 'production', 'fix', 'stick', "get_under_one's_skin", 'go', 'mystify', 'take',
'perplex', 'welcome', 'vex', 'begin', 'come', 'fuck_off', 'bring', 'contract', 'capture', 'generate',
'give_back', 'incur', 'repay', 'let', 'become', 'start_out', 'gravel', 'scram', 'obtain', 'pay_back',
'amaze', 'catch', 'beget', 'get_down', 'set_about', 'invite', 'bring_forth', 'drive', 'sire', 'intersection',
'discredited', 'suffer', 'received', 'ware', 'dumbfound', 'fetch', 'father', 'arrest', 'flummox', 'puzzle',
'bewilder', 'receive']

Customer Query: I'm having trouble accessing my account.

Semantic Analysis (Synonyms): ['write_up', 'report', 'invoice', 'trouble', 'describe', 'answer_for',
'access', 'pain', 'get_at', 'distract', 'disorder', 'unhinge', 'account_statement', 'calculate',
'disoblige', 'fuss', 'bill', 'disquiet', 'inconvenience', 'incommode', 'news_report', 'disturb',
'explanation', 'problem', 'perturb', 'cark', 'account', 'accounting', 'business_relationship', 'score',
'bother', 'history', 'story', 'difficulty', 'worry', 'inconvenience_oneself', 'hassle', 'chronicle',
'discommode', 'ail', 'put_out', 'upset', 'trouble_oneself']
Customer Query: How can I track my order status?
Semantic Analysis (Synonyms): ['cover', 'rank', 'cart_track', 'purchase_order', 'pass_over',
'parliamentary_law', 'cartroad', 'status', 'get_across', 'traverse', 'prescribe', 'order', 'orderliness',
'rails', 'fiat', 'gild', 'regularise', 'runway', 'govern', 'ordination', 'give_chase', 'put', 'cut_across',
'chase', 'consecrate', 'cut', 'grade', 'social_club', 'order_of_magnitude', 'path', 'rules_of_order',
'caterpillar_tread', 'tail', 'Holy_Order', 'cross', 'data_track', 'monastic_order', 'rate', 'go_after',
'say', 'edict', 'regularize', 'Order', 'caterpillar_track', 'parliamentary_procedure', 'cut_through',
'rail', 'enjoin', 'course', 'racecourse', 'arrange', 'club', 'society', 'ordinate', 'set_up', 'rescript',
'chase_after', 'place', 'dictate', 'tell', 'range', 'decree', 'regulate', 'lodge', 'condition', 'track',
'raceway', 'ordain', 'racetrack', 'get_over', 'lead', 'guild', 'running', 'tag', 'ordering', 'position',
'trail', 'dog']

Customer Query: The item I received doesn't match the description.

Semantic Analysis (Synonyms): ['point', 'mate', 'catch', 'equalize', 'detail', 'touch', 'invite',
'welcome', 'get', 'standard', 'jibe', 'description', 'verbal_description', 'gibe', 'have', 'rival', 'equal',
'pit', 'experience', 'fit', 'check', 'item', 'mates', 'correspond', 'oppose', 'pair', 'lucifer', 'received',
'peer', 'cope_with', 'encounter', 'play_off', 'meet', 'take_in', 'friction_match', 'find', 'particular',
'equate', 'match', 'couple', 'equalise', 'pick_up', 'agree', 'incur', 'compeer', 'twin', 'tally', 'obtain',
'token', 'receive']

Customer Query: Is there a discount available for bulk orders?

Semantic Analysis (Synonyms): ['tell', 'put', 'rank', 'parliamentary_procedure',
'purchase_order', 'consecrate', 'majority', 'range', 'useable', 'decree', 'parliamentary_law',
'regulate', 'price_reduction', 'mass', 'dictate', 'social_club', 'lodge', 'usable', 'enjoin', 'discount',
'grade', 'order_of_magnitude', 'brush_off', 'rules_of_order', 'rebate', 'push_aside', 'prescribe',
'arrange', 'bulge', 'bank_discount', 'dismiss', 'club', 'ordain', 'order', 'society', 'bulk', 'ordinate',
'Holy_Order', 'brush_aside', 'orderliness', 'guild', 'set_up', 'fiat', 'gild', 'regularise',
'uncommitted', 'ordering', 'available', 'monastic_order', 'deduction', 'ordination', 'govern', 'rate',
'discount_rate', 'say', 'edict', 'regularize', 'rescript', 'disregard', 'ignore', 'volume', 'Order', 'place']

Result:
By following this approach, the program aims to achieve the objectives of performing
semantic analysis on customer queries and improving customer service operations by providing
valuable insights into the semantics of the queries.
SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB – 1 : CASE STUDY

AIM: To Enhance Customer Feedback Analysis through NLP-based Text Processing

PROBLEM STATEMENT:

A company receives a large volume of customer feedback across various channels such as
emails, social media, and surveys. Understanding and categorizing this feedback manually is
time-consuming and inefficient. The goal is to develop an NLP-based program to automatically
process and analyze customer feedback to extract valuable insights.

OBJECTIVE:

Utilize spaCy and NLP techniques to process customer feedback text, extract tokens, perform
lemmatization, and conduct dependency parsing to uncover underlying relationships between
words.

APPROACH:

Data Collection:
Gather a dataset containing customer feedback from different sources, including emails, social
media comments, and survey responses.

Text Processing with spaCy:

Utilize spaCy to process the customer feedback text. Extract tokens to identify individual words
and perform lemmatization to obtain their base forms.
1
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Dependency Parsing Analysis:
Use spaCy's dependency parsing feature to identify the syntactic relationships between words.
Analyze the dependency tree to understand how different parts of the feedback sentences are
connected.

Insight Generation:
Categorize the feedback based on sentiment, identify frequently occurring topics, or extract
key phrases related to specific issues or praises mentioned by customers.

Implementation:

Use Python and spaCy to develop the program for text processing and analysis.
Incorporate visualization techniques (e.g., graphs, word clouds) to represent the findings and
insights derived from the processed feedback.
Evaluation:

Evaluate the accuracy and efficiency of tokenization, lemmatization, and dependency parsing
in handling different types of customer feedback.
Measure the program's ability to extract meaningful insights and categorize feedback
accurately.
PROGRAM:
import spacy

# Load English tokenizer, tagger, parser, NER, and word vectors

nlp = spacy.load("en_core_web_sm")

# Sample customer feedback data

customer_feedback = [
"The product is amazing! I love the quality.",
"The customer service was terrible, very disappointed.",
"Great experience overall, highly recommended.",
2

"The delivery was late, very frustrating."

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

]

def analyze_feedback(feedback):
for idx, text in enumerate(feedback, start=1):
print(f"\nAnalyzing Feedback {idx}: '{text}'")
doc = nlp(text)

# Extract tokens and lemmatization

tokens = [token.text for token in doc]
lemmas = [token.lemma_ for token in doc]
print("Tokens:", tokens)
print("Lemmas:", lemmas)

# Dependency parsing
print("\nDependency Parsing:")
for token in doc:
print(token.text, token.dep_, token.head.text, token.head.pos_,
[child for child in token.children])

if __name__ == "__main__":
analyze_feedback(customer_feedback)

OUTPUT:
Analyzing Feedback 1: 'The product is amazing! I love the quality.'
Tokens: ['The', 'product', 'is', 'amazing', '!', 'I', 'love', 'the', 'quality', '.']
Lemmas: ['the', 'product', 'be', 'amazing', '!', 'I', 'love', 'the', 'quality', '.']

Dependency Parsing:
The det product NOUN []
3

product nsubj is AUX [The]

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

is ROOT is AUX [product, amazing, !]
amazing acomp is AUX []
! punct is AUX []
I nsubj love VERB []
love ROOT love VERB [I, quality, .]
the det quality NOUN []
quality dobj love VERB [the]
. punct love VERB []

Analyzing Feedback 2: 'The customer service was terrible, very disappointed.'

Tokens: ['The', 'customer', 'service', 'was', 'terrible', ',', 'very', 'disappointed', '.']
Lemmas: ['the', 'customer', 'service', 'be', 'terrible', ',', 'very', 'disappointed', '.']

Dependency Parsing:
The det service NOUN []
customer compound service NOUN []
service nsubj was AUX [The, customer]
was ROOT was AUX [service, disappointed, .]
terrible amod disappointed ADJ []
, punct disappointed ADJ []
very advmod disappointed ADJ []
disappointed acomp was AUX [terrible, ,, very]
. punct was AUX []

Analyzing Feedback 3: 'Great experience overall, highly recommended.'

Tokens: ['Great', 'experience', 'overall', ',', 'highly', 'recommended', '.']
Lemmas: ['great', 'experience', 'overall', ',', 'highly', 'recommend', '.']

Dependency Parsing:
4

Great amod experience NOUN []

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

experience nsubj recommended VERB [Great]
overall advmod recommended VERB []
, punct recommended VERB []
highly advmod recommended VERB []
recommended ROOT recommended VERB [experience, overall, ,, highly, .]
. punct recommended VERB []

Analyzing Feedback 4: 'The delivery was late, very frustrating.'

Tokens: ['The', 'delivery', 'was', 'late', ',', 'very', 'frustrating', '.']
Lemmas: ['the', 'delivery', 'be', 'late', ',', 'very', 'frustrating', '.']

Dependency Parsing:
The det delivery NOUN [ ]
delivery nsubj was AUX [The]
was ROOT was AUX [delivery, frustrating, .]
late advmod frustrating ADJ [ ]
, punct frustrating ADJ [ ]
very advmod frustrating ADJ [ ]
frustrating acomp was AUX [late, ,, very]
. punct was AUX [ ]

CONCLUSION:
The developed NLP-based program utilizing spaCy proves to be an efficient solution for
processing and analyzing customer feedback. Its capability to extract tokens, perform
lemmatization, and conduct dependency parsing aids in understanding the sentiment,
identifying key topics, and establishing relationships within the feedback data. This enables
companies to derive actionable insights, prioritize issues, and enhance customer satisfaction
based on the analysis of their feedback.

RESULT:
This case study demonstrates the practical application of the provided code snippet using
5

spaCy in a business context, specifically for customer feedback analysis, showcasing how
Page

NLP techniques can be employed to extract valuable insights from unstructured text data.

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

4/23/24, 2:20 PM nlp.ipynb - Colab

#1
import spacy
# Load English tokenizer, tagger, parser, NER, and word vectors
nlp = spacy.load("en_core_web_sm")
# Sample text for analysis
text = "Natural Language Processing is a fascinating field of study."
# Process the text with spaCy
doc = nlp(text)
# Extracting tokens and lemmatization
tokens = [token.text for token in doc]
lemmas = [token.lemma_ for token in doc]
print("Tokens:", tokens)
print("Lemmas:", lemmas)
# Dependency parsing
print("\nDependency Parsing:")
for token in doc:
print(token.text, token.dep_, token.head.text, token.head.pos_,
[child for child in token.children])

Tokens: ['Natural', 'Language', 'Processing', 'is', 'a', 'fascinating', 'field', 'of', 'study', '.']
Lemmas: ['Natural', 'Language', 'Processing', 'be', 'a', 'fascinating', 'field', 'of', 'study', '.']

Dependency Parsing:
Natural compound Language PROPN []
Language compound Processing PROPN [Natural]
Processing nsubj is AUX [Language]
is ROOT is AUX [Processing, field, .]
a det field NOUN []
fascinating amod field NOUN []
field attr is AUX [a, fascinating, of]
of prep field NOUN [study]
study pobj of ADP []
. punct is AUX []

#1 case study
import spacy
# Load English tokenizer, tagger, parser, NER, and word vectors
nlp = spacy.load("en_core_web_sm")
# Sample customer feedback data
customer_feedback = [
"The product is amazing! I love the quality.",
"The customer service was terrible, very disappointed.",
"Great experience overall, highly recommended.",
"The delivery was late, very frustrating."
]
def analyze_feedback(feedback):
for idx, text in enumerate(feedback, start=1):
print(f"\nAnalyzing Feedback {idx}: '{text}'")
doc = nlp(text)
tokens = [token.text for token in doc]
lemmas = [token.lemma_ for token in doc]
print("Tokens:", tokens)
print("Lemmas:", lemmas)
print("\nDependency Parsing:")
for token in doc:
print(token.text, token.dep_, token.head.text, token.head.pos_,
[child for child in token.children])
if __name__ == "__main__":
analyze_feedback(customer_feedback)

Analyzing Feedback 1: 'The product is amazing! I love the quality.'

Analyzing Feedback 2: 'The customer service was terrible, very disappointed.'

Analyzing Feedback 3: 'Great experience overall, highly recommended.'

Analyzing Feedback 4: 'The delivery was late, very frustrating.'

Tokens: ['The', 'delivery', 'was', 'late', ',', 'very', 'frustrating', '.']
Lemmas: ['the', 'delivery', 'be', 'late', ',', 'very', 'frustrating', '.']

Dependency Parsing:
The det delivery NOUN []
delivery nsubj was AUX [The]
was ROOT was AUX [delivery, frustrating, .]
late advmod frustrating ADJ []
, punct frustrating ADJ []
very advmod frustrating ADJ []
frustrating acomp was AUX [late, ,, very]
. punct was AUX []

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 1/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#2
import nltk
import random

nltk.download('punkt')
nltk.download('gutenberg')

words = nltk.corpus.gutenberg.words()

bigrams = list(nltk.bigrams(words))

starting_word = "the"
generated_text = [starting_word]

for _ in range(20):

possible_words = [word2 for (word1, word2) in bigrams if word1.lower() == generated_text[-1].lower()]

next_word = random.choice(possible_words)
generated_text.append(next_word)

print(' '.join(generated_text))

[nltk_data] Downloading package punkt to /root/nltk_data...

[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package gutenberg to /root/nltk_data...
[nltk_data] Unzipping corpora/gutenberg.zip.
the mast head and the son , " If you can afford it doesn ' s eye spare them more step

#2 Case study
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
class EmailAutocompleteSystem:
def __init__(self):
self.model_name = "gpt2"
self.tokenizer = GPT2Tokenizer.from_pretrained(self.model_name)
self.model = GPT2LMHeadModel.from_pretrained(self.model_name)
def generate_suggestions(self, user_input, context):
input_text = f"{context} {user_input}"
input_ids = self.tokenizer.encode(input_text, return_tensors="pt")
with torch.no_grad():
output = self.model.generate(input_ids, max_length=50, num_return_sequences=1,no_repeat_ngram_size=2)
generated_text = self.tokenizer.decode(output[0], skip_special_tokens=True)
suggestions = generated_text.split()[len(user_input.split()):]
return suggestions

if __name__ == "__main__":
autocomplete_system = EmailAutocompleteSystem()
email_context = "Subject: Discussing Project Proposal\nHi [Recipient],"
while True:
user_input = input("Enter your sentence (type 'exit' to end): ")
if user_input.lower() == 'exit':
break
suggestions = autocomplete_system.generate_suggestions(user_input, email_context)
if suggestions:
print("Autocomplete Suggestions:", suggestions)
else:
print("No suggestions available.")

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarni
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public mo
warnings.warn(
tokenizer_config.json: 100% 26.0/26.0 [00:00<00:00, 636B/s]

vocab.json: 100% 1.04M/1.04M [00:00<00:00, 6.15MB/s]

merges.txt: 100% 456k/456k [00:00<00:00, 2.20MB/s]

tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 8.90MB/s]

config.json: 100% 665/665 [00:00<00:00, 12.9kB/s]

model.safetensors: 100% 548M/548M [00:09<00:00, 41.4MB/s]

generation_config.json: 100% 124/124 [00:00<00:00, 588B/s]

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 2/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#3
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load the 20 Newsgroups dataset
categories = ['sci.med', 'sci.space', 'comp.graphics', 'talk.politics.mideast']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)
# Split the data into training and testing sets
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target
# Create a pipeline with TF-IDF vectorizer and LinearSVC classifier
model = make_pipeline(
TfidfVectorizer(),
LinearSVC()
)
# Train the model
model.fit(X_train, y_train)
# Predict labels for the test set
predictions = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, predictions))

Accuracy: 0.9504823151125402

Classification Report:
precision recall f1-score support

0 0.89 0.97 0.93 389

1 0.96 0.91 0.94 396
2 0.98 0.94 0.96 394
3 0.98 0.98 0.98 376

accuracy 0.95 1555

macro avg 0.95 0.95 0.95 1555
weighted avg 0.95 0.95 0.95 1555

#3 Case study
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report
# Load the 20 Newsgroups dataset as a proxy for customer support emails
newsgroups = fetch_20newsgroups(subset='all', categories=['comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'rec.autos', 'rec.motorcy
# Prepare data and target labels
X = newsgroups.data
y = newsgroups.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create TF-IDF vectorizer
vectorizer = TfidfVectorizer(stop_words='english', max_features=10000)
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)
# Train the LinearSVC classifier
classifier = LinearSVC()
classifier.fit(X_train, y_train)
# Predict labels for the test set
predictions = classifier.predict(X_test)
# Evaluate the classifier
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, predictions, target_names=newsgroups.target_names))

Accuracy: 0.9389623601220752

Classification Report:
precision recall f1-score support

comp.sys.ibm.pc.hardware 0.92 0.91 0.91 212

comp.sys.mac.hardware 0.94 0.93 0.94 198
rec.autos 0.97 0.93 0.95 179

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 3/9
4/23/24, 2:20 PM nlp.ipynb - Colab
rec.motorcycles 0.96 0.99 0.97 205
sci.electronics 0.92 0.93 0.92 189

accuracy 0.94 983

macro avg 0.94 0.94 0.94 983
weighted avg 0.94 0.94 0.94 983

#4
# Install necessary libraries
!pip install gensim
!pip install nltk
# Import required libraries
import gensim.downloader as api
from nltk.tokenize import word_tokenize
# Download pre-trained word vectors (Word2Vec)
word_vectors = api.load("word2vec-google-news-300")
# Sample sentences
sentences = [
"Natural language processing is a challenging but fascinating field.",
"Word embeddings capture semantic meanings of words in a vector space."
]
# Tokenize sentences
tokenized_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]
# Perform semantic analysis using pre-trained word vectors
for tokenized_sentence in tokenized_sentences:
for word in tokenized_sentence:
if word in word_vectors:
similar_words = word_vectors.most_similar(word)
print(f"Words similar to '{word}': {similar_words}")
else:
print(f"'{word}' is not in the pre-trained Word2Vec model.")

Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (4.3.2)

Requirement already satisfied: numpy>=1.18.5 in /usr/local/lib/python3.10/dist-packages (from gensim) (1.25.2)
Requirement already satisfied: scipy>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from gensim) (1.11.4)
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.10/dist-packages (from gensim) (6.4.0)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk) (1.4.0)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk) (4.66.2)
Words similar to 'natural': [('Splittorff_lacked', 0.636509358882904), ('Natural', 0.58078932762146), ('Mike_Taugher_covers', 0.5772
Words similar to 'language': [('langauge', 0.7476695775985718), ('Language', 0.6695356369018555), ('languages', 0.6341332197189331),
Words similar to 'processing': [('Processing', 0.7285515666007996), ('processed', 0.6519132852554321), ('processor', 0.6367604136466
Words similar to 'is': [('was', 0.6549733281135559), ("isn'ta", 0.6439523100852966), ('seems', 0.634029746055603), ('Is', 0.60859686
'a' is not in the pre-trained Word2Vec model.
Words similar to 'challenging': [('difficult', 0.6388775110244751), ('challenge', 0.5953003764152527), ('daunting', 0.56980061531066
Words similar to 'but': [('although', 0.8104525804519653), ('though', 0.7285684943199158), ('because', 0.7225914597511292), ('so', 0
Words similar to 'fascinating': [('interesting', 0.7623067498207092), ('intriguing', 0.7245113253593445), ('enlightening', 0.6644250
Words similar to 'field': [('fields', 0.5582526326179504), ('fi_eld', 0.5188260078430176), ('Keith_Toogood', 0.49749255180358887),
'.' is not in the pre-trained Word2Vec model.
Words similar to 'word': [('phrase', 0.6777030825614929), ('words', 0.5864380598068237), ('verb', 0.5517287254333496), ('Word', 0.54
'embeddings' is not in the pre-trained Word2Vec model.
Words similar to 'capture': [('capturing', 0.7563897371292114), ('captured', 0.7155306935310364), ('captures', 0.6099075078964233),
Words similar to 'semantic': [('semantics', 0.6644964814186096), ('Semantic', 0.6464474201202393), ('contextual', 0.5909127593040466
Words similar to 'meanings': [('grammatical_constructions', 0.594986081123352), ('idioms', 0.5938195586204529), ('connotations', 0.5
'of' is not in the pre-trained Word2Vec model.
Words similar to 'words': [('phrases', 0.7100036144256592), ('phrase', 0.6408688426017761), ('Words', 0.6160537600517273), ('word',
Words similar to 'in': [('inthe', 0.5891957879066467), ('where', 0.5662435293197632), ('the', 0.5429296493530273), ('In', 0.54151171
'a' is not in the pre-trained Word2Vec model.
Words similar to 'vector': [('vectors', 0.750322163105011), ('adeno_associated_viral_AAV', 0.5999537110328674), ('bitmap_graphics',
Words similar to 'space': [('spaces', 0.6570690870285034), ('music_concept_ShockHound', 0.5850345492362976), ('Shuttle_docks', 0.556
'.' is not in the pre-trained Word2Vec model.

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 4/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#4 case study
import nltk
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
# Initialize NLTK resources
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
# Function to perform semantic analysis
def semantic_analysis(text):
tokens = word_tokenize(text)
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]
synonyms = set()
for token in lemmatized_tokens:
for syn in wordnet.synsets(token):
for lemma in syn.lemmas():
synonyms.add(lemma.name())
return list(synonyms)
# Example customer queries
customer_queries = [
"I received a damaged product. Can I get a refund?",
"I'm having trouble accessing my account.",
"How can I track my order status?",
"The item I received doesn't match the description.",
"Is there a discount available for bulk orders?"
]
# Semantic analysis for each query
for query in customer_queries:
print("Customer Query:", query)
synonyms = semantic_analysis(query)
print("Semantic Analysis (Synonyms):", synonyms)
print("\n")

[nltk_data] Downloading package punkt to /root/nltk_data...

[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
Customer Query: I received a damaged product. Can I get a refund?
Semantic Analysis (Synonyms): ['generate', 'fuck_off', 'damaged', 'bewilder', 'get', 'stick', 'draw', 'pick_up', 'take_in', 'develop

Customer Query: I'm having trouble accessing my account.

Semantic Analysis (Synonyms): ['disturb', 'news_report', 'calculate', 'answer_for', 'accounting', 'invoice', 'account_statement', 'b

Customer Query: How can I track my order status?

Semantic Analysis (Synonyms): ['racetrack', 'position', 'parliamentary_procedure', 'society', 'gild', 'tell', 'order_of_magnitude',

Customer Query: The item I received doesn't match the description.

Semantic Analysis (Synonyms): ['oppose', 'invite', 'received', 'couple', 'agree', 'check', 'receive', 'get', 'friction_match', 'meet

Customer Query: Is there a discount available for bulk orders?

Semantic Analysis (Synonyms): ['orderliness', 'dictate', 'ordination', 'edict', 'bank_discount', 'parliamentary_procedure', 'set_up

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 5/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#5
# Install necessary libraries
!pip install scikit-learn
!pip install nltk
# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from nltk.corpus import movie_reviews # Sample dataset from NLTK
# Download NLTK resources (run only once if not downloaded)
import nltk
nltk.download('movie_reviews')
# Load the movie_reviews dataset
documents = [(list(movie_reviews.words(fileid)), category)
for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]

# Convert data to DataFrame

df = pd.DataFrame(documents, columns=['text', 'sentiment'])
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['sentiment'], test_size=0.2,
random_state=42)
# Initialize TF-IDF vectorizer
tfidf_vectorizer = TfidfVectorizer()
# Fit and transform the training data
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train.apply(' '.join))
# Initialize SVM classifier
svm_classifier = SVC(kernel='linear')
# Train the classifier
svm_classifier.fit(X_train_tfidf, y_train)
# Transform the test data
X_test_tfidf = tfidf_vectorizer.transform(X_test.apply(' '.join))
# Predict on the test data
y_pred = svm_classifier.predict(X_test_tfidf)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
# Display classification report
print(classification_report(y_test, y_pred))

Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.2.2)

Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.25.2)
Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.11.4)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.4.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (3.4.0)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk) (1.4.0)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk) (4.66.2)
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data] Unzipping corpora/movie_reviews.zip.
Accuracy: 0.84
precision recall f1-score support

neg 0.83 0.85 0.84 199

pos 0.85 0.82 0.84 201

accuracy 0.84 400

macro avg 0.84 0.84 0.84 400
weighted avg 0.84 0.84 0.84 400

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 6/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#5 case study
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Download NLTK resources (only required once)
nltk.download('vader_lexicon')
# Sample reviews
reviews = [
"This product is amazing! I love it.",
"The product was good, but the packaging was damaged.",
"Very disappointing experience. Would not recommend.",
"Neutral feedback on the product.",
]
# Initialize Sentiment Intensity Analyzer
sid = SentimentIntensityAnalyzer()
# Analyze sentiment for each review
for review in reviews:
print("Review:", review)
scores = sid.polarity_scores(review)
print("Sentiment:", end=' ')
if scores['compound'] > 0.05:
print("Positive")
elif scores['compound'] < -0.05:
print("Negative")
else:
print("Neutral")
print()

Review: This product is amazing! I love it.

Sentiment: Positive
Review: The product was good, but the packaging was damaged.
Sentiment: Negative
Review: Very disappointing experience. Would not recommend.
Sentiment: Negative
Review: Neutral feedback on the product.
Sentiment: Neutral

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...

#6
# Install NLTK (if not already installed)
!pip install nltk
# Import necessary libraries
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# Sample text for POS tagging
text = "Parts of speech tagging helps to understand the function of each word in a sentence."
# Tokenize the text into words
tokens = nltk.word_tokenize(text)
# Perform POS tagging
pos_tags = nltk.pos_tag(tokens)
# Display the POS tags
print("POS tags:", pos_tags)

Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)

Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk) (1.4.0)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk) (2023.12.25)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk) (4.66.2)
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
POS tags: [('Parts', 'NNS'), ('of', 'IN'), ('speech', 'NN'), ('tagging', 'VBG'), ('helps', 'NNS'), ('to', 'TO'), ('understand', 'VB

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 7/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#6 Case study
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
# Download NLTK resources (if not already downloaded)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
def pos_tagging(text):
sentences = sent_tokenize(text)
tagged_tokens = []
for sentence in sentences:
tokens = word_tokenize(sentence)
tagged_tokens.extend(nltk.pos_tag(tokens))
return tagged_tokens
def main():
article_text = """Manchester United secured a 3-1 victory over Chelsea in yesterday's match.
Goals from Rashford, Greenwood, and Fernandes sealed the win for United.
Chelsea's only goal came from Pulisic in the first half.
The victory boosts United's chances in the Premier League title race.
"""
tagged_tokens = pos_tagging(article_text)
print("Original Article Text:\n", article_text)
print("\nParts of Speech Tagging:")
for token, pos_tag in tagged_tokens:
print(f"{token}: {pos_tag}")
if __name__ == "__main__":
main()

Original Article Text:

Manchester United secured a 3-1 victory over Chelsea in yesterday's match.
Goals from Rashford, Greenwood, and Fernandes sealed the win for United.
Chelsea's only goal came from Pulisic in the first half.
The victory boosts United's chances in the Premier League title race.

Parts of Speech Tagging:

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 8/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#7
!pip install nltk
import nltk
from nltk import RegexpParser
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
# Download NLTK resources (run only once if not downloaded)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# Sample sentence
sentence = "The quick brown fox jumps over the lazy dog"
# Tokenize the sentence
tokens = word_tokenize(sentence)
# POS tagging
tagged = pos_tag(tokens)
# Define a chunk grammar using regular expressions
# NP (noun phrase) chunking: "NP: {<DT>?<JJ>*<NN>}"
# This grammar captures optional determiner (DT), adjectives (JJ), and nouns (NN) as a noun phrase
chunk_grammar = r"""
NP: {<DT>?<JJ>*<NN>}
"""
# Create a chunk parser with the defined grammar
chunk_parser = RegexpParser(chunk_grammar)
# Parse the tagged sentence to extract chunks
chunks = chunk_parser.parse(tagged)
# Display the chunks
for subtree in chunks.subtrees():
if subtree.label() == 'NP':
print(subtree)

Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 9/9
4/23/24, 2:20 PM nlp.ipynb - Colab

Analyzing Feedback 1: 'The product is amazing! I love the quality.'

Analyzing Feedback 2: 'The customer service was terrible, very disappointed.'

Analyzing Feedback 3: 'Great experience overall, highly recommended.'

Analyzing Feedback 4: 'The delivery was late, very frustrating.'

Tokens: ['The', 'delivery', 'was', 'late', ',', 'very', 'frustrating', '.']
Lemmas: ['the', 'delivery', 'be', 'late', ',', 'very', 'frustrating', '.']

Dependency Parsing:
The det delivery NOUN []
delivery nsubj was AUX [The]
was ROOT was AUX [delivery, frustrating, .]
late advmod frustrating ADJ []
, punct frustrating ADJ []
very advmod frustrating ADJ []
frustrating acomp was AUX [late, ,, very]
. punct was AUX []

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 1/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#2
import nltk
import random

nltk.download('punkt')
nltk.download('gutenberg')

words = nltk.corpus.gutenberg.words()

bigrams = list(nltk.bigrams(words))

starting_word = "the"
generated_text = [starting_word]

for _ in range(20):

possible_words = [word2 for (word1, word2) in bigrams if word1.lower() == generated_text[-1].lower()]

next_word = random.choice(possible_words)
generated_text.append(next_word)

print(' '.join(generated_text))

[nltk_data] Downloading package punkt to /root/nltk_data...

vocab.json: 100% 1.04M/1.04M [00:00<00:00, 6.15MB/s]

merges.txt: 100% 456k/456k [00:00<00:00, 2.20MB/s]

tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 8.90MB/s]

config.json: 100% 665/665 [00:00<00:00, 12.9kB/s]

model.safetensors: 100% 548M/548M [00:09<00:00, 41.4MB/s]

generation_config.json: 100% 124/124 [00:00<00:00, 588B/s]

Accuracy: 0.9504823151125402

Classification Report:
precision recall f1-score support

0 0.89 0.97 0.93 389

1 0.96 0.91 0.94 396
2 0.98 0.94 0.96 394
3 0.98 0.98 0.98 376

accuracy 0.95 1555

macro avg 0.95 0.95 0.95 1555
weighted avg 0.95 0.95 0.95 1555

Accuracy: 0.9389623601220752

Classification Report:
precision recall f1-score support

comp.sys.ibm.pc.hardware 0.92 0.91 0.91 212

comp.sys.mac.hardware 0.94 0.93 0.94 198
rec.autos 0.97 0.93 0.95 179

accuracy 0.94 983

macro avg 0.94 0.94 0.94 983
weighted avg 0.94 0.94 0.94 983

Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (4.3.2)

[nltk_data] Downloading package punkt to /root/nltk_data...

Customer Query: I'm having trouble accessing my account.

Semantic Analysis (Synonyms): ['disturb', 'news_report', 'calculate', 'answer_for', 'accounting', 'invoice', 'account_statement', 'b

Customer Query: How can I track my order status?

Semantic Analysis (Synonyms): ['racetrack', 'position', 'parliamentary_procedure', 'society', 'gild', 'tell', 'order_of_magnitude',

Customer Query: The item I received doesn't match the description.

Semantic Analysis (Synonyms): ['oppose', 'invite', 'received', 'couple', 'agree', 'check', 'receive', 'get', 'friction_match', 'meet

Customer Query: Is there a discount available for bulk orders?

Semantic Analysis (Synonyms): ['orderliness', 'dictate', 'ordination', 'edict', 'bank_discount', 'parliamentary_procedure', 'set_up

# Convert data to DataFrame

Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.2.2)

neg 0.83 0.85 0.84 199

pos 0.85 0.82 0.84 201

accuracy 0.84 400

macro avg 0.84 0.84 0.84 400
weighted avg 0.84 0.84 0.84 400

Review: This product is amazing! I love it.

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...

Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)

Original Article Text:

Parts of Speech Tagging:

Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 9/9
SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB – 1 : CASE STUDY

AIM: To Enhance Customer Feedback Analysis through NLP-based Text Processing

PROBLEM STATEMENT:

OBJECTIVE:

Utilize spaCy and NLP techniques to process customer feedback text, extract tokens, perform
lemmatization, and conduct dependency parsing to uncover underlying relationships between
words.

APPROACH:

Data Collection:
Gather a dataset containing customer feedback from different sources, including emails, social
media comments, and survey responses.

Text Processing with spaCy:

Utilize spaCy to process the customer feedback text. Extract tokens to identify individual words
and perform lemmatization to obtain their base forms.
1
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Insight Generation:
Categorize the feedback based on sentiment, identify frequently occurring topics, or extract
key phrases related to specific issues or praises mentioned by customers.

Implementation:

# Load English tokenizer, tagger, parser, NER, and word vectors

nlp = spacy.load("en_core_web_sm")

# Sample customer feedback data

customer_feedback = [
"The product is amazing! I love the quality.",
"The customer service was terrible, very disappointed.",
"Great experience overall, highly recommended.",
2

"The delivery was late, very frustrating."

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

]

def analyze_feedback(feedback):
for idx, text in enumerate(feedback, start=1):
print(f"\nAnalyzing Feedback {idx}: '{text}'")
doc = nlp(text)

# Extract tokens and lemmatization

tokens = [token.text for token in doc]
lemmas = [token.lemma_ for token in doc]
print("Tokens:", tokens)
print("Lemmas:", lemmas)

# Dependency parsing
print("\nDependency Parsing:")
for token in doc:
print(token.text, token.dep_, token.head.text, token.head.pos_,
[child for child in token.children])

if __name__ == "__main__":
analyze_feedback(customer_feedback)

Dependency Parsing:
The det product NOUN []
3

product nsubj is AUX [The]

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Analyzing Feedback 2: 'The customer service was terrible, very disappointed.'

Tokens: ['The', 'customer', 'service', 'was', 'terrible', ',', 'very', 'disappointed', '.']
Lemmas: ['the', 'customer', 'service', 'be', 'terrible', ',', 'very', 'disappointed', '.']

Analyzing Feedback 3: 'Great experience overall, highly recommended.'

Tokens: ['Great', 'experience', 'overall', ',', 'highly', 'recommended', '.']
Lemmas: ['great', 'experience', 'overall', ',', 'highly', 'recommend', '.']

Dependency Parsing:
4

Great amod experience NOUN []

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Analyzing Feedback 4: 'The delivery was late, very frustrating.'

Tokens: ['The', 'delivery', 'was', 'late', ',', 'very', 'frustrating', '.']
Lemmas: ['the', 'delivery', 'be', 'late', ',', 'very', 'frustrating', '.']

RESULT:
This case study demonstrates the practical application of the provided code snippet using
5

spaCy in a business context, specifically for customer feedback analysis, showcasing how
Page

NLP techniques can be employed to extract valuable insights from unstructured text data.

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 7: CASE STUDY

Program :
import nltk
import os

# Set NLTK data path

nltk.data.path.append("/usr/local/share/nltk_data")

# Download the 'punkt' tokenizer model

nltk.download('punkt')

# Download the 'averaged_perceptron_tagger' model

nltk.download('averaged_perceptron_tagger')

# Sample text
text = "The quick brown fox jumps over the lazy dog."

# Tokenize the text into words

words = nltk.word_tokenize(text)

# Perform part-of-speech tagging

pos_tags = nltk.pos_tag(words)

# Define chunk grammar

chunk_grammar = r"""
NP: {<DT>?<JJ>*<NN>} # Chunk sequences of DT, JJ, NN
"""

# Create chunk parser

chunk_parser = nltk.RegexpParser(chunk_grammar)

# Apply chunking
chunked_text = chunk_parser.parse(pos_tags)

# Extract noun phrases

noun_phrases = []
for subtree in chunked_text.subtrees(filter=lambda t: t.label() ==
'NP'):
noun_phrases.append(' '.join(word for word, tag in
subtree.leaves()))

# Output
print("Original Text:", text)
print("Noun Phrases:")
for phrase in noun_phrases:
print("-", phrase)

LAB 5: SENTIMENT ANALYSIS

AIM: To perform sentiment analysis program using an SVM classifier with TF-IDF
vectorization.

4. Split Data into Train and Test Sets: Split the dataset into training and testing sets (e.g.,
Page

80% training, 20% testing).

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

PROGRAM:
# Install necessary libraries
!pip install scikit-learn
!pip install nltk

# Import required libraries

# Download NLTK resources (run only once if not downloaded)

import nltk
nltk.download('movie_reviews')

# Load the movie_reviews dataset

documents = [(list(movie_reviews.words(fileid)), category)
for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]
2
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

# Convert data to DataFrame
df = pd.DataFrame(documents, columns=['text', 'sentiment'])

# Split data into train and test sets

X_train, X_test, y_train, y_test = train_test_split(df['text'], df['sentiment'], test_size=0.2,
random_state=42)

# Initialize TF-IDF vectorizer

tfidf_vectorizer = TfidfVectorizer()

# Fit and transform the training data

X_train_tfidf = tfidf_vectorizer.fit_transform(X_train.apply(' '.join))

# Initialize SVM classifier

svm_classifier = SVC(kernel='linear')

# Train the classifier

svm_classifier.fit(X_train_tfidf, y_train)

# Transform the test data

X_test_tfidf = tfidf_vectorizer.transform(X_test.apply(' '.join))

# Predict on the test data

y_pred = svm_classifier.predict(X_test_tfidf)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Display classification report

3
Page

print(classification_report(y_test, y_pred))

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

neg 0.83 0.85 0.84 199

pos 0.85 0.82 0.84 201

accuracy 0.84 400

macro avg 0.84 0.84 0.84 400
weighted avg 0.84 0.84 0.84 400

4
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 3: TEXT CLASSIFICATION

AIM: To perform Text classification using python and scikit-learn

1. Load the 20 Newsgroups dataset with specified categories.

2. Split the dataset into training and testing sets.

- Import train_test_split from sklearn.model_selection.
- Split the dataset into X_train, X_test, y_train, and y_test.

3. Create a pipeline for text classification.

- Import make_pipeline from sklearn.pipeline.
- Create a pipeline with TF-IDF Vectorizer and LinearSVC classifier.

4. Train the model on the training data.

- Call the fit method on the pipeline with X_train and y_train as input.

5. Predict labels for the testing data.

- Use the trained model to predict labels for X_test.

6. Evaluate the model's performance.

- Calculate accuracy_score to measure the accuracy of the model.
- Print classification_report to see precision, recall, and F1-score for each class.

End Algorithm
PROGRAM:
# Install scikit-learn if not already installed
!pip install scikit-learn

# Import necessary libraries

import pandas as pd

from sklearn.datasets import fetch_20newsgroups

# Load the 20 Newsgroups dataset

# Split the data into training and testing sets

X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

# Create a pipeline with TF-IDF vectorizer and LinearSVC classifier

model = make_pipeline(
TfidfVectorizer(),
LinearSVC()
)

# Train the model

model.fit(X_train, y_train)

# Predict labels for the test set

predictions = model.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, predictions))

Accuracy: 0.9504823151125402
Classification Report:
precision recall f1-score support

0 0.89 0.97 0.93 389

1 0.96 0.91 0.94 396
2 0.98 0.94 0.96 394
3 0.98 0.98 0.98 376

accuracy 0.95 1555

LAB 6: CASE STUDY

AIM: Parts of Speech Tagging

# Download NLTK resources (if not already downloaded)

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

def pos_tagging(text):
sentences = sent_tokenize(text)
tagged_tokens = []
for sentence in sentences:
tokens = word_tokenize(sentence)
tagged_tokens.extend(nltk.pos_tag(tokens))
return tagged_tokens

tagged_tokens = pos_tagging(article_text)
print("Original Article Text:\n", article_text)
print("\nParts of Speech Tagging:")
for token, pos_tag in tagged_tokens:
print(f"{token}: {pos_tag}")

if __name__ == "__main__":
main()

Output:
Original Article Text:

Manchester United secured a 3-1 victory over Chelsea in yesterday's match.

Goals from Rashford, Greenwood, and Fernandes sealed the win for United.
Chelsea's only goal came from Pulisic in the first half.
The victory boosts United's chances in the Premier League title race.

Parts of Speech Tagging:

LAB 7: CHUNKING

AIM: To perform Noun Phrase chunking

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

ALGORITHM:

1. Import Necessary Libraries: Import required modules from NLTK for tokenization,
POS tagging, and chunking.

2. Download NLTK Resources (if needed): Ensure NLTK resources like tokenizers and
POS taggers are downloaded (nltk.download('punkt'),
nltk.download('averaged_perceptron_tagger')).

3. Define a Sample Sentence: Set a sample sentence that will be used for chunking.

4. Tokenization: Break the sentence into individual words or tokens using NLTK's
word_tokenize() function.

5. Part-of-Speech (POS) Tagging: Tag each token with its corresponding part-of-speech
using NLTK's pos_tag() function.

6. Chunk Grammar Definition: Define a chunk grammar using regular expressions to

identify noun phrases (NP). For example, NP: {<DT>?<JJ>*<NN>} captures
sequences with optional determiners (DT), adjectives (JJ), and nouns (NN) as noun
phrases.

7. Chunk Parser Creation: Create a chunk parser using RegexpParser() and provide the
defined chunk grammar.

8. Chunking: Parse the tagged sentence using the created chunk parser to extract chunks
based on the defined grammar.

9. Display Chunks: Iterate through the parsed chunks and print the subtrees labeled as
'NP', which represent the identified noun phrases.
2
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

PROGRAM:
!pip install nltk
import nltk
from nltk import RegexpParser
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# Download NLTK resources (run only once if not downloaded)

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Sample sentence
sentence = "The quick brown fox jumps over the lazy dog"

# Tokenize the sentence

tokens = word_tokenize(sentence)

# POS tagging
tagged = pos_tag(tokens)

# Define a chunk grammar using regular expressions

# NP (noun phrase) chunking: "NP: {<DT>?<JJ>*<NN>}"
# This grammar captures optional determiner (DT), adjectives (JJ), and nouns (NN) as a noun
phrase
chunk_grammar = r"""
NP: {<DT>?<JJ>*<NN>}
"""

# Create a chunk parser with the defined grammar

chunk_parser = RegexpParser(chunk_grammar)
3
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

# Parse the tagged sentence to extract chunks
chunks = chunk_parser.parse(tagged)

# Display the chunks

for subtree in chunks.subtrees():
if subtree.label() == 'NP': # Print only noun phrases
print(subtree)

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 6: PARTS OF SPEECH TAGGING

AIM: To perform Parts of Speech (POS) tagging program using NLTK

4. Tokenization: Break down the provided text into individual words (tokens) using
Page

NLTK's word_tokenize() method.

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

PROGRAM:
# Install NLTK (if not already installed)
!pip install nltk

# Import necessary libraries

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Sample text for POS tagging

text = "Parts of speech tagging helps to understand the function of each word in a sentence."

# Tokenize the text into words

tokens = nltk.word_tokenize(text)

# Perform POS tagging

pos_tags = nltk.pos_tag(tokens)

# Display the POS tags

print("POS tags:", pos_tags)

OUTPUT:
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk)
2

(8.1.7)
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

3
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 3: CASE STUDY

AIM: Customer Support Email Classification

# Prepare data and target labels

X = newsgroups.data
y = newsgroups.target

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create TF-IDF vectorizer

vectorizer = TfidfVectorizer(stop_words='english', max_features=10000)
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

# Train the LinearSVC classifier

classifier = LinearSVC()
classifier.fit(X_train, y_train)

# Predict labels for the test set

predictions = classifier.predict(X_test)

# Evaluate the classifier

Classification Report:
precision recall f1-score support

comp.sys.ibm.pc.hardware 0.92 0.91 0.91 212

comp.sys.mac.hardware 0.94 0.93 0.94 198
rec.autos 0.97 0.93 0.95 179
rec.motorcycles 0.96 0.99 0.97 205
sci.electronics 0.92 0.93 0.92 189

accuracy 0.94 983

macro avg 0.94 0.94 0.94 983
weighted avg 0.94 0.94 0.94 983

LAB 4: SEMANTIC ANALYSIS

AIM: To perform Semantic Analysis using Gensim

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

PROGRAM:
# Install necessary libraries
!pip install gensim
!pip install nltk

# Import required libraries

import gensim.downloader as api
from nltk.tokenize import word_tokenize

# Download pre-trained word vectors (Word2Vec)

word_vectors = api.load("word2vec-google-news-300")

# Sample sentences
sentences = [
"Natural language processing is a challenging but fascinating field.",
"Word embeddings capture semantic meanings of words in a vector space."
2
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

# Tokenize sentences
tokenized_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]

# Perform semantic analysis using pre-trained word vectors

Words similar to 'natural': [('Splittorff_lacked', 0.636509358882904), ('Natural',

Page

0.58078932762146), ('Mike_Taugher_covers', 0.577259361743927), ('manmade',

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

('Chris_Manfredini_kicked', 0.47327715158462524)]

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

0.5015827417373657)]
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

6
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

SATHYABAMA INSTITUTE OF SCIENCE & TECHNOLOGY
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCSA 2604 NATURAL LANGUAGE PROCESSING LAB

LAB 4: CASE STUDY

AIM: Enhancing Customer Service with Semantic Analysis

# Initialize NLTK resources

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Function to perform semantic analysis

def semantic_analysis(text):
# Tokenize text
tokens = word_tokenize(text)

# Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in
stop_words]

# Lemmatization
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(token) for token in
filtered_tokens]

# Synonyms generation
synonyms = set()
for token in lemmatized_tokens:
for syn in wordnet.synsets(token):
for lemma in syn.lemmas():
synonyms.add(lemma.name())

return list(synonyms)

# Example customer queries

# Semantic analysis for each query

for query in customer_queries:
print("Customer Query:", query)
synonyms = semantic_analysis(query)
print("Semantic Analysis (Synonyms):", synonyms)
print("\n")

Customer Query: I'm having trouble accessing my account.

Customer Query: The item I received doesn't match the description.

Customer Query: Is there a discount available for bulk orders?

Aim: Autocomplete System for Email Composition

Objectives:

Program :
!pip install transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def generate_suggestions(self, user_input, context):

input_text = f"{context} {user_input}"
input_ids = self.tokenizer.encode(input_text, return_tensors="pt")

with torch.no_grad():
output = self.model.generate(input_ids, max_length=50, num_return_sequences=1,
no_repeat_ngram_size=2)

generated_text = self.tokenizer.decode(output[0], skip_special_tokens=True)

suggestions = generated_text.split()[len(user_input.split()):]
return suggestions

# Example usage
if __name__ == "__main__":
autocomplete_system = EmailAutocompleteSystem()

# Assume user is composing an email with some context

email_context = "Subject: Discussing Project Proposal\nHi [Recipient],"

while True:
user_input = input("Enter your sentence (type 'exit' to end): ")

if user_input.lower() == 'exit':
break

suggestions = autocomplete_system.generate_suggestions(user_input, email_context)

if suggestions:
print("Autocomplete Suggestions:", suggestions)
else:
print("No suggestions available.")

Analyzing Feedback 1: 'The product is amazing! I love the quality.'

Analyzing Feedback 2: 'The customer service was terrible, very disappointed.'

Analyzing Feedback 3: 'Great experience overall, highly recommended.'

Analyzing Feedback 4: 'The delivery was late, very frustrating.'

Tokens: ['The', 'delivery', 'was', 'late', ',', 'very', 'frustrating', '.']
Lemmas: ['the', 'delivery', 'be', 'late', ',', 'very', 'frustrating', '.']

Dependency Parsing:
The det delivery NOUN []
delivery nsubj was AUX [The]
was ROOT was AUX [delivery, frustrating, .]
late advmod frustrating ADJ []
, punct frustrating ADJ []
very advmod frustrating ADJ []
frustrating acomp was AUX [late, ,, very]
. punct was AUX []

https://colab.research.google.com/drive/15A6dtAf88PKmxYgKa333vDGxQRI7pumd#scrollTo=ADhz1KBmxW3V 1/9
4/23/24, 2:20 PM nlp.ipynb - Colab
#2
import nltk
import random

nltk.download('punkt')
nltk.download('gutenberg')

words = nltk.corpus.gutenberg.words()

bigrams = list(nltk.bigrams(words))

starting_word = "the"
generated_text = [starting_word]

for _ in range(20):

possible_words = [word2 for (word1, word2) in bigrams if word1.lower() == generated_text[-1].lower()]

next_word = random.choice(possible_words)
generated_text.append(next_word)

print(' '.join(generated_text))

[nltk_data] Downloading package punkt to /root/nltk_data...

vocab.json: 100% 1.04M/1.04M [00:00<00:00, 6.15MB/s]

merges.txt: 100% 456k/456k [00:00<00:00, 2.20MB/s]

tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 8.90MB/s]

config.json: 100% 665/665 [00:00<00:00, 12.9kB/s]

model.safetensors: 100% 548M/548M [00:09<00:00, 41.4MB/s]

generation_config.json: 100% 124/124 [00:00<00:00, 588B/s]

Accuracy: 0.9504823151125402

Classification Report:
precision recall f1-score support

0 0.89 0.97 0.93 389

1 0.96 0.91 0.94 396
2 0.98 0.94 0.96 394
3 0.98 0.98 0.98 376

accuracy 0.95 1555

macro avg 0.95 0.95 0.95 1555
weighted avg 0.95 0.95 0.95 1555

Accuracy: 0.9389623601220752

Classification Report:
precision recall f1-score support

comp.sys.ibm.pc.hardware 0.92 0.91 0.91 212

comp.sys.mac.hardware 0.94 0.93 0.94 198
rec.autos 0.97 0.93 0.95 179

accuracy 0.94 983

macro avg 0.94 0.94 0.94 983
weighted avg 0.94 0.94 0.94 983

Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (4.3.2)

[nltk_data] Downloading package punkt to /root/nltk_data...

Customer Query: I'm having trouble accessing my account.

Semantic Analysis (Synonyms): ['disturb', 'news_report', 'calculate', 'answer_for', 'accounting', 'invoice', 'account_statement', 'b

Customer Query: How can I track my order status?

Semantic Analysis (Synonyms): ['racetrack', 'position', 'parliamentary_procedure', 'society', 'gild', 'tell', 'order_of_magnitude',

Customer Query: The item I received doesn't match the description.

Semantic Analysis (Synonyms): ['oppose', 'invite', 'received', 'couple', 'agree', 'check', 'receive', 'get', 'friction_match', 'meet

Customer Query: Is there a discount available for bulk orders?

Semantic Analysis (Synonyms): ['orderliness', 'dictate', 'ordination', 'edict', 'bank_discount', 'parliamentary_procedure', 'set_up

# Convert data to DataFrame

Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.2.2)

neg 0.83 0.85 0.84 199

pos 0.85 0.82 0.84 201

accuracy 0.84 400

macro avg 0.84 0.84 0.84 400
weighted avg 0.84 0.84 0.84 400

Review: This product is amazing! I love it.

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...

Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)

Original Article Text:

Parts of Speech Tagging:

Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)

LAB – 1 : CASE STUDY

AIM: To Enhance Customer Feedback Analysis through NLP-based Text Processing

PROBLEM STATEMENT:

OBJECTIVE:

Utilize spaCy and NLP techniques to process customer feedback text, extract tokens, perform
lemmatization, and conduct dependency parsing to uncover underlying relationships between
words.

APPROACH:

Data Collection:
Gather a dataset containing customer feedback from different sources, including emails, social
media comments, and survey responses.

Text Processing with spaCy:

Utilize spaCy to process the customer feedback text. Extract tokens to identify individual words
and perform lemmatization to obtain their base forms.
1
Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Insight Generation:
Categorize the feedback based on sentiment, identify frequently occurring topics, or extract
key phrases related to specific issues or praises mentioned by customers.

Implementation:

# Load English tokenizer, tagger, parser, NER, and word vectors

nlp = spacy.load("en_core_web_sm")

# Sample customer feedback data

customer_feedback = [
"The product is amazing! I love the quality.",
"The customer service was terrible, very disappointed.",
"Great experience overall, highly recommended.",
2

"The delivery was late, very frustrating."

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

]

def analyze_feedback(feedback):
for idx, text in enumerate(feedback, start=1):
print(f"\nAnalyzing Feedback {idx}: '{text}'")
doc = nlp(text)

# Extract tokens and lemmatization

tokens = [token.text for token in doc]
lemmas = [token.lemma_ for token in doc]
print("Tokens:", tokens)
print("Lemmas:", lemmas)

# Dependency parsing
print("\nDependency Parsing:")
for token in doc:
print(token.text, token.dep_, token.head.text, token.head.pos_,
[child for child in token.children])

if __name__ == "__main__":
analyze_feedback(customer_feedback)

Dependency Parsing:
The det product NOUN []
3

product nsubj is AUX [The]

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Analyzing Feedback 2: 'The customer service was terrible, very disappointed.'

Tokens: ['The', 'customer', 'service', 'was', 'terrible', ',', 'very', 'disappointed', '.']
Lemmas: ['the', 'customer', 'service', 'be', 'terrible', ',', 'very', 'disappointed', '.']

Analyzing Feedback 3: 'Great experience overall, highly recommended.'

Tokens: ['Great', 'experience', 'overall', ',', 'highly', 'recommended', '.']
Lemmas: ['great', 'experience', 'overall', ',', 'highly', 'recommend', '.']

Dependency Parsing:
4

Great amod experience NOUN []

Page

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

Analyzing Feedback 4: 'The delivery was late, very frustrating.'

Tokens: ['The', 'delivery', 'was', 'late', ',', 'very', 'frustrating', '.']
Lemmas: ['the', 'delivery', 'be', 'late', ',', 'very', 'frustrating', '.']

RESULT:
This case study demonstrates the practical application of the provided code snippet using
5

spaCy in a business context, specifically for customer feedback analysis, showcasing how
Page

NLP techniques can be employed to extract valuable insights from unstructured text data.

Mrs. Parveen . A , Assistant Professor, Dept. of CSE, SIST

ISight User Guide 1
No ratings yet
ISight User Guide 1
619 pages
ASTW RA03 PracticalManual
No ratings yet
ASTW RA03 PracticalManual
18 pages
A Comprehensive Guide To Understand and Implement Text Classification in Python
No ratings yet
A Comprehensive Guide To Understand and Implement Text Classification in Python
34 pages
COMP 4650 6490 Assignment 3 2023-v1.1
No ratings yet
COMP 4650 6490 Assignment 3 2023-v1.1
6 pages
DL Project
No ratings yet
DL Project
21 pages
Group 4 MovieReview
No ratings yet
Group 4 MovieReview
10 pages
DOC-20250208-WA0002
No ratings yet
DOC-20250208-WA0002
21 pages
NLP - Cheatsheet
No ratings yet
NLP - Cheatsheet
10 pages
Assignment 4
No ratings yet
Assignment 4
5 pages
NLP Manual
No ratings yet
NLP Manual
21 pages
NLP Lab
No ratings yet
NLP Lab
18 pages
29-Movie Review NLTK
No ratings yet
29-Movie Review NLTK
27 pages
NLP Tushar
No ratings yet
NLP Tushar
21 pages
Methodology (Autosaved)
No ratings yet
Methodology (Autosaved)
9 pages
text, pos, wor2vec
No ratings yet
text, pos, wor2vec
3 pages
Parts of Speech Tagger
No ratings yet
Parts of Speech Tagger
12 pages
NLP Final Review
No ratings yet
NLP Final Review
32 pages
unit4 (1)
No ratings yet
unit4 (1)
23 pages
Assignment-10 (NLP-part-2)
No ratings yet
Assignment-10 (NLP-part-2)
2 pages
SL-3_Assignment No 7
No ratings yet
SL-3_Assignment No 7
14 pages
CSDM2-Text Preprocessing For NL Data - 011050
No ratings yet
CSDM2-Text Preprocessing For NL Data - 011050
6 pages
unit 3 4
No ratings yet
unit 3 4
6 pages
Ppt- Sentiment Analysis Using Machine Learning Algorithms
No ratings yet
Ppt- Sentiment Analysis Using Machine Learning Algorithms
23 pages
WDM - Week - I
No ratings yet
WDM - Week - I
24 pages
CS771: GROUP-19 Sentiment Analysis in Movie Reviews: Project Report
No ratings yet
CS771: GROUP-19 Sentiment Analysis in Movie Reviews: Project Report
28 pages
Ai Project
No ratings yet
Ai Project
15 pages
NLP_Assignment2
No ratings yet
NLP_Assignment2
7 pages
Unit2 Full
No ratings yet
Unit2 Full
28 pages
nlp_project(documentation)
No ratings yet
nlp_project(documentation)
8 pages
Natural Language Processing
No ratings yet
Natural Language Processing
22 pages
NLP Final Mini Project
No ratings yet
NLP Final Mini Project
17 pages
Text Classification_movie Review_news Wires
No ratings yet
Text Classification_movie Review_news Wires
5 pages
Assignment data science intern
No ratings yet
Assignment data science intern
8 pages
Cs221 Report
No ratings yet
Cs221 Report
16 pages
NLP_Assignment2 proper RNN working
No ratings yet
NLP_Assignment2 proper RNN working
3 pages
Assingment-3 NLP
No ratings yet
Assingment-3 NLP
5 pages
Predicting The Reviews of The Restaurant Using Natural Language Processing Technique
No ratings yet
Predicting The Reviews of The Restaurant Using Natural Language Processing Technique
4 pages
Rajeev Mishra 20 SCSE1180087
No ratings yet
Rajeev Mishra 20 SCSE1180087
29 pages
Bhatt Pds Print - 77-85
No ratings yet
Bhatt Pds Print - 77-85
9 pages
Lab2 IR
No ratings yet
Lab2 IR
16 pages
TextFeatureEnginerring-NLP lec2
No ratings yet
TextFeatureEnginerring-NLP lec2
60 pages
"Sentiment Analysis of Imdb Movie Reviews": A Project Report
0% (1)
"Sentiment Analysis of Imdb Movie Reviews": A Project Report
22 pages
Sentimental Analysis
No ratings yet
Sentimental Analysis
3 pages
FALLSEM2024-25 BCSE332P LO VL2024250102168 2024-10-07 Reference-Material-I
No ratings yet
FALLSEM2024-25 BCSE332P LO VL2024250102168 2024-10-07 Reference-Material-I
18 pages
NLP_A2 (2)
No ratings yet
NLP_A2 (2)
7 pages
Twitter Analysis
No ratings yet
Twitter Analysis
8 pages
AIML IA3 loki & SG (2)
No ratings yet
AIML IA3 loki & SG (2)
31 pages
Adobe Scan 08 Jan 2025
No ratings yet
Adobe Scan 08 Jan 2025
7 pages
Lab Manual
No ratings yet
Lab Manual
10 pages
Nlp Lab Manual
No ratings yet
Nlp Lab Manual
21 pages
Maneesha Nidigonda Verzeo Major Project
No ratings yet
Maneesha Nidigonda Verzeo Major Project
11 pages
NLP Final
No ratings yet
NLP Final
26 pages
ML7 - Text Classification
No ratings yet
ML7 - Text Classification
13 pages
Sentiment Analysis in Python Using NLTK: December 2016
No ratings yet
Sentiment Analysis in Python Using NLTK: December 2016
3 pages
Neural Networks
No ratings yet
Neural Networks
8 pages
Maneesha Nidigonda Major Project
No ratings yet
Maneesha Nidigonda Major Project
11 pages
AI Lab Manual aktu
No ratings yet
AI Lab Manual aktu
11 pages
Dav Exp7 56
No ratings yet
Dav Exp7 56
8 pages
Final Presentation
No ratings yet
Final Presentation
18 pages
Ram Chandra Padwal - Pratical Guide To NLTK For Data Science
No ratings yet
Ram Chandra Padwal - Pratical Guide To NLTK For Data Science
37 pages
Python: Advanced Guide to Programming Code with Python: Python Computer Programming, #4
From Everand
Python: Advanced Guide to Programming Code with Python: Python Computer Programming, #4
Charlie Masterson
No ratings yet
3160715-System Software Lab-Manual
No ratings yet
3160715-System Software Lab-Manual
51 pages
Lecture 04
No ratings yet
Lecture 04
35 pages
Endsem NLP IMPORTANT QUESTIONS
No ratings yet
Endsem NLP IMPORTANT QUESTIONS
2 pages
Elsa/Oink/Cqual++: Open-Source Static Analysis For C++
No ratings yet
Elsa/Oink/Cqual++: Open-Source Static Analysis For C++
29 pages
Homework and Exams
No ratings yet
Homework and Exams
8 pages
Dr. TV. Geetha
No ratings yet
Dr. TV. Geetha
176 pages
CD Assignment1
No ratings yet
CD Assignment1
1 page
Gomez-ESP2000
No ratings yet
Gomez-ESP2000
7 pages
Natural Language Processing
No ratings yet
Natural Language Processing
36 pages
Probabilistic Language Modeling Challenges
No ratings yet
Probabilistic Language Modeling Challenges
12 pages
Unit - 3 Syntax Analysis: 3.1 Role of The Parser
No ratings yet
Unit - 3 Syntax Analysis: 3.1 Role of The Parser
6 pages
Delta Vocab
No ratings yet
Delta Vocab
11 pages
Introduction To Programming Handout
No ratings yet
Introduction To Programming Handout
30 pages
ABNEY 1996 Statistical Methods and Linguistics
No ratings yet
ABNEY 1996 Statistical Methods and Linguistics
23 pages
unit 4 (1)
No ratings yet
unit 4 (1)
39 pages
M.sc. Computer Science Syllabus - 2021-2022 Onwards
No ratings yet
M.sc. Computer Science Syllabus - 2021-2022 Onwards
66 pages
1999 - Caplan - Verbal Working Memory and Sentence Comprehension
No ratings yet
1999 - Caplan - Verbal Working Memory and Sentence Comprehension
50 pages
java errors
No ratings yet
java errors
15 pages
HL7apy: A Python Library To Parse, Create and Handle HL7 v2.x Messages
100% (1)
HL7apy: A Python Library To Parse, Create and Handle HL7 v2.x Messages
10 pages
L2 - Structure of a Compiler
No ratings yet
L2 - Structure of a Compiler
43 pages
NLP QB Final
No ratings yet
NLP QB Final
51 pages
Observability at Twitter - Technical Overview, Part I - Twitter Blogs
No ratings yet
Observability at Twitter - Technical Overview, Part I - Twitter Blogs
7 pages
Chatbot For Healthcare Using Machine Learning
No ratings yet
Chatbot For Healthcare Using Machine Learning
4 pages
XML Parsers: Types of Parsers Using XML Parsers SAX DOM DOM Versus SAX Products Conclusion
No ratings yet
XML Parsers: Types of Parsers Using XML Parsers SAX DOM DOM Versus SAX Products Conclusion
20 pages
CLR 1 Parsing - Javatpoint
No ratings yet
CLR 1 Parsing - Javatpoint
6 pages
Structural Grammar2014 13oct
No ratings yet
Structural Grammar2014 13oct
14 pages
Ijitcs V10 N9 3
No ratings yet
Ijitcs V10 N9 3
11 pages
Edu 569
No ratings yet
Edu 569
2 pages
Lecture 9
No ratings yet
Lecture 9
31 pages