Final Project Report
On
Using Python Language for Sentiment Analysis of Restaurant
Reviews
Dept. of
Electrical and Electronic Engineering
Faculty of Engineering and Technology
Begum Rokeya University, Rangpur
Submitted by
Md. Iftik Arman Emon
ID No: 1716017
Reg.No:000010701
Session: 2017-2018
Supervised by
Iffat Ara Badhan
Lecturer, Dept. of EEE
Begum Rokeya University, Rangpur
ACKNOWLEDGEMENT
A project was included in the final-year syllabus of the Department of Electrical and
Electronic Engineering at Begum Rokeya University, Rangpur. My respected supervisor,
Iffat Ara Badhan, gave the necessary instructions to complete this project successfully,
and I am grateful to her for this. I also thank all of my friends who have helped me run
the project in various ways.
Author
……………………………
CERTIFICATE
This is to certify that Md. Iftik Arman Emon, ID number 1716017, Reg. number 000010701,
session 2017-2018, has successfully finished the project titled "Using Python Language
for Sentiment Analysis of Restaurant Reviews". The project was carried out under my
supervision and guidance in order to complete the criteria for the Bachelor of
Engineering in Electrical and Electronic Engineering degree. To the best of my knowledge
and belief, the project report contains the candidate's original work, for which he
conducted adequate investigation, and it can be noted as a unique idea.
…………………………
Iffat Ara Badhan
Lecturer, Department of Electrical and
Electronic Engineering
Begum Rokeya University, Rangpur
DECLARATION
The project report "Using Python Language for Sentiment Analysis of Restaurant Reviews" is
based on my personal work, completed throughout the course of my studies under the
supervision of Iffat Ara Badhan. I, the undersigned, solemnly declare that the claims
made and judgments reached are the results of my own study. I further certify that:
1. The work contained in the report is original and has been done by me under the
general supervision of my supervisor.
2. I have followed the guidelines provided by the university in writing the report.
3. Whenever I have used materials from other sources, I have given due credit to them
in the text of the report and provided their details in the references.
………………………………..
Md. Iftik Arman Emon
ID no:1716017
Reg no: 000010701
Session: 2017-2018
Department of Electrical and Electronic Engineering
Begum Rokeya University, Rangpur
LIST OF CONTENTS
ABSTRACT
CHAPTER 1. INTRODUCTION
1.1. Introduction
1.2. Related Work
1.3. Objective of Project
3.1. Flask==1.1.1
3.2. gunicorn==19.9.0
3.3. itsdangerous==1.1.0
3.4. Jinja2==2.10.1
3.5. MarkupSafe==1.1.1
3.6. Werkzeug==0.15.5
3.7. numpy>=1.9.2
3.8. scipy>=0.15.1
3.9. scikit-learn>=0.18
3.10. matplotlib>=1.4.3
3.11. pandas>=0.19
4.1. SVM
4.2. NAIVE BAYES
4.3. LOGISTIC REGRESSION
4.4. LSTM
4.5. BERT
CHAPTER 5. METHODOLOGY
5.1. Methodology
5.2. Classification
Multinomial Naïve Bayes
Random Forest
Decision Tree
Support Vector Machine
5.3. Achievement Rating Assessment
5.4. Predicting a Class
CHAPTER 7. REFERENCES
Abstract
In the last ten years, the Internet's development has generated vast amounts of data
across all industries. These innovations have given people new avenues for expressing their
ideas on anything through tweets, blog entries, online forums, status updates, etc. Sentiment
analysis is the technique of computationally identifying and classifying opinions stated in a
text, particularly to ascertain whether the writer has a positive, negative, or neutral attitude
towards a given topic. Any firm should be very interested in client feedback. Therefore, in
this paper, we use Python-based classification systems to analyze customer reviews of
restaurants. This study's major topics are the use of several classification algorithms and an
evaluation of their effectiveness. According to the simulation findings, the highest accuracy
is achieved by SGD at 69.23%.
Keywords: LR (Logistic Regression model), DT (Decision Tree model), RF (Random Forest
model), MNB (Multinomial Naïve Bayes model), KNN (K-Nearest Neighbors model), Linear
SVM (Linear Support Vector Machine model), SGD (Stochastic Gradient Descent model)
Introduction
The exponential rise in Internet usage has sparked an enormous amount of online activity,
including blog posts, video calls, conferences, monitoring, and other e-commerce and online
transactions. This makes it necessary to quickly collect, convert, load, and analyze vast
amounts of diverse, unstructured data [1]. Numerous discussion boards,
blogs, social networks, e-commerce websites, news articles, and other online resources
provide a place for opinion expression that can be used to gauge the opinions of the general
population. Sentiment analysis aids in identifying, extracting, and categorizing ideas,
sentiments, and attitudes conveyed in textual input on many issues [2]. Additionally, it aids in
reaching objectives such as tracking public opinion on political movements, gauging
customer happiness, forecasting movie sales, and ascertaining critics' viewpoints. To extract
key information about a certain product (including variables such a digital camera, a
computer, books, or films), sentiment analysis can be used to categorize online evaluations of
merchandise from retailers like eBay and Flipkart. The method of sentiment analysis is
frequently used to track how the public's views on a political candidate are evolving by
looking at online discussion boards [3]. Since it may be utilized for research into trends or
consumer preferences, monitoring the mood of bloggers is likewise becoming a highly
sought-after research area. Sentiment analysis is turning out to be absolutely essential in the
area of opinion spam. Opinion spam describes criminal practices that aim to deceive readers,
such as writing fraudulent reviews (also known as shilling). It might be seen as an automated
sentiment analysis system giving some target entities unwarranted good assessments in an
effort to advance the entities. It can also mean giving a falsely adverse review of another
organization in an effort to harm its reputation. Studying the reviews and looking at the
sentiment scores is the major objective of sentiment analysis. People primarily rely on user-
generated content while making decisions. Before making a purchase, the user can determine
via sentiment analysis whether the product's information is satisfactory or not. Companies
and advertising agencies use this analysis data to find out more about their products or
services so they can more effectively satisfy client wants. Sentiment analysis is typically
performed at several levels, ranging from coarse to fine. A document's overall sentiment is
determined by coarse-level analysis, while attribute-level sentiment analysis is the
focus of fine-level analysis [4]. Sentence-level sentiment evaluation lies in between
these two.
To identify keywords expressing opinions, natural language processing techniques are
employed [7]. In contrast, supervised machine learning methods use a dataset that is
initially labeled by a human to learn whether a review is favorable, unfavorable, or
neither [6]. In the lexicon-based technique, the polarity is determined by matching
opinion terms from a sentiment lexicon with the data. Scores are then assigned to the
opinion words according to the favorable or unfavorable connotations of the dictionary
terms [2]. The present research examines patron perceptions of a restaurant's service.
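The lexicon-based matching described above can be sketched in a few lines of Python. The tiny lexicon below is purely illustrative (invented for this sketch), not a real sentiment dictionary:

```python
# Minimal lexicon-based polarity scoring (illustrative lexicon, not a real one)
LEXICON = {"good": 1, "great": 1, "tasty": 1, "bad": -1, "terrible": -1, "slow": -1}

def lexicon_score(review: str) -> str:
    """Sum the polarities of known opinion words and map the total to a label."""
    score = sum(LEXICON.get(word, 0) for word in review.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_score("The food was tasty and the service was great"))  # positive
print(lexicon_score("Terrible food and slow service"))                # negative
```

Real systems use curated lexicons with graded scores rather than a hand-written dictionary.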
[Figure: workflow diagram showing segregation of the training and testing data, classification using the training data set, performance analysis of the classification using test data, and prediction of the class of a new set of reviews using the best classifier]
Working principle:
The natural language processing (NLP) approach of sentiment analysis, commonly referred to
as opinion mining, is used to ascertain the sentiment or emotional tone of a document.
Sentiment analysis can be used to examine customer input in the context of restaurant
reviews in order to ascertain if the sentiment indicated in the review is positive, negative, or
neutral.
Here is a summary of how Python and machine learning are used to perform sentiment
analysis on restaurant reviews:
Data Collection:
The dataset for this study was created from comments made about various restaurants.
The data was prepared from foodpanda and other restaurant sites in the Rangpur division.
The dataset contains 600 reviews in seven columns: the first column contains the SL No,
the second the Restaurant Name, the third the Reviewer, the fourth the Location Name,
the fifth the Cuisine, the sixth the Rating, and the seventh the Sentiment. The reviews
are classified into two categories, positive and negative, based on ratings in the range
0-5: positive reviews have ratings of 3-5 out of 5 and negative reviews have ratings of
1-2. In total the dataset contains 348 positive reviews and 252 negative reviews. After
cleaning, 214 short reviews are removed, leaving 386 reviews, of which 190 are positive
and 196 are negative.
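The rating-to-label rule described above (ratings of 3-5 are positive, 1-2 negative) can be written directly in Python; `label_from_rating` is a hypothetical helper name used only for this sketch:

```python
# Map a 1-5 rating to a sentiment label, following the dataset's rule:
# ratings of 3-5 are labeled positive, ratings of 1-2 negative.
def label_from_rating(rating: int) -> str:
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return "positive" if rating >= 3 else "negative"

print(label_from_rating(4))  # positive
print(label_from_rating(2))  # negative
```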
Prepping of Information
The dataset is in Excel format, and we have used the restaurant reviews in this Excel
file to train the model. Since all the algorithms we are working with are supervised
learning, we classified the dataset beforehand to train our algorithms. We import the
dataset using pandas in Python.
Preprocessing is the most important phase in determining a text's sentiment. In our
approach, the preprocessing is broken down into three basic phases. The first stage is to
remove the punctuation from the sentences. Special characters such as exclamation marks,
quotes, etc. are eliminated using a suitable regular-expression pattern. The resulting
data consists only of alphabetical characters.
The second step is to get rid of the stop-words. Stop-words are words in the English
language that do not carry emotion or sentiment but are used as links or articles.
Examples of stop-words include "and", "with", "of", and "the". Stop-words are found and
removed from the dataset using NLP techniques such as lexical analysis, syntactic
analysis, semantic analysis, discourse integration, and pragmatic analysis. The semantic
analysis step would usually delete negation words like "not". However, when it comes to
opinion mining, the word "not" matters. For instance, consider the review "Crust is not
good". By removing the stop-word "not", this sentence becomes "crust good", and a
negative opinion becomes a positive opinion. To prevent this from happening, we have
changed the semantic evaluation stage in the NLP pipeline to ensure that these
stop-words are not eliminated. The third step is to calculate the sentiment of all the
data imported into the Excel sheet. The Python libraries used for this work are given
below:
Flask==1.1.1
gunicorn==19.9.0
itsdangerous==1.1.0
Jinja2==2.10.1
MarkupSafe==1.1.1
Werkzeug==0.15.5
numpy>=1.9.2
scipy>=0.15.1
scikit-learn>=0.18
matplotlib>=1.4.3
pandas>=0.19
Flask 1.1.1: A well-liked Python web framework called Flask makes it simple and
requires little boilerplate code to create online applications. It is a simple and
adaptable framework that adheres to the WSGI (Web Server Gateway Interface)
standard and is frequently utilized to develop RESTful APIs and web services.
Pip, the Python package manager, can be used to install Flask. Run the following
command after opening your command-line interface:
pip install Flask==1.1.1
Gunicorn 19.9.0: A well-liked WSGI (Web Server Gateway Interface) HTTP server
called Gunicorn (Green Unicorn) is frequently used to deliver Python web
applications. Flask, Django, Pyramid, and other web frameworks are just a few of the
ones that it is made to operate well with. Gunicorn is a good option for hosting
production-ready web apps because of its simplicity, performance, and scalability.
The Python package manager, pip, is used to install Gunicorn. Run the following
command after opening your command-line interface:
pip install gunicorn==19.9.0
pandas>=0.19: A potent Python package for data analysis and manipulation is called
pandas. For handling and analyzing structured data, it offers data structures like Series
(1-dimensional labelled arrays) and Data Frame (2-dimensional labelled data tables).
Pandas is frequently used for data preprocessing, exploration, and cleaning activities
in data science, machine learning, finance, and numerous other fields.
Use pip, the Python package manager, to install pandas with a version equal to or
higher than 0.19. Run the following command after opening your command-line
interface:
pip install "pandas>=0.19"
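For illustration, a few rows shaped like the seven-column dataset described in the Data Collection section can be built and inspected with pandas; every value below is invented:

```python
import pandas as pd

# Toy rows mimicking the report's dataset layout (all values invented)
df = pd.DataFrame({
    "SL No": [1, 2, 3],
    "Restaurant Name": ["A", "B", "C"],
    "Reviewer": ["r1", "r2", "r3"],
    "Location Name": ["Rangpur", "Rangpur", "Rangpur"],
    "Cuisine": ["Bengali", "Fast food", "Chinese"],
    "Rating": [5, 2, 4],
    "Sentiment": ["positive", "negative", "positive"],
})

# Count reviews per sentiment class
counts = df["Sentiment"].value_counts()
print(counts["positive"], counts["negative"])  # 2 1
```

A real run would instead read the Excel file, e.g. with `pd.read_excel`.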
The preprocessing stages are a crucial step in obtaining clear and unambiguous information
so that the class of a review can be more precisely predicted later.
II. Create a function to handle case folding and any optional further text
preprocessing. After removing any non-alphanumeric characters with a
regular expression, the text is converted to lowercase.
import re

def preprocess_text(text):
    # Remove non-alphanumeric characters and replace with spaces
    processed_text = re.sub(r'[^a-zA-Z0-9\s]', ' ', text)
    # Convert to lowercase
    processed_text = processed_text.lower()
    return processed_text
Output
this is an example sentence with mixed cases
Symbol Removal: the stage in which punctuation (period (.), comma (,), question
mark (?), exclamation point (!)), explicit characters (&, %, $, #, @ and other
symbols), and numbers (0, 1, 2, ... 9) are removed.
Example:
Input Function
import re

# Example text with symbols
text_with_symbols = "Hello, this is an example text with some !@#$%^&*()_+ symbols."
# Strip everything except letters, digits, and whitespace
cleaned = re.sub(r'[^a-zA-Z0-9\s]', '', text_with_symbols)
# Collapse the runs of whitespace left behind
cleaned = re.sub(r'\s+', ' ', cleaned).strip()
print(cleaned)
Output
Hello this is an example text with some symbols
Python has a number of modules that offer stop-word lists for many languages.
The Natural Language Toolkit (NLTK) is one of the most widely used libraries for
NLP activities. If you haven't previously, install the NLTK library before using
stopwords in Python.
Install the NLTK library using:
pip install nltk
Download the stop-words data for the English language after installing NLTK:
import nltk
nltk.download('stopwords')
Example:
Input
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def remove_stopwords(sentence):
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(sentence)
    filtered_sentence = [word for word in word_tokens if word.lower() not in stop_words]
    return ' '.join(filtered_sentence)

print(remove_stopwords("This is an example sentence with common stopwords."))
Output
example sentence common stopwords .
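As discussed in the preprocessing section, negation words such as "not" must survive stop-word removal so that "Crust is not good" does not collapse into "crust good". A minimal sketch, using a small illustrative stop-word list instead of NLTK's full list:

```python
# Small illustrative stop-word list; negation words are deliberately kept out
STOP_WORDS = {"is", "the", "a", "an", "and", "of", "with"}
NEGATIONS = {"not", "no", "never"}

def remove_stopwords_keep_negation(sentence: str) -> str:
    """Drop stop-words but always keep negation words."""
    kept = [w for w in sentence.lower().split()
            if w in NEGATIONS or w not in STOP_WORDS]
    return " ".join(kept)

print(remove_stopwords_keep_negation("Crust is not good"))  # crust not good
```

With NLTK, the same effect can be achieved by subtracting the negation set from `stopwords.words('english')` before filtering.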
Data cleaning: In the pipeline of data preprocessing, data cleansing is a vital stage.
The process of identifying and correcting defects, contradictions, and errors in a data
set is necessary to enhance its quality and get it ready for additional research.
I. Treatment of Missing Values:
Datasets frequently have missing values, which can cause issues during
analysis. Using pandas, a well-liked Python data manipulation toolkit, you
can deal with missing values.
Example:
import pandas as pd

# Drop rows that contain missing values (fillna() can impute them instead)
df = df.dropna()
II. Duplicate Removal: Repeated rows can bias the analysis and should be dropped.
# Remove duplicates
df = df.drop_duplicates()
III. Data Type Conversion: For analysis, make sure columns have the
appropriate data types. You can change the data type using pandas.
# Convert a column to a numeric type
df['column_name'] = pd.to_numeric(df['column_name'])
IV. Outlier Removal: Rows identified as outliers can be filtered out.
# Remove rows whose values appear in a precomputed list of outliers
df = df[~df['column_name'].isin(outliers)]
V. Text Cleaning: You can lowercase text, remove punctuation, and strip
stop-words using various methods.
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def clean_text(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text)
    cleaned_words = [word for word in words if word not in stop_words]
    return ' '.join(cleaned_words)

df['text_column'] = df['text_column'].apply(clean_text)
VI. Feature Scaling: You may want to use feature scaling to bring numerical
features that are on various scales into a similar range.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df[['numerical_column1', 'numerical_column2']] = scaler.fit_transform(
    df[['numerical_column1', 'numerical_column2']])
SVM Algorithm
One effective machine learning approach for sentiment analysis is called Support Vector
Machines (SVM). Finding the sentiment or attitude communicated in a text (such as whether
it is favourable, negative, or neutral) is the aim of sentiment analysis. Based on the
features extracted from the text, SVM can be used to categorize text data into various
sentiment groups.
The following actions to perform sentiment analysis in Python using SVM:
1. Data preprocessing: Cleanse and preprocess the text data to prepare the dataset. This
includes operations like erasing punctuation, changing the text's case to lowercase,
and eliminating stop words.
2. Feature Extraction: Convert the preprocessed text data into numerical features that
SVM can use. Term Frequency-Inverse Document Frequency (TF-IDF) representation is one
such technique.
3. Training the SVM Model: Split your dataset into a training set and a testing set
before starting to train the SVM model. The SVM model is then trained on the training
set using the extracted features.
4. Evaluating the Model: Using the testing set, assess the trained SVM model's
performance.
Here is a sample Python program that uses the scikit-learn module and the TF-IDF
representation:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

data = pd.DataFrame({
    'text': ['I love this product!', 'This is terrible.', 'It is okay.'],
    'sentiment': ['positive', 'negative', 'neutral']
})
# Data preprocessing (optional, you can add more steps based on your needs)
data['text'] = data['text'].str.lower()
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    data['text'], data['sentiment'], test_size=0.33, random_state=42)
# TF-IDF vectorization
tfidf_vectorizer = TfidfVectorizer()
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)
# Train the SVM classifier on the TF-IDF features
svm_clf = LinearSVC()
svm_clf.fit(X_train_tfidf, y_train)
Naïve Bayes
Another well-liked machine learning algorithm frequently used for sentiment analysis
tasks is Naive Bayes. This probabilistic technique, based on Bayes' theorem, is effective
with text data and high-dimensional feature spaces.
Similar to the SVM method, you can use Naive Bayes to perform sentiment analysis
in Python.
1. Data Pre-Processing: Similar to the last example, prepare the dataset by
cleaning and preparing the text data.
2. Feature Extraction: Create numerical characteristics from the text data that
has been preprocessed. Similar to the SVM method, we can utilise the Bag-of-
Words or TF-IDF representation for Naive Bayes.
3. Training the Naïve Bayes Model: Train the Naive Bayes model using the
features that were retrieved after dividing the dataset into a training set and a
testing set.
4. Evaluating the Model: Utilizing the testing set, assess the trained Naive
Bayes model's performance.
Here is an example Python program that uses the TF-IDF representation and the
scikit-learn library:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Data preprocessing (optional, you can add more steps based on your needs)
data['text'] = data['text'].str.lower()
# TF-IDF vectorization
tfidf_vectorizer = TfidfVectorizer()
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)
# Train the Naive Bayes classifier on the TF-IDF features
nb_clf = MultinomialNB()
nb_clf.fit(X_train_tfidf, y_train)
Logistic Regression
Logistic regression is a linear model that estimates class probabilities; it can be
trained on the same TF-IDF features as the previous classifiers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Data preprocessing (optional, you can add more steps based on your needs)
data['text'] = data['text'].str.lower()
# TF-IDF vectorization
tfidf_vectorizer = TfidfVectorizer()
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)
# Train the logistic regression classifier
lr_clf = LogisticRegression()
lr_clf.fit(X_train_tfidf, y_train)
Although we've just used a tiny sample dataset in this example, you should use a larger
dataset to improve the performance of your model. To increase the model's accuracy, you
might also wish to experiment with other preprocessing methods and hyperparameter
tuning.
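Hyperparameter tuning of the kind mentioned above can be sketched with scikit-learn's GridSearchCV. The training texts and the parameter grid here are invented toy values, not the settings used in this report:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy training data; a real run would use the full review dataset
texts = ["great food", "awful service", "loved it",
         "bad taste", "really good", "not good at all"]
labels = [1, 0, 1, 0, 1, 0]

pipeline = Pipeline([("tfidf", TfidfVectorizer()),
                     ("clf", LogisticRegression())])
# Search over a couple of regularization strengths (illustrative grid)
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(texts, labels)
print(grid.best_params_)
```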
LSTM
It is effective to use Long Short-Term Memory (LSTM) networks for sentiment analysis,
particularly when working with textual information that is sequential. LSTMs are a
variety of recurrent neural network (RNN) that is particularly good at capturing
long-term dependencies in sequences.
I'll show you how to use the Keras library, a high-level neural networks API built on top of
TensorFlow, to do sentiment analysis using LSTM in Python in this example.
Example:
# Importing necessary libraries
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
data = pd.DataFrame({
'text': ['I love this product!', 'This is terrible.', 'It is okay.'],
'sentiment': ['positive', 'negative', 'neutral']
})
# Data preprocessing (optional, you can add more steps based on your needs)
data['text'] = data['text'].str.lower()
# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data['text'])
vocab_size = len(tokenizer.word_index) + 1
X = tokenizer.texts_to_sequences(data['text'])
X = pad_sequences(X)
max_length = X.shape[1]
embedding_dim = 50  # size of the learned word-embedding vectors

# Build the LSTM model
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim,
                    input_length=max_length))
model.add(LSTM(128))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Although we've just used a tiny sample dataset in this example, you should utilise a larger
dataset to improve the performance of your model. The text data is transformed into
numerical sequences using the tokenizer, and all of the sequences are made to be the same
length using pad_sequences before being fed into the LSTM.
BERT
import torch
from torch.nn.functional import softmax

# `model` and `inputs` are assumed to be a pretrained BERT sequence-classification
# model and its tokenized batch of reviews (e.g. from the Hugging Face
# transformers library), prepared beforehand.
# Make predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probabilities = softmax(logits, dim=1)
    predicted_labels = torch.argmax(probabilities, dim=1)
We utilized a tiny sample dataset for this example; for better model performance, you
should use your own larger dataset. Running BERT models on a CPU can be slow because
they require a lot of processing, so if a GPU is available, consider using it for
improved performance.
Model Training: The preprocessed and feature-extracted data are then used
to train the chosen model. A training set and a testing set are created from the
dataset in order to assess the effectiveness of the model.
Model Evaluation: The model is assessed using the testing set once it has
been trained to determine its accuracy and other performance measures
including precision, recall, and F1-score. How well the model can predict
sentiment on unobserved data is determined by the evaluation.
Sentiment Prediction: Once trained and assessed, the model can be used to
predict the tone of fresh restaurant reviews. The model outputs a
sentiment label (such as positive, negative, or neutral) from input text data
that has been preprocessed and feature-extracted.
Deployment: Finally, the trained sentiment analysis model can be used to
analyse sentiment in real time. It may be incorporated into restaurant
management systems to track patron comments and offer invaluable insights
for enhancing the patron experience.
METHODOLOGY
Data collection and preprocessing, model training, and model evaluation are all steps in
the approach for using the Python language for sentiment analysis of restaurant reviews.
The steps for performing sentiment analysis on restaurant reviews are listed below:
Data Collection
Data Processing
Data Labeling
Data Splitting
Feature Extraction
Model Training
Model Evaluation
Hyperparameter Tuning
Sentiment Prediction
Deployment
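The pipeline steps above can be chained with scikit-learn's Pipeline class. This sketch uses TF-IDF features with the SGD classifier (the model the abstract reports as most accurate); the four labeled reviews are invented stand-ins for the real dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Toy labeled reviews standing in for the real dataset
reviews = ["the food was great", "terrible service",
           "really tasty dishes", "bad food and slow service"]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF feature extraction followed by an SGD classifier
model = Pipeline([("tfidf", TfidfVectorizer()),
                  ("clf", SGDClassifier(random_state=0))])
model.fit(reviews, labels)
print(model.predict(["the dishes were great"]))
```

In a real deployment the fitted pipeline would be trained on the full review dataset and called on each incoming review.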
Classification
After dividing the dataset into its component parts, we teach the algorithm how to
classify the data by feeding it training data. Numerous classification techniques have
been used, including Naive Bayes, decision trees, random forests, and the support vector
machine (SVM) classifier. The conditional probability model is the foundation of the
Naive Bayes method. The Naive Bayes classifier assumes feature independence and gives
the probability when the data instance to be classified is expressed as a vector
x = (x1, ..., xn) of n distinct features.
p(Ck | x1, ..., xn) ∝ p(Ck) ∏i p(xi | Ck)
Here, Ck represents the k-th class label.
The decision tree classifier uses a number of conditions and questions to create a tree
structure in which the leaf nodes correspond to the necessary classifications. Entropy is
calculated in order to choose between the tree's roots.
H = −∑ p(x) log p(x)
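The entropy formula can be evaluated numerically; for an even two-class split p = (0.5, 0.5) the base-2 entropy is exactly 1 bit, while a pure node has zero entropy:

```python
import math

def entropy(probabilities):
    """H = -sum(p * log2(p)) over the non-zero class probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0
```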
In order to categorize the provided data, the SVM classifier creates a hyperplane
between the set of points x as
w · x − b = 0
Here, w is the normal vector to the hyperplane.
Each classification algorithm has advantages and disadvantages, and the nature of the
dataset affects how well it performs. For each method, the ratio of training data to
test data is varied and tested to raise efficiency.
Multinomial Naïve Bayes: This is the most popular classification technique in the text
mining industry. Natural Language Processing (NLP) uses it frequently because of its
excellent performance. The algorithm is based on Bayes' theorem. Given a text instance
N to be classified and a class M from the set of potential outcomes, Bayes' theorem
calculates the probability P(M|N).
The formula is given below:
P(M|N) = P(M) * P(N|M)/P(N)
Where,
P(N) = prior probability of N
P(M) = prior probability of class M
P(N|M) = probability of predictor N given class M
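Plugging toy numbers into the formula makes it concrete; the probabilities below are invented purely for illustration:

```python
# Bayes' theorem with invented toy values:
# P(M|N) = P(M) * P(N|M) / P(N)
p_m = 0.6          # prior probability of class M
p_n_given_m = 0.2  # probability of predictor N given class M
p_n = 0.3          # prior probability of N

p_m_given_n = p_m * p_n_given_m / p_n
print(p_m_given_n)
```

which gives P(M|N) ≈ 0.4.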
Random Forest: Random Forest is a classifier that uses a number of decision trees built
on various subsets of a dataset to increase the predictive accuracy on the dataset. A
random forest is made up of an assortment of decision trees, each starting from a
different initial split. Rather than selecting the most advantageous split from the full
list of features, the algorithm selects a random subset of the variables.
Decision Tree: Decision trees, a kind of classification method, are part of the
supervised learning technique. A decision tree uses both internal and leaf nodes to make
decisions. The objective of a decision tree is to characterise an item by creating a set
of true/false statements, choosing splits that reduce entropy.
The false acceptance rate (FAR), false rejection rate (FRR), and accuracy of a
classifier are computed from the confusion-matrix counts as:
FAR = FP / (FP + TN)
FRR = FN / (FN + TP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
We can calculate the precision, recall, and F1-score of the performance evaluation for
such algorithms using the following equations.
a) Precision: Precision refers to the positive predictive value, i.e. the fraction of
predicted positives that are truly positive:
Precision = TP / (TP + FP)
b) Recall: Recall is the fraction of actual positives that are correctly identified:
Recall = TP / (TP + FN)
c) F1-score: The F1-score is the harmonic mean of precision and recall:
F1 = 2 × Precision × Recall / (Precision + Recall)
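These metric formulas can be checked against a small confusion-matrix example; the counts below are invented:

```python
# Invented confusion-matrix counts
TP, TN, FP, FN = 40, 35, 10, 15

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision)  # 0.75 0.8
```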
Predicting a Class
The chosen algorithm can be utilized to predict the class of a fresh dataset when it is
received. The machine can generate the most appropriate class because it has already
learned the characteristics of the dataset. Because we used restaurant reviews, when a
new client submits a review it is added to our dataset and fed to the algorithm, which
decides whether the evaluation of the restaurant is favorable or unfavorable.
Dataset Summary (most frequent words):
Word      Count
was       123
good      85
food      49
very      33
but       32
I         27
so        26
to        22
quality   21
not       21
Performance for unigram features:
Classifier   Accuracy   Precision   Recall   F1-score
LR           76.92      76.84       76.98    76.86
[Figure: ROC curve analysis for unigram features]
Performance for bigram features:
Classifier   Accuracy   Precision   Recall   F1-score
LR           73.08      74.24       73.81    73.04
[Figure: ROC curve analysis for bigram features]
Performance for trigram features:
Classifier   Accuracy   Precision   Recall   F1-score
LR           78.21      79.49       78.97    78.17
[Figure: ROC curve analysis for trigram features]
Conclusion
For Bigram:
Highest Accuracy achieved by LR at 73.08%
Highest F1-Score achieved by LR at 73.04%
Highest Precision Score achieved by LR at 74.24%
Highest Recall Score achieved by LR at 73.81%
For Trigram:
Highest Accuracy achieved by LR at 78.21%
Highest F1-Score achieved by LR at 78.17%
Highest Precision Score achieved by LR at 79.49%
Highest Recall Score achieved by LR at 78.97%
References
[1] K. Ravi and V. Ravi, “A survey on opinion mining and sentiment analysis: Tasks,
approaches and applications,” Knowledge-Based Syst., vol. 89, pp. 14–46, Nov. 2015,
doi: 10.1016/j.knosys.2015.06.015.
[2] V. A. and S. S. Sonawane, “Sentiment Analysis of Twitter Data: A Survey of
Techniques,” Int. J. Comput. Appl., vol. 139, no. 11, pp. 5–15, Apr. 2016, doi:
10.5120/ijca2016908625.
[3] S. Schrauwen, “Machine Learning Approaches To Sentiment Analysis Using the
Dutch Netlog Corpus,” 2010.
[4] M. S. Neethu and R. Rajasree, “Sentiment analysis in twitter using machine learning
techniques,” in 2013 Fourth International Conference on Computing,
Communications and Networking Technologies (ICCCNT), IEEE, Jul. 2013, pp. 1–5.
doi: 10.1109/ICCCNT.2013.6726818.
[5] A. P. Jain and P. Dandannavar, “Application of machine learning techniques to
sentiment analysis,” in 2016 2nd International Conference on Applied and Theoretical
Computing and Communication Technology (iCATccT), IEEE, 2016, pp. 628–632.
doi: 10.1109/ICATCCT.2016.7912076.
[6] G. Gautam and D. Yadav, “Sentiment analysis of twitter data using machine learning
approaches and semantic analysis,” in 2014 Seventh International Conference on
Contemporary Computing (IC3), IEEE, Aug. 2014, pp. 437–442. doi:
10.1109/IC3.2014.6897213.
[7] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and
applications: A survey,” Ain Shams Eng. J., vol. 5, no. 4, pp. 1093–1113, Dec. 2014,
doi: 10.1016/j.asej.2014.04.011.
[8] R. Liu, R. Xiong, and L. Song, “A sentiment classification method for Chinese
document,” in 2010 5th International Conference on Computer Science & Education,
IEEE, Aug. 2010, pp. 918–922. doi: 10.1109/ICCSE.2010.5593462.
[9] L. Ramachandran and E. F. Gehringer, “Automated Assessment of Review Quality
Using Latent Semantic Analysis,” in 2011 IEEE 11th International Conference on
Advanced Learning Technologies, IEEE, Jul. 2011, pp. 136–138. doi:
10.1109/ICALT.2011.46.
[10] B. Pang and L. Lee, “Opinion Mining and Sentiment Analysis,” Found. Trends Inf.
Retr., vol. 2, no. 1–2, pp. 1–135, 2008, doi: 10.1561/1500000011.