Major Project Presentation

on
“Sentiment Analysis of Amazon Product Reviews using Natural Language
Processing”

Presented by:
Shuhel Shabana Ferdhousy (20RISTCS012)
Suhail Akhtar Mazarbhuiya (20RISTCS014)
Tanbeer Ahmed Laskar (20RISTCS015)
Trishna Das (20RISTCS017)

Under The Guidance Of:


Mr. Jay Krishna Das
Assistant Professor
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
CONTENTS
1. INTRODUCTION
2. LITERATURE SURVEY
3. PROBLEM STATEMENT
4. MOTIVATION
5. OBJECTIVE
6. PROPOSED SYSTEM
7. METHODOLOGY
8. DESIGN
9. RESULT
10. FUTURE SCOPE
11. CONCLUSION
12. REFERENCES
INTRODUCTION
• Sentiment, in natural language processing, refers to the feelings, opinions, thoughts,
and attitudes conveyed through written expression.

• In product reviews, sentiment is a key indicator of consumer satisfaction, product
efficacy, and brand perception.

• Sentiment analysis, also known as opinion mining, operates at the intersection of
language, technology, and psychology, serving as a tool to decipher the emotions
conveyed in written text.

• Moreover, sentiment analysis plays a crucial role in enhancing the customer experience.
By identifying areas of dissatisfaction or concern, businesses can proactively address
customer issues, improving overall satisfaction and loyalty.
LITERATURE SURVEY
[1] Khan, M. and Srivastava, A., 2024. “A Review on Sentiment Analysis of Twitter Data
Using Machine Learning Techniques”. International Journal of Engineering and
Management Research, 14(1), pp.186-195.
• The paper explores sentiment analysis of Twitter data using machine learning techniques
to extract valuable insights from user-generated content for organizations and
governments.
• The datasets used in the studies included evaluations of public perception, sentiment in
Spanish tweets, sentiment categorization using CNN and LSTM, text semantics analysis
in Hindi and Kannada, and aspect-based sentiment analysis of Indonesian cinema
reviews.
• The paper reports an accuracy of 95.5%.
LITERATURE SURVEY (cont…)
[2] Mahalakshmi, V., et al. "Twitter sentiment analysis using conditional generative
adversarial network." International Journal of Cognitive Computing in Engineering 5
(2024): 161-169
• The paper introduces a novel approach using Conditional Generative Adversarial
Network (CGAN) for Twitter sentiment analysis, outperforming existing methods.
• The paper mentions that about 20% of the dataset was used for validation, while the
remaining 80% was utilized for training the sentiment analysis model using the CGAN
network.
• The paper achieved an accuracy rate of 93.33% using the proposed CGAN approach for
sentiment analysis on Twitter data.
LITERATURE SURVEY (cont…)
[3] Jacob N, Viswanatham VM. “Sentiment Analysis using Improved Atom Search Optimizer
with a Simulated Annealing and ReLU based Gated Recurrent Unit”. IEEE Access. 2024
Mar 7.
• The research paper proposes a novel approach using ReLU-GRU for sentiment analysis
on Twitter, achieving high accuracy rates on COVID-19 and Sentiment-140 datasets
through feature extraction, selection, and classification methods.
• The paper utilized two datasets: the COVID-19 dataset containing tweets from Indian
users during the lockdown period and the Sentiment-140 dataset comprising 1.6 million
tweets classified as positive, negative, or neutral.
• The paper achieved high accuracy rates, with the ReLU-GRU model demonstrating
accuracy percentages of 97.62% for joy, 96.88% for fear, 97.99% for sadness, and
98.99% for anger sentiment factors on the COVID-19 dataset.
LITERATURE SURVEY (cont…)
[4] Aljuhani, Sara Ashour, and Norah Saleh Alghamdi. "A comparison of sentiment analysis
methods on Amazon reviews of Mobile Phones." International Journal of Advanced
Computer Science and Applications 10.6 (2023).
• This paper presents a comprehensive analysis of sentiment analysis methods on mobile
phone reviews.
• The methodology followed a systematic process to extract features, apply machine
learning algorithms, and classify reviews based on sentiment analysis techniques.
• Reported accuracy:
   ◦ Balanced dataset: CNN achieved the best accuracy of 92.72%, and Logistic
     Regression (LR) achieved 80.00%.
   ◦ Unbalanced dataset: CNN achieved 79.60%, and Logistic Regression (LR)
     achieved 80.00%.
LITERATURE SURVEY (cont…)
[5] Wahyudi, Mochamad, and Dinar Ajeng Kristiyanti. "Sentiment analysis of smartphone
product review using support vector machine algorithm-based particle swarm optimization."
(2022): 189-201.
• The paper analyzes smartphone product reviews from www.gsmarena.com, preprocessing
100 positive and 100 negative reviews through tokenization, stopword removal, and
stemming.
• Support Vector Machine (SVM) and Particle Swarm Optimization (PSO) are employed
for sentiment analysis, with PSO enhancing feature selection for improved classification
accuracy.
• Evaluation using 10 Fold Cross Validation shows SVM achieving 82.00% accuracy,
while SVM-based PSO achieves a higher accuracy rate of 94.50%.
LITERATURE SURVEY (cont…)
[6] AlQahtani, Arwa SM. "Product sentiment analysis for amazon reviews." International
Journal of Computer Science & Information Technology (IJCSIT) Vol 13 (2021).
• This paper used the Amazon dataset extracted via PromptCloud, which had a total of
413,840 reviews labelled with ratings from 1 to 5 stars.
• The proposed methodology had the following steps: Data Collection, Data Cleansing and
Pre-Processing, Feature Extraction, Model Training and Evaluation.
• In the feature extraction phase, Bag-of-Words, Term Frequency-Inverse Document
Frequency (TF-IDF) and GloVe were used.
• For model training, Naïve Bayes, Logistic Regression, Random Forest, Bidirectional
Long Short-Term Memory (Bi-LSTM) and BERT were used.
• After evaluation of the models, it was found that the model using BERT gave the best
accuracy of 94% for multiclass (positive, negative and neutral) classification, and an
accuracy of 95% for binary (positive and negative) classification.
LITERATURE SURVEY (cont…)
[7] Jagdale, Rajkumar S., Vishal S. Shirsat, and Sachin N. Deshmukh. "Sentiment analysis
on product reviews using machine learning techniques." Cognitive Informatics and Soft
Computing: Proceeding of CISC 2017. Springer Singapore, 2019.
• The paper showcases the successful application of machine learning algorithms in
accurately categorizing product reviews as positive or negative, highlighting the
significance of sentiment analysis for business strategy improvement.
• The review discusses the use of machine learning algorithms like SVM and NB, hybrid
approaches, sentiment lexicons, and preprocessing tasks employed for sentiment
classification in product reviews.
• The paper utilizes a diverse dataset from Amazon, including reviews of various products
structured in JSON format.
• Experimental results show high accuracy rates, with Naïve Bayes achieving 98.17%
accuracy and Support Vector Machine achieving 93.54% accuracy for camera reviews,
indicating the effectiveness of the methodology.
LITERATURE SURVEY (cont…)
[8] Farooqui NA, Ritika AS, Saini A. “Sentiment analysis of twitter accounts using natural
language processing”. International Journal of Engineering and Advanced Technology.
2019;8(3):473-9.
• The paper conducts sentiment analysis on Twitter data through various preprocessing
steps like emotion tagging and POS tagging to create feature vectors for analysis.
• The paper employs NLP methods for sentiment analysis, including feature vector and
plain sentiment text mining, utilizing sentiment analyzers like SentiWordNet and
machine learning models such as SVM and Neural Network.
• The paper collects data from Twitter via the streaming API, focusing on political parties'
tweets for sentiment analysis, particularly in the context of presidential elections.
• Evaluation in the paper includes measuring positive/negative scores, mean absolute error
(MAE), and classification into positive, negative, and neutral sentiments, with an
impressive accuracy rate of 95%.
LITERATURE SURVEY (cont…)
Sl. No. | Title | Dataset | Methodology | Accuracy
1 | A Review on Sentiment Analysis of Twitter Data Using Machine Learning Techniques | 5 datasets are used | ML (Naive Bayes, SVM, Logistic Regression), DL (RNN, LSTM, CNN) | 95.5%
2 | Twitter sentiment analysis using conditional generative adversarial network | US Election 2020 Tweets | CGAN Network, CNN, LSTM | 93.33%
3 | Sentiment Analysis using Improved Atom Search Optimizer with a Simulated Annealing and ReLU based Gated Recurrent Unit | Sentiment-140 dataset comprising 1.6 million tweets | ReLU-GRU | 97.99%
4 | A comparison of sentiment analysis methods on Amazon reviews of Mobile Phones | 6 datasets are used | CNN, Logistic Regression | 92.72%
LITERATURE SURVEY (cont…)
Sl. No. | Title | Dataset | Methodology | Accuracy
5 | Sentiment analysis of smartphone product review using support vector machine algorithm-based particle swarm optimization | 200 (100 positive and 100 negative) | SVM, SVM-based PSO | 94.5%
6 | Product sentiment analysis for amazon reviews | 400,000 customer reviews | Bag-of-Words, Term Frequency-Inverse Document Frequency, GloVe | 95%
7 | Sentiment analysis on product reviews using machine learning techniques | 6 datasets are used | SVM, Naïve Bayes | 98%
8 | Sentiment analysis of twitter accounts using natural language processing | 1400 tweets | Tweepy API, TextBlob, SentiWordNet, SVM | 95%

Table 1: Literature Survey


RESEARCH GAP
While existing research has made significant progress in sentiment analysis, there is a
need for further exploration in the following areas:

• Robustness of sentiment analysis models: While various machine learning techniques
have been applied to sentiment analysis, there is a need to explore more robust and
efficient models that can achieve high accuracy rates.

• API-based Sentiment Analysis: While the papers focus on sentiment analysis of
product reviews, there is a research gap in exploring the use of APIs (Application
Programming Interfaces) for sentiment analysis. Investigating the effectiveness of
APIs such as the Amazon Product Advertising API could provide insights into the
advantages and limitations of using real-time data from e-commerce platforms like
Amazon.
PROBLEM STATEMENT

• Existing sentiment analysis techniques struggle to accurately interpret the nuances of
human emotions and opinions expressed in product reviews due to the inherent
complexity of language and context.

• There is a pressing need for more sophisticated sentiment analysis methods that can
effectively capture the diverse range of sentiments conveyed through written
communication, especially within the context of product reviews.

• By analyzing sentiments expressed in product reviews, businesses can identify areas for
improvement in the customer experience and implement targeted interventions to
address consumer concerns effectively.
OBJECTIVES

• To implement web scraping to gather diverse product reviews and ratings, facilitating
real-time insights.

• To compare the different classification models and evaluate them.

• To select the best models after evaluation and build a hybrid model that analyzes
consumer sentiment in product reviews.

• To deploy the classification model for front-end integration, enabling users to submit
reviews and receive predictions.
METHODOLOGY
Algorithm
• Step 1: To collect data from various sources.
   ◦ Download dataset from Kaggle.
   ◦ Collect data using web scraping.
• Step 2: To preprocess the collected data.
• Step 3: To transform the preprocessed data.
• Step 4: To split the transformed dataset for training and testing.
• Step 5: To compare different classification models and evaluate them.
• Step 6: To select the best models and build a hybrid classification model.
• Step 7: To build an interface from the hybrid model.
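As a rough illustration of the web-scraping part of Step 1, the sketch below extracts star ratings and review text from an HTML page using only Python's standard library. The markup it parses (`div class="review"`, `data-rating`, `span class="review-text"`) is a made-up example and not Amazon's actual page structure, and the slides do not say which scraping library the project used.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; real product pages differ.
SAMPLE_HTML = """
<div class="review" data-rating="5"><span class="review-text">Great phone, love it!</span></div>
<div class="review" data-rating="1"><span class="review-text">Stopped working after a week.</span></div>
"""


class ReviewParser(HTMLParser):
    """Collects (rating, text) pairs from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.reviews = []      # list of (rating, text) tuples
        self._rating = None    # rating of the review currently being read
        self._in_text = False  # True while inside a review-text span
        self._chunks = []      # text fragments of the current review

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "review":
            self._rating = int(attrs.get("data-rating", 0))
        elif tag == "span" and attrs.get("class") == "review-text":
            self._in_text = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_text:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_text:
            self._in_text = False
            self.reviews.append((self._rating, "".join(self._chunks).strip()))


def parse_reviews(html):
    """Return every (rating, text) pair found in the given HTML string."""
    parser = ReviewParser()
    parser.feed(html)
    return parser.reviews


reviews = parse_reviews(SAMPLE_HTML)
```

In a real pipeline the HTML string would come from a downloaded page rather than a constant, and the selectors would have to match the site being scraped.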


METHODOLOGY (cont…)
Algorithm
• Step 2: To preprocess the collected data.
   ◦ Remove all non-alphabetic characters.
   ◦ Convert the texts to lowercase.
   ◦ Split sentences to extract words.
   ◦ Build corpus from the extracted words.
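The preprocessing operations listed for Step 2 (removing non-alphabetic characters, lowercasing, splitting into words, building a corpus) could be sketched as follows. The function names are illustrative and the sample reviews are invented; the project's actual code is not shown in the slides.

```python
import re


def preprocess(review):
    """Clean one review and return its list of words."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", review)  # drop non-alphabetic characters
    lowered = letters_only.lower()                     # convert to lowercase
    return lowered.split()                             # split into words


def build_corpus(reviews):
    """Join the cleaned words of each review into one string per review."""
    return [" ".join(preprocess(review)) for review in reviews]


corpus = build_corpus(["Great phone, 10/10!", "Battery died in 2 days :("])
```

Each entry of `corpus` is one cleaned review, ready for the transformation step.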


METHODOLOGY
Algorithm  Convert textual data to vector
• Step 1: To collect data from various sources. representations using Count
Vectorizer algorithm.
 Get the root form or stem of
• Step 2: To preprocess the collected data.
the words by removing
• Step 3: To transform the preprocessed data. suffixes using Porter
Stemmer algorithm.
• Step 4: To split the transformed dataset for training and testing.

• Step 5: To compare different classification models and evaluate them.

• Step 6: To select the best models and build a hybrid classification model.

• Step 7: To build an interface from the hybrid model.
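To make the transformation step concrete, here is a dependency-free sketch of stemming followed by count vectorization. It assumes stemming is applied before counting. `toy_stem` is a deliberately crude stand-in and is not the Porter algorithm; in practice library implementations such as NLTK's `PorterStemmer` and scikit-learn's `CountVectorizer` would presumably take the place of these helpers.

```python
from collections import Counter


def toy_stem(word):
    """Very rough suffix stripping; a stand-in, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def fit_vocabulary(corpus):
    """Map each distinct stemmed word to a column index (alphabetical order)."""
    stems = {toy_stem(word) for doc in corpus for word in doc.split()}
    return {stem: index for index, stem in enumerate(sorted(stems))}


def transform(corpus, vocabulary):
    """Turn each document into a vector of stemmed-word counts."""
    vectors = []
    for doc in corpus:
        counts = Counter(toy_stem(word) for word in doc.split())
        vectors.append([counts.get(stem, 0) for stem in vocabulary])
    return vectors


corpus = ["phones stopped working", "phone works", "working working phone"]
vocabulary = fit_vocabulary(corpus)
vectors = transform(corpus, vocabulary)
```

Each row of `vectors` has one column per vocabulary entry, so every review becomes a fixed-length numeric input for a classifier.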


METHODOLOGY (cont…)
Algorithm
• Step 4: To split the transformed dataset for training and testing.
   ◦ Training Dataset (Downloaded from Kaggle).
   ◦ Testing Dataset (Collected using Web Scraping).


METHODOLOGY (cont…)
Algorithm
• Step 5: To compare different classification models and evaluate them.
   ◦ Compare different models such as:
      1. Random Forest
      2. XGBoost
      3. SVM
      4. Decision Tree
      5. Logistic Regression
      6. AdaBoost
   ◦ Evaluate the models.
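A comparison loop for Step 5 might look like the sketch below: every candidate is fitted on the training data and scored on both the training and the testing data. The two toy classifiers and the tiny dataset are invented stand-ins for the models named on the slide; only the `fit`/`predict` convention is assumed.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


class MajorityClassifier:
    """Toy baseline: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_] * len(X)


class FirstFeatureRule:
    """Toy rule: predicts 1 when the first feature is non-zero, else 0."""

    def fit(self, X, y):
        return self  # nothing to learn

    def predict(self, X):
        return [1 if row[0] > 0 else 0 for row in X]


def compare_models(models, X_train, y_train, X_test, y_test):
    """Return (name, train accuracy, test accuracy), best test accuracy first."""
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        rows.append((name,
                     accuracy(y_train, model.predict(X_train)),
                     accuracy(y_test, model.predict(X_test))))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented count vectors and labels (1 = positive, 0 = negative).
X_train, y_train = [[1, 0], [2, 0], [1, 1], [0, 2]], [1, 1, 1, 0]
X_test, y_test = [[1, 0], [0, 1], [1, 1]], [1, 0, 1]

results = compare_models(
    {"majority baseline": MajorityClassifier(), "first-feature rule": FirstFeatureRule()},
    X_train, y_train, X_test, y_test,
)
```

Reporting both accuracies side by side makes a large train/test gap, a common sign of overfitting, easy to spot.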


METHODOLOGY (cont…)
Algorithm
• Step 6: To select the best models and build a hybrid classification model.
   ◦ Build a hybrid classification model from the selected models.
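The slides do not state how the selected models are combined into the hybrid. One common option, shown here purely as an assumption, is soft voting: average the positive-class probabilities of two models and threshold the result. The probabilities below are invented for illustration.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two models' positive-class probabilities, thresholded at 0.5."""
    labels = []
    for pa, pb in zip(prob_a, prob_b):
        combined = weight_a * pa + (1 - weight_a) * pb
        labels.append(1 if combined >= 0.5 else 0)
    return labels


# Hypothetical positive-class probabilities from two already-trained models.
model_a = [0.90, 0.40, 0.20, 0.55]
model_b = [0.80, 0.70, 0.10, 0.35]

hybrid = soft_vote(model_a, model_b)
```

With equal weights, the second review is labelled positive because one model's confidence outweighs the other's doubt, while the fourth falls just below the threshold.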
METHODOLOGY (cont…)
Algorithm
• Step 7: To build an interface from the hybrid model.
   ◦ Build an interface for front-end integration.
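For the interface step, a minimal sketch of the logic behind a "submit a review, receive a prediction" endpoint is given below. The JSON field names and `dummy_classify` are hypothetical; the slides do not describe the framework or the request format the project used.

```python
import json


def handle_review_submission(request_body, classify):
    """Validate a JSON request and return a JSON response string."""
    try:
        payload = json.loads(request_body)
    except json.JSONDecodeError:
        return json.dumps({"error": "invalid JSON"})
    review = payload.get("review", "") if isinstance(payload, dict) else ""
    if not isinstance(review, str) or not review.strip():
        return json.dumps({"error": "missing review text"})
    return json.dumps({"review": review, "sentiment": classify(review)})


def dummy_classify(text):
    """Placeholder for the trained hybrid model (hypothetical)."""
    return "positive" if "good" in text.lower() else "negative"


response = handle_review_submission('{"review": "Really good value"}', dummy_classify)
```

Keeping the request handling separate from the classifier lets the placeholder be swapped for the trained model without touching the validation code.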
METHODOLOGY (cont…)
Flowchart
Data Collection (Downloaded Dataset) | Data Collection (Web Scraping)
↓
Data Preprocessing
↓
Data Transformation
↓
Dataset
↓
Training Data (Downloaded Dataset) | Testing Data (Web Scraping)
↓
Model Comparison, Training and Testing
↓
Building hybrid model
↓
Building Interface
Fig 1: Methodology Flowchart
METHODOLOGY (cont…)
Dataset (Downloaded)

Class | Rating | Count
Negative | 1 | 161
Negative | 2 | 96
Positive | 3 | 152
Positive | 4 | 455
Positive | 5 | 2286
Total | | 3150

Table 2: Downloaded Dataset


METHODOLOGY (cont…)
Dataset (Collected using Web Scraping)

Class | Rating | Count
Negative | 1 | 2830
Negative | 2 | 2717
Positive | 3 | 1950
Positive | 4 | 1953
Positive | 5 | 1961
Total | | 11411

Table 3: Web Scraped dataset
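The per-class totals implied by Tables 2 and 3 can be worked out with a short script. The grouping used here (1 to 2 stars as negative, 3 to 5 stars as positive) is an assumption read off the table layout; the slides do not state the rule explicitly.

```python
# Review counts per star rating, copied from Tables 2 and 3.
DOWNLOADED = {1: 161, 2: 96, 3: 152, 4: 455, 5: 2286}
SCRAPED = {1: 2830, 2: 2717, 3: 1950, 4: 1953, 5: 1961}


def rating_to_label(rating):
    """Assumed rule: 1-2 stars -> negative, 3-5 stars -> positive."""
    return "negative" if rating <= 2 else "positive"


def class_counts(rating_counts):
    """Sum the per-rating counts into per-class totals."""
    totals = {"negative": 0, "positive": 0}
    for rating, count in rating_counts.items():
        totals[rating_to_label(rating)] += count
    return totals


downloaded = class_counts(DOWNLOADED)  # {'negative': 257, 'positive': 2893}
scraped = class_counts(SCRAPED)        # {'negative': 5547, 'positive': 5864}

# Share of positive reviews in each dataset.
downloaded_positive_share = downloaded["positive"] / sum(downloaded.values())
scraped_positive_share = scraped["positive"] / sum(scraped.values())
```

Under this grouping the downloaded dataset is roughly 92% positive, while the scraped dataset is close to balanced at roughly 51% positive, which is worth keeping in mind when comparing training and testing accuracy.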


DESIGN
SDLC Model

Fig 2: SDLC Model


DESIGN (cont…)
Model Flowchart

Fig 3.1: Model Flowchart


DESIGN (cont…)
Model Flowchart

Fig 3.1: Model Flowchart


DESIGN (cont…)
Data Flow Diagram (DFD)

Fig 4.1: DFD Level 0


DESIGN (cont…)
Data Flow Diagram (DFD)

Fig 4.2: DFD Level 1


DESIGN (cont…)
Data Flow Diagram (DFD)

Fig 4.3: DFD Level 2


DESIGN (cont…)
Use Case Diagram

Fig 5: Use Case Diagram


IMPLEMENTATION
Objective 1: To implement web scraping to gather diverse product reviews and
ratings, facilitating real-time insights.
• Code: [image]
• Result: [image]
IMPLEMENTATION (cont…)
Objective 2: To compare the different classification models and evaluate them.

• Code: [image]
RESULT
Output of web scraping: [image]
RESULT (cont…)
• Test Accuracy & Train accuracy: [image]
RESULT (cont..)
Comparison of Models:

Sl. No. | Model Name | Training Accuracy | Testing Accuracy
1 | Logistic Regression | 0.92 | 0.82
2 | Random Forest | 0.98 | 0.84
3 | XGBoost | 0.98 | 0.86
4 | SVM | 0.94 | 0.82
5 | KNN | 0.98 | 0.83
6 | Gradient Boost | 0.98 | 0.83
7 | Multi-layer Perceptron | 0.98 | 0.86
8 | Decision Tree | 0.98 | 0.78

Table 5: Model’s Accuracy
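As a quick check on the comparison, the figures from Table 5 can be ranked by testing accuracy, with the train/test gap computed as a rough overfitting indicator. The numbers are copied from the table; the ranking criterion (testing accuracy, ties kept in table order) is an assumption about how the best models were chosen.

```python
# (model, training accuracy, testing accuracy) copied from Table 5.
RESULTS = [
    ("Logistic Regression", 0.92, 0.82),
    ("Random Forest", 0.98, 0.84),
    ("XGBoost", 0.98, 0.86),
    ("SVM", 0.94, 0.82),
    ("KNN", 0.98, 0.83),
    ("Gradient Boost", 0.98, 0.83),
    ("Multi-layer Perceptron", 0.98, 0.86),
    ("Decision Tree", 0.98, 0.78),
]

# Sort by testing accuracy, highest first; Python's sort is stable,
# so models with equal accuracy stay in table order.
ranked = sorted(RESULTS, key=lambda row: row[2], reverse=True)
top_two = [name for name, _, _ in ranked[:2]]

# Gap between training and testing accuracy for each model.
gaps = {name: round(train - test, 2) for name, train, test in RESULTS}
largest_gap = max(gaps, key=gaps.get)
```

By this criterion the two strongest models are XGBoost and the Multi-layer Perceptron, and the Decision Tree shows the largest drop from training to testing accuracy.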


RESULT (cont..)
• Accuracy of Hybrid Model (XGBoost & MLP):
   ◦ Training Accuracy: 98%
   ◦ Testing Accuracy: 87%

• Performance metrics of Hybrid Model:

Model | Training accuracy | Testing accuracy | Precision | Recall | F1-Score
Hybrid model | 0.98 | 0.87 | 0.86 | 0.87 | 0.86
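For reference, precision, recall, and F1-score can be computed from true and predicted labels as sketched below. The labels are invented, and the sketch scores the positive class only; the slides do not say whether the reported figures are per-class or averaged across classes.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

precision, recall, f1 = binary_metrics(y_true, y_pred)
```

Precision is the share of predicted positives that are correct, recall is the share of actual positives that are found, and F1 is their harmonic mean.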

FUTURE SCOPE
• Advanced NLP Techniques: Future improvements in models like BERT
will make sentiment analysis more accurate by understanding complex
language details. Integrating text with images and videos will help
understand customer feelings better.

• Multi-language Support: Expanding to support more languages and
creating models that understand cultural differences will make the
system useful worldwide and more reliable.

• Enhanced Visualization and Reporting: Future updates could include
real-time, interactive dashboards and detailed reports with clear trends
and helpful insights, making it easier to make informed decisions.

• Integration with Business Tools: Integrating sentiment analysis with
CRM and market analysis tools will provide a holistic view of customer
interactions, improve customer service, and offer sentiment-driven
insights for market dynamics and consumer behavior.
CONCLUSION

• Successfully gathered diverse Amazon product reviews and ratings using web
scraping techniques.
• Preprocessed and cleaned the collected data to ensure high-quality inputs for
model training.
• Compared multiple classification models; upon evaluation, the best-performing
models were:
   ◦ Random Forest: training accuracy = 99%, testing accuracy = 94%
   ◦ SVM: training accuracy = 98%, testing accuracy = 94%
REFERENCES
[1] Khan, M. and Srivastava, A., 2024. “A Review on Sentiment Analysis of Twitter Data
Using Machine Learning Techniques”. International Journal of Engineering and
Management Research, 14(1), pp.186-195.
[2] Mahalakshmi, V., et al. "Twitter sentiment analysis using conditional generative
adversarial network." International Journal of Cognitive Computing in Engineering 5
(2024): 161-169
[3] Jacob N, Viswanatham VM. “Sentiment Analysis using Improved Atom Search Optimizer
with a Simulated Annealing and ReLU based Gated Recurrent Unit”. IEEE Access. 2024
Mar 7.
[4] Aljuhani, Sara Ashour, and Norah Saleh Alghamdi. "A comparison of sentiment analysis
methods on Amazon reviews of Mobile Phones." International Journal of Advanced
Computer Science and Applications 10.6 (2023).
[5] Wahyudi, Mochamad, and Dinar Ajeng Kristiyanti. "Sentiment analysis of smartphone
product review using support vector machine algorithm-based particle swarm optimization."
(2022): 189-201.
[6] AlQahtani, Arwa SM. "Product sentiment analysis for amazon reviews." International
Journal of Computer Science & Information Technology (IJCSIT) Vol 13 (2021).
[7] Jagdale, Rajkumar S., Vishal S. Shirsat, and Sachin N. Deshmukh. "Sentiment analysis
on product reviews using machine learning techniques." Cognitive Informatics and Soft
Computing: Proceeding of CISC 2017. Springer Singapore, 2019.
[8] Farooqui NA, Ritika AS, Saini A. “Sentiment analysis of twitter accounts using natural
language processing”. International Journal of Engineering and Advanced Technology.
2019;8(3):473-9.
THANK YOU!
