Areeba-MS-IT Thesis Defence Final (Compatibility Mode)

Department of Information Technology

MS-IT Synopsis Defense


Prediction of Amazon Product Helpfulness using
Transformer-Based Feature Embeddings of
Reviews
Program MS-IT

Student Name: Areeba Ishtiaq


Student Registration: INFT221503002
Supervisor Name: Dr. Kashif Munir
Co-Supervisor Name: Dr. Muhammad Rizwan
Agenda

• Introduction
• Literature Review
• Research Gap and Problem Statement
• Proposed Solution (Goal and Objectives)
• Methodology
• Result and Discussion
• Conclusion and Future Work
• References

Introduction
• Significant rise in product reviews over the last decade.
• Reviews include evaluations from experts as well as input from individual users.
• Reviews help reduce uncertainty in online purchases by providing insights from both consumers and specialists.
• Nearly 90% of consumers prioritize reviews before making a purchase, according to a recent study.
• User ratings and evaluations help make informed decisions about customer service, product quality, cost, and satisfaction.
Introduction
• Past methods applied traditional machine learning with feature engineering approaches such as BERT and BOW to assess the helpfulness of Amazon reviews.
• Despite achieving moderate performance, an accuracy gap remains.
• Precise feature selection is critical in predicting the utility of online consumer reviews.
• This research aims to enhance the predictive precision of review helpfulness to help consumers allocate time and resources more efficiently.
Introduction

• Using the Amazon Fine Food Reviews dataset, we implemented sophisticated machine learning methods, most notably the BERT-Random Forest (BERT-RF) feature engineering technique.
• By integrating BERT, a transformer model, with Random Forest, this methodology extracts insightful information from customer feedback.
• We developed robust machine learning models by integrating class probabilities obtained from evaluations of review helpfulness.
Literature Review
Ref: (Zhao & Sun, 2022) | Dataset: Amazon fine food reviews | Year: 2022 | Model: BERT | Accuracy: 79%
Pros: The model achieved a remarkable accuracy of 0.7982 in predicting the overall review scores.
Cons: It is crucial to acknowledge that reviews are fundamentally subjective.

Ref: (Hudgins et al., 2023) | Dataset: Amazon fine food reviews | Year: 2023 | Model: BERT | Accuracy: 86%
Pros: Comprehensive feedback with a minimum word count helps users make well-informed choices when making purchases.
Cons: Concerns about potential bias and manipulation within the review ecosystem; eliminating misleading or spam reviews can be quite challenging.
Literature Review
Ref: (Naureen et al., 2022) | Dataset: Amazon Alexa reviews | Year: 2022 | Model: SVM | Accuracy: 91.5%
Pros: Incorporating various ML algorithms like SVM, random forest, and naive Bayes makes the analysis more robust and less dependent on a single method.
Cons: The research does not include a thorough analysis of possible ways to enhance or broaden its scope.

Ref: (Noriega et al., 2023) | Dataset: Amazon fine food reviews | Year: 2020 | Model: RoBERTa | Accuracy: 82%
Pros: This analysis guides product development, marketing strategies, and customer support activities.
Cons: The results may not be applicable to other product categories, since the language used and the factors impacting sentiment can change between them.
Literature Review
Ref: (Bilal & Almazroi, 2023) | Dataset: Yelp Shopping reviews | Year: 2022 | Model: KNN | Accuracy: 59.6%
Pros: Surpasses conventional bag-of-words methods.
Cons: Failure to consider additional pertinent characteristics of review content may result in less-than-ideal performance in practical situations.
Research Gap and Problem Statement

• Traditional machine learning techniques were previously used with BERT, BOW, and other feature engineering approaches to determine the helpfulness of user reviews on Amazon products.
• Although these approaches showed moderate performance, an accuracy gap still has to be resolved.
• The Amazon Fine Food dataset has complex attributes that require a sophisticated feature engineering strategy to improve the efficacy of product review helpfulness prediction.
Proposed Solution (Goal and Objectives)

• Precise feature selection is critical in predicting the utility of online consumer reviews.
• To address this problem, we utilized the benchmark Amazon Fine Food Reviews dataset to develop sophisticated machine learning techniques.
• We propose a transformer-based feature engineering technique, BERT-Random Forest (BERT-RF), to improve the assessment of user review helpfulness for Amazon's gourmet food items.
Proposed Solution (Goal and Objectives)
The main objectives of this proposed research are as follows:
• By utilizing the Amazon Fine Food Reviews dataset, data cleansing will be performed to achieve higher classification accuracy.
• The SMOTE technique is employed to achieve a balanced dataset.
• Duplicate reviews are removed to avoid biased results.
• A refined BERT model is used together with four advanced machine-learning techniques to determine review helpfulness. K-fold cross-validation is used to validate the performance of the models, and hyperparameter tuning is used to increase their effectiveness.
Methodology
• Dataset: Amazon Fine Food Reviews from Kaggle.
• Employed SMOTE for balancing of the dataset.

[Figures: class distribution before SMOTE and after SMOTE]
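The balancing step can be sketched as follows. This is a minimal, self-contained illustration of the interpolation idea behind SMOTE; the thesis presumably used a library implementation (e.g. imbalanced-learn), and `smote_oversample` with its parameters is an illustrative name, not the actual code.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    k = min(k, n - 1)
    nn = np.argsort(dist, axis=1)[:, :k]              # k nearest neighbours
    base = rng.integers(0, n, size=n_new)             # random base samples
    neigh = nn[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                      # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# e.g. grow a 20-sample minority class to match a 50-sample majority class
minority = np.random.default_rng(0).normal(size=(20, 4))
synthetic = smote_oversample(minority, 30, rng=1)
balanced_minority = np.vstack([minority, synthetic])  # now 50 samples
```

Because each synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region the minority class already occupies.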
Methodology
Word Cloud Visualization
• Provides a visual summary of common
phrases in the dataset.
• Aids in understanding patterns quickly.
• Facilitates efficient data exploration and
analysis.
• Enhances decision-making abilities.

Text Preprocessing
• Systematically removes stop words,
punctuation, and digits.
• Utilizes techniques like tokenization,
stemming, and lemmatization.
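The preprocessing steps above can be sketched with the standard library alone. The stop-word list and the crude suffix stripper here are simplified stand-ins for the full stop-word removal and Porter stemming / lemmatization the slides refer to:

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "this", "i"}

def preprocess(text):
    """Lowercase, strip punctuation and digits, tokenize, drop stop words,
    then apply a naive suffix-stripping stemmer."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # remove punctuation & digits
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    stemmed = []
    for t in tokens:
        for suf in ("ing", "ed", "ly", "s"):        # crude stand-in for Porter stemming
            if t.endswith(suf) and len(t) > len(suf) + 2:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("This product tasted amazing!! 10/10 would buy again."))
# → ['product', 'tast', 'amaz', 'would', 'buy', 'again']
```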
Methodology
Feature Engineering
• Involves discovering and extracting significant features from raw data.
• Enhances ML model effectiveness and prediction accuracy.
Transformer-based Feature Engineering (BERT)
• The BERT model automatically extracts features from textual data.
• Outperforms existing models in NLP tasks.
• Requires minimal fine-tuning for specific tasks.
• Provides accelerated development and decreased data prerequisites.

RF Probability-based Feature Engineering
• Utilizes Random Forest class probabilities to create additional features.
• Enhances ML model prediction capabilities.
• Useful for managing complex datasets and imbalanced classes.
• Provides insights into data distribution and assists in creating wrapper algorithms.
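The probability-stacking step can be sketched as follows. The random matrix below is only a stand-in for real BERT [CLS] embeddings (which the actual pipeline would obtain from a transformer library); the point illustrated is the stacking itself, where Random Forest class probabilities are appended to the feature matrix as extra columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for BERT embeddings: 400 reviews, 32-dim vectors (real BERT gives 768)
rng = np.random.default_rng(42)
X_bert = rng.normal(size=(400, 32))
y = (X_bert[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)  # toy labels

X_tr, X_te, y_tr, y_te = train_test_split(X_bert, y, test_size=0.2, random_state=0)

# Step 1: fit a Random Forest on the embedding features
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Step 2: append the RF class probabilities as two extra columns (BERT-RF features)
X_tr_rf = np.hstack([X_tr, rf.predict_proba(X_tr)])
X_te_rf = np.hstack([X_te, rf.predict_proba(X_te)])
# X_tr_rf.shape == (320, 34): 32 embedding dims + 2 probability columns
```

The enriched matrices `X_tr_rf` / `X_te_rf` are what the downstream classifiers would then train and predict on.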
Methodology
• The methodology we propose utilizes the Amazon fine food
reviews dataset obtained from Kaggle for research purposes.
• In order to maintain the accuracy and reliability of the data, the
original dataset is subjected to preprocessing techniques that
reduce unwanted disturbances and encode relevant information.
• Data preparation involves preprocessing to remove stop words
and noise.
• Our solution proposes the utilization of a feature engineering
technique that combines BERT and Random Forest to improve
the value of customer reviews for Amazon fine food products.
• The newly prepared dataset is divided into two segments, one for training and the other for testing, following a split ratio of 80% for training and 20% for testing.
Methodology
• Data Split: 80% training, 20% testing.
• ML Models: Random Forest, K-Nearest Neighbors, Decision Tree, LightGBM.
• Hyperparameter Tuning: Grid Search.
• Evaluation Metrics: Accuracy, Precision, Recall, F1 Score.
• Validation: k-fold cross-validation.
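The evaluation pipeline above can be sketched end to end. Synthetic data stands in for the engineered feature matrix, Random Forest stands in for the four models (LightGBM is an external dependency, so it is not used here), and the grid values are illustrative, not the thesis's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Toy stand-in for the engineered BERT-RF feature matrix
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Hyperparameter tuning via grid search (slide: "Grid Search")
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=3,
)
grid.fit(X_tr, y_tr)

# Evaluation metrics on the 20% held-out test split
pred = grid.predict(X_te)
acc = accuracy_score(y_te, pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, average="macro")

# 10-fold cross-validation of the tuned model (slide: "Validation")
cv_scores = cross_val_score(grid.best_estimator_, X, y, cv=10)
print(f"acc={acc:.3f}  10-fold mean={cv_scores.mean():.3f}")
```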


Result and Discussion
• A series of experiments are undertaken to assess the
performance of learning models by employing feature
extraction methods on the Amazon Fine Food Reviews dataset.
• A comprehensive analysis is conducted on the outcomes produced by four distinct algorithms: Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), and Light Gradient Boosting Machine (LGBM).
• The initial approach employed the feature extraction strategies BERT and RF, in conjunction with additional ML algorithms, to forecast the helpfulness of customer reviews.
Result and Discussion
ML MODELS RESULTS WITH BERT FEATURES
Classifier | Accuracy (%) | Target | Precision | Recall | F1 Score
RF | 86.79 | No | 0.83 | 0.92 | 0.87
RF | 86.79 | Yes | 0.91 | 0.81 | 0.86
RF | 86.79 | Average | 0.87 | 0.87 | 0.87
DT | 73.40 | No | 0.74 | 0.72 | 0.73
DT | 73.40 | Yes | 0.73 | 0.75 | 0.74
DT | 73.40 | Average | 0.73 | 0.73 | 0.73
KNN | 83.77 | No | 0.94 | 0.72 | 0.82
KNN | 83.77 | Yes | 0.78 | 0.95 | 0.85
KNN | 83.77 | Average | 0.86 | 0.84 | 0.84
LGBM | 79 | No | 0.75 | 0.87 | 0.80
LGBM | 79 | Yes | 0.85 | 0.71 | 0.77
LGBM | 79 | Average | 0.80 | 0.79 | 0.79
Result and Discussion
ML MODELS RESULTS WITH BERT+RF PROBABILITY FEATURES
Classifier | Accuracy (%) | Target | Precision | Recall | F1 Score
RF | 96.65 | No | 0.97 | 0.97 | 0.97
RF | 96.65 | Yes | 0.97 | 0.97 | 0.97
RF | 96.65 | Average | 0.97 | 0.97 | 0.97
DT | 96.63 | No | 0.97 | 0.97 | 0.97
DT | 96.63 | Yes | 0.97 | 0.97 | 0.97
DT | 96.63 | Average | 0.97 | 0.97 | 0.97
KNN | 96.36 | No | 0.95 | 0.98 | 0.96
KNN | 96.36 | Yes | 0.98 | 0.95 | 0.96
KNN | 96.36 | Average | 0.96 | 0.96 | 0.96
LGBM | 98 | No | 0.98 | 0.97 | 0.98
LGBM | 98 | Yes | 0.97 | 0.98 | 0.98
LGBM | 98 | Average | 0.98 | 0.98 | 0.98
Result and Discussion
• K-fold cross-validation: We used k-fold cross-validation to evaluate generalization and confirm the performance of the methodology, as shown in Table 5:

Models | K-folds | Mean Accuracy
RF | 10 | 0.97
DT | 10 | 0.97
KNN | 10 | 0.96
LGBM | 10 | 0.98

• The proposed technique showed the highest k-fold accuracy score of 0.98 in comparison.
Result and Discussion
K-FOLD CROSS-VALIDATION USING LGBM

K-Fold | Accuracy
1 | 0.9794
2 | 0.9752
3 | 0.9781
4 | 0.9765
5 | 0.9794
6 | 0.9788
7 | 0.9791
8 | 0.9766
9 | 0.9790
10 | 0.9771
Average | 0.9781
Standard deviation (+/-) | 0.0010
Result and Discussion
Ref | Year | Proposed Technique | Performance Accuracy
(Hudgins et al., 2023) | 2023 | Naïve Bayes | 85%
(Zhao & Sun, 2022) | 2022 | BERT Model | 79%
Proposed | 2024 | LGBM | 98%
Conclusion and Future Work

• The accuracy scores of the learning models indicate that LGBM excels on the test dataset.
• LGBM performed exceptionally well in terms of F1 score, accuracy, and recall when using both feature extraction approaches (BERT-RF) together.
• Random Forest (RF) provides the maximum recall score when using BERT feature extraction alone; K-NN also achieves satisfactory results when used in combination with BERT-RF.
• LGBM demonstrated exceptional accuracy with the BERT-RF feature engineering approach.
Conclusion and Future Work

• Develop a user-friendly GUI for online shoppers.
• Enhance the practical use of customer reviews.
• Improve model precision and performance.
• Aim to enhance service quality, decision-making, and user experience.
References
• Hudgins, T., Joseph, S., Yip, D., & Besanson, G. (2023). Identifying Features and
Predicting Consumer Helpfulness of Product Reviews. SMU Data Science Review,
7(1), 11.
• Bilal, M., & Almazroi, A. A. (2023). Effectiveness of fine-tuned BERT model in
classification of helpful and unhelpful online customer reviews. Electronic
Commerce Research, 23(4), 2737-2757.
• Wei, J., Ko, J., & Patel, J. (2021). Predicting amazon product review helpfulness.
IEEE Transactions on Neural Networks, 5(1), 3-14.
• Zhao, X., & Sun, Y. (2022). Amazon fine food reviews with BERT model. Procedia
Computer Science, 208, 401-406.
• Naureen, A., Siddiqa, A., & Devi, P. J. (2022). Amazon Product Alexa’s Sentiment
Analysis Using Machine Learning Algorithms. In Innovations in Electronics and
Communication Engineering: Proceedings of the 9th ICIECE 2021 (pp. 543-551):
Springer.
• Noriega, I., Alvarez, H., Ramírez, C., & Cantu-Ortiz, F. (2023). Sentiment Analysis
of Amazon Reviews using Deep Learning Techniques.
