Areeba-MS-IT Thesis Defence Final
• Introduction
• Literature Review
• Research Gap and Problem Statement
• Proposed Solution (Goal and Objectives)
• Methodology
• Result and Discussion
• Conclusion and Future Work
• References
Introduction
• Significant rise in product reviews over the last decade.
Methodology
Text Preprocessing
• Systematically removes stop words, punctuation, and digits.
• Utilizes techniques like tokenization, stemming, and lemmatization (sketched below).
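A minimal preprocessing sketch, assuming NLTK; the exact stop-word list, tokenizer, and stemmer/lemmatizer used in the thesis are not specified, so these choices are illustrative.

```python
# Illustrative preprocessing sketch (assumes NLTK; the thesis's exact
# stop-word list, tokenizer, and lemmatizer are not specified).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords"); nltk.download("wordnet"); nltk.download("punkt")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    # Lowercase, then strip punctuation and digits.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Tokenize, drop stop words, and lemmatize the remaining tokens.
    tokens = nltk.word_tokenize(text)
    return " ".join(lemmatizer.lemmatize(t) for t in tokens if t not in stop_words)
```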
Feature Engineering
• Involves discovering and extracting significant features from raw data.
• Enhances ML model effectiveness and prediction accuracy.
Transformer-based Feature Engineering (BERT)
• BERT model automatically extracts features from textual data.
• Outperforms existing models in NLP tasks.
• Requires minimal fine-tuning for specific tasks.
• Provides accelerated development and decreased data requirements.

RF Probability-based Feature Engineering
• Utilizes Random Forest model probabilities to create additional features.
• Enhances ML model prediction capabilities.
• Useful for managing complex datasets and imbalanced classes.
• Provides insights into data distribution and assists in creating wrapper algorithms.
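As one way the two feature engineering steps could be combined, the sketch below extracts BERT [CLS] embeddings and appends Random Forest class probabilities as extra features. It assumes Hugging Face transformers, PyTorch, and scikit-learn; the model checkpoint and hyperparameters are illustrative, not the thesis's exact configuration.

```python
# Sketch: BERT embeddings as base features, RF class probabilities
# appended on top (checkpoint and hyperparameters are assumptions).
import numpy as np
import torch
from transformers import BertTokenizer, BertModel
from sklearn.ensemble import RandomForestClassifier

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

def bert_features(texts):
    """Return one fixed-length [CLS] embedding per review text."""
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
    with torch.no_grad():
        out = bert(**enc)
    return out.last_hidden_state[:, 0, :].numpy()  # [CLS] token vector

def rf_probability_features(X_train, y_train, X_test):
    """Append Random Forest class probabilities as additional features."""
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    rf.fit(X_train, y_train)
    return (np.hstack([X_train, rf.predict_proba(X_train)]),
            np.hstack([X_test, rf.predict_proba(X_test)]))
```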
Methodology
• The proposed methodology uses the Amazon fine food reviews dataset obtained from Kaggle.
• To maintain the accuracy and reliability of the data, the original dataset is preprocessed to reduce unwanted noise and encode relevant information.
• Data preparation includes preprocessing to remove stop words and other noise.
• Our solution applies a feature engineering technique that combines BERT and Random Forest to extract more value from customer reviews of Amazon fine food products.
• The preprocessed dataset is divided into two segments, 80% for training and 20% for testing (sketched below).
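A minimal loading-and-splitting sketch; the file name "Reviews.csv" and the "Text"/"Score" columns follow the Kaggle dataset's usual layout but are assumptions here, not confirmed by the slides.

```python
# Sketch: load the Kaggle Amazon fine food reviews and split 80/20.
# File and column names ("Reviews.csv", "Text", "Score") are assumed.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("Reviews.csv")
texts, labels = df["Text"], df["Score"]

train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42)  # 80% train / 20% test
```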
Methodology
• Data Split: 80% training, 20% testing.
• Classification models include Decision Tree and LightGBM (sketched below).
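A sketch of training the downstream classifiers on the augmented features (X_train_aug, X_test_aug from the feature engineering sketch above); the hyperparameters are library defaults, not the thesis's tuned settings.

```python
# Sketch: train and evaluate the downstream classifiers on the
# augmented features produced by the earlier feature sketch.
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score

for model in (DecisionTreeClassifier(random_state=42),
              LGBMClassifier(random_state=42)):
    model.fit(X_train_aug, y_train)
    preds = model.predict(X_test_aug)
    print(type(model).__name__, accuracy_score(y_test, preds))
```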