Evaluation of Text Transformers for Classifying Sentiment of Reviews by Using TF-IDF, BERT (word embedding), SBERT (sentence embedding) with Support Vector Machine Evaluation
Mina Jamshidian
January 2023
Declaration
I certify that this dissertation which I now submit for examination for the award of
M.Sc. in Computing (Data Science), is entirely my own work and has not been taken
from the work of others save and to the extent that such work has been cited and
acknowledged within the text of my work.
This dissertation has been prepared in accordance with the regulations for postgraduate
study at the Technological University Dublin and has not been previously submitted
in whole or part for an award at any other Institute or University.
The work reported in this dissertation adheres to the principles and requirements
of the Institute’s guidelines for ethical research.
Date: 05/01/2023
Abstract
As the online world evolves and new media emerge, consumers are sharing their
reviews and opinions online. This has been studied in various academic fields, including marketing and computer science. Sentiment analysis, a technique used to
identify the sentiment of a piece of text, has been researched in different domains
such as movie reviews and mobile app ratings. However, experiential products such as video games have received relatively little research attention. The purpose of this study
is to apply sentiment analysis to user reviews of games on Steam, a popular gaming
platform, in order to produce actionable results. The video game industry is a major
contributor to the entertainment industry’s revenue and customer feedback is crucial
for game developers. Sentiment analysis is widely used by companies to discover
what customers are saying about their products. This paper proposes a process for
evaluating video game acceptance using game user reviews through the application of
sentiment analysis techniques.
The focus of this study is to examine the performance of different Text Transformer
techniques in the context of text mining when applied to Steam game reviews, using
a Support Vector Machine classifier. The goal is to compare the effectiveness of
these methods for predicting sentiment, and to develop software that can accurately
predict sentiment and explain the prediction through text highlighting. Specifically,
the study aims to compare a sentiment analysis classifier based on the traditional
TF-IDF text feature representation method to classifiers using the more recent BERT
and SBERT techniques. The ultimate goal is to develop a clearer and more accurate
sentiment prediction tool.
Acknowledgements
I am deeply grateful to my supervisor, Dr. Bojan Božić, for his guidance and support
throughout this process. Thank you for believing in my capabilities and providing
direction during challenging times. Your encouragement and trust have been invaluable
to me.
I am thankful to Dr. Luca Longo for their assistance in helping me clarify my area of
interest.
I would like to extend my gratitude to my husband, Afshin Mehrabani, for his love
and support throughout my master’s course. His encouragement and understanding
enabled me to successfully complete this program.
Table of contents

List of figures
1 Introduction
1.1 Background
1.2 Research Problem
1.3 Research Objectives
1.4 Research Methodologies
1.5 Research Scope and Limitations
1.6 Document Outline
5 Conclusion
5.1 Research Overview
5.2 Problem Definition
5.3 Design, Experimentation, Evaluation & Results
5.4 Contributions & Impact
5.5 Future Work & Recommendations
References
List of figures
3.1 CRISP-DM
3.2 model1
3.3 TF-IDF
3.4 BERT
3.5 SBERT
3.6 Accuracy
3.7 Precision
3.8 Recall
3.9 F1 Score
3.10 Confusion Matrix
3.11 K-Fold Cross-validation
List of Acronyms
DT Decision Trees
FN False Negatives
FP False Positives
ML Machine Learning
NB Naive Bayes
RF Random Forest
TN True Negatives
TP True Positives
Chapter 1
Introduction
1.1 Background
The video game industry has exploded in popularity and profitability in recent years,
with billions of people around the world enjoying a wide variety of games on various
platforms. In this competitive market, it is important for game developers and companies to create high-quality products that stand out from the competition. Strong
storytelling, a stable multiplayer server, and fluid combat are all key elements that can
contribute to a game’s success and ensure it is well-received by players. Conducting a
sentiment analysis can be useful in understanding how players feel about a game and
how their emotions may be related to different aspects of the game (Fang & Zhan,
2015).
By analyzing the sentiment of customer reviews, companies can gain valuable insights
into the opinions and experiences of their players, which can in turn lead to increased
profits (Utz et al., 2012). Sentiment analysis can also help to uncover hidden sentiments within reviews that may not be immediately apparent (Lu & Wu, 2019). Overall,
sentiment analysis can be a useful tool for understanding the acceptance of a video
game among its players and can help companies make informed decisions about how
to improve and market their games (Fang & Zhan, 2015; Vieira & Brandão, 2019).
In the field of sentiment analysis, techniques have become increasingly advanced over the
past decade, allowing for more accurate and nuanced analysis of text and language. In
order to classify text using machine learning algorithms, it is often necessary to first
transform the raw text using techniques such as text transformation (text vectorization),
stemming, and lemmatization. These techniques help to pre-process the text and make
it more suitable for analysis by breaking it down into smaller units and standardizing
the form of words. Once the text has been transformed, it can be fed into machine
learning algorithms for classification, allowing for the automated analysis of sentiment
and other linguistic phenomena (Chouikhi et al., 2020).
TF-IDF does not capture the order, position, or cross-document occurrence of words. It only takes into account the
frequency of words within a single document. Despite its widespread use, TF-IDF has
certain limitations and may not be suitable for all natural language processing tasks.
In this study, we aim to explore the effectiveness of different approaches to converting
text into vectors, including traditional machine learning algorithms and pre-trained
transformer models such as BERT and SBERT, and to provide insight into which may
be most suitable for specific tasks or applications.
The use of BERT word embeddings in natural language processing has been shown to
improve model performance due to their ability to capture subtle differences in word
meaning and context. These embeddings are created using a dynamic process that
takes into account the words surrounding a given word, allowing for more precise
representation of features. BERT sentence embeddings, or SBERT, are an extension of
BERT word embeddings that can be used to compare sentences using methods such as
cosine similarity. SBERT shares many similarities with BERT word embeddings, but
allows for the comparison of entire sentences rather than individual words. Overall, the
use of BERT and SBERT embeddings has the potential to improve the accuracy and
effectiveness of natural language processing models (Reimers & Gurevych, 2019).
Research Question:
“Is it possible for a Support Vector Machine classifier model that utilizes ’BERT’
or ’SBERT’ as pre-trained transformer techniques to achieve statistically significantly higher accuracy compared to a Support Vector Machine model employing
TF-IDF as the transformer technique for text classification of review sentiments?”
The hypothesis will be tested through a series of experiments, during which various metrics will be calculated and compared. The outcome of the
hypothesis will be determined by the results of these experiments. To determine the
validity of the hypothesis, statistical difference tests will be conducted between the two
models being compared. If the difference between the models is statistically significant
(p<0.05), the null hypothesis will be rejected and the alternative hypothesis will be
accepted. On the other hand, if the difference is not statistically significant, the null
hypothesis will be accepted and the alternative hypothesis will be rejected.
Evaluating a hypothesis through the scientific method involves designing and conduct-
ing experiments to test the hypothesis, and then analyzing the results using statistical
techniques. This process is essential in order to determine the validity of the research
question and reach a conclusion about the hypothesis. The process of experimentation and statistical analysis allows researchers to gather evidence and make informed
decisions about the potential accuracy of the hypothesis. It is a key component of
the scientific method, as it helps to ensure that the results of the research are reliable
and can be replicated by other researchers. By carefully evaluating the hypothesis
through experimentation and statistical analysis, researchers can better understand the
underlying phenomena being studied and contribute to the body of knowledge in their
field.
Research Hypothesis:
The research process in this study follows a deductive approach, where a research
question or hypothesis is formulated and then tested through experiments. In this
approach, researchers start with a general idea or theory and then use specific data and
analysis to either support or refute that idea. In this study, statistical analysis was used
to analyze the results of the experiments and determine whether the initial hypothesis
could be accepted or rejected based on the data.
Overall, the study employed a systematic and data-driven approach to address a specific
research question or test a hypothesis. This approach involves using both secondary
data, which has already been collected for another purpose, and statistical analysis to
draw conclusions and reach a final conclusion. This type of research is characterized
by its rigorous and structured methodology, which helps to ensure that the results are
reliable and accurate.
The scope of this study is to transform user reviews from Steam into numerical representations using TF-IDF, BERT, and SBERT, and to classify the sentiment of the reviews using a Support Vector Machine. To the
best of our knowledge, BERT and SBERT have not been previously used as the text
transformer technique for classifying the sentiment of user reviews on Steam. The goal
of this research is to explore the potential of BERT and SBERT as embeddings for text
transformation and to evaluate their performance in conjunction with a Support Vector
Machine for sentiment classification on user reviews from Steam.
This study has some limitations to consider. One limitation is that the BERT and
SBERT models, which are typically used to train deep learning algorithms, were
applied to train a machine learning algorithm (an SVM classifier) in this research. This
means that the results of this study may not be directly comparable to other studies
that used deep learning approaches to train their models. Another limitation is that
the SVM classifier with the TF-IDF text representation technique was chosen as the
baseline in this study. While this approach was used in a previous study (Alzami et al.,
2020), it is possible that the accuracy achieved in this study could be different due to
the use of a different dataset.
Chapter 3 - Experiment Design and Methodology:
The aim of this chapter is to explain how the CRISP-DM methodology was utilized
in the development of the dissertation and the execution of experiments related to the
research question. The Steam dataset is initially introduced, and data understanding is
given significant emphasis in this section. This involves understanding the motivations
behind the selection of specific data preparation and data cleansing techniques for use
on the dataset. The data preparation process is then thoroughly explained, including
all steps taken. Following this, the text transformation techniques utilized in the
experiment, such as TF-IDF, BERT, and SBERT, are described in detail. The modeling
phase of the experiment is then discussed, which involves the use of SVM with the various text transformation techniques, covering both the base model and the models that will be compared to it. Finally, the evaluation phase of the experiment is
described, including a thorough explanation of the methods and steps used in this
process.
Chapter 4 - Results, Evaluation and Discussion:
This chapter serves as a thorough examination of the data preparation process, as well
as the results and analysis of various experiments conducted. Specifically, this chapter
delves into the implementation and results of using TF-IDF, BERT, and SBERT with
SVM, and includes a discussion of the cross-validation results and statistical tests
performed to validate or refute the hypothesis. To begin, the data preparation process
is thoroughly explained, including any necessary preprocessing or cleaning steps.
Next, the results of each experiment are presented, along with a detailed discussion of
the techniques used and their effectiveness. This includes a thorough analysis of the
performance of TF-IDF, BERT, and SBERT with SVM, including any limitations or
strengths of each method.
Following the presentation of the results, cross-validation results and statistical tests
are described, highlighting the methods used to validate or refute the hypothesis
being tested. Finally, an in-depth analysis of the results is presented, including an
interpretation of the findings and their implications. Overall, the "Results, Evaluation
and Discussion" chapter provides a comprehensive look at the data preparation process,
the results of various experiments, and the analysis and interpretation of those results.
Chapter 5 - Conclusions:
This chapter presents a summary of the entire study, highlighting its main objectives
and the results achieved. It reflects on the research conducted and presents the final
conclusions drawn from the findings. To begin, the chapter summarizes the main
objectives of the study and the methods used to achieve them. This includes a brief
overview of the data preparation process and the experiments conducted, as well as
a summary of the results obtained. Next, the chapter presents the final conclusions
of the study, highlighting the main findings and their implications. This includes an
in-depth analysis of the results, as well as an interpretation of their significance.
In conclusion, this chapter identifies potential directions for future research that could
expand upon the findings of this study. This includes suggestions for further studies
that could deepen our understanding of the topic, as well as ideas for how the results
of this study could be applied in practical settings. Overall, this chapter provides a
comprehensive summary of the study and its findings, along with ideas for future
research that could further advance our understanding of the topic.
Chapter 2
Review of relevant literature and previous research
The purpose of this chapter is to review and summarize existing research on sentiment
analysis and the analysis of consumer expressions, such as reviews. This research
typically involves the use of encoding techniques, which are methods for representing
text data in a format that can be processed by machine learning algorithms, as well
as classification techniques, which are methods for assigning a label or category to a
piece of text based on its content. In this review, some of the most commonly used
techniques for sentiment analysis will be compared and their advantages and
disadvantages will be discussed. Based on the findings of this overview, research gaps
and questions will be identified and a research question will be formulated. The goal
of this review is to provide a comprehensive summary of the current state of the field
and to identify areas where further research is needed.
In recent years, the use of digital platforms for collecting text-based consumer reviews
has become increasingly common. These reviews are a valuable source of information
for companies, as they provide insight into what customers think about a product or
service. User reviews are typically the most common form of user feedback and can be
On the other hand, in the modern digital age, many products, including games, can only
be purchased online and are not available through traditional brick-and-mortar stores.
This means that for many consumers, the only way to gather information about the user
experience of a product before making a purchase is to rely on online reviews or ratings
(Sobkowicz & Stokowiec, 2016). As a result, the sentiment of these online reviews
can be an important factor for consumers when deciding whether or not to purchase
a product. This is why sentiment analysis techniques are often used to evaluate the
overall sentiment of a large number of online reviews, in order to provide a summary
of the user experience of a product and help potential buyers make informed decisions.
By doing this, businesses can provide guidance to their clients, recommend appropriate
products, and resolve negative feedback by implementing these frameworks. It is also
possible to apply sentiment analysis to competitors in order to prevent repeating the
mistakes they have made in the past. Therefore, for game products, sentiment analysis
can be extremely useful and helpful. Text reviews have been analyzed using a variety
of approaches by marketing researchers over the years. It has been hypothesized by
the authors of Alantari et al. (2022) that diagnostic and predictive skills may be a
trade off that is faced empirically.
According to research conducted by Iqbal et al. (2022), the use of machine learning
techniques with neural networks and text preparation techniques resulted in the most
precise predictions for sentiment analysis tasks. There is a wide range of analytical methods that can be used to evaluate sentiment, including machine learning and
deep learning techniques. These methods involve the use of algorithms and statistical
models to analyze and interpret the emotional tone of text data. By applying these
techniques to large datasets, it is possible to identify patterns and trends in sentiments
and use this information to make predictions or informed decisions.
It is widely recognized that machine learning models cannot be applied directly to raw
text reviews for sentiment analysis tasks without first preprocessing and transforming
the data. This is why numerous studies have been conducted to demonstrate the
importance of preprocessing and transforming text data before applying machine
learning algorithms. There are a variety of methods that can be used to prepare text
reviews for machine learning classification, including techniques such as stemming,
lemmatization, and removal of punctuation and special characters. The choice of text
transformer and machine learning classifier can also have an impact on the accuracy
of the results. Different transformer and classifier combinations may be more or less
effective for specific datasets or tasks, so it is important to carefully consider which
methods to use in order to achieve the best possible results.
2.1.2 Classifier
In a study by Zuo (2018), the effectiveness of sentiment analysis was evaluated in terms
of accuracy, precision, and recall by analyzing the sentiment of a large scale Steam
Review dataset. The study compared the performance of two supervised machine
learning algorithms, namely Decision Tree and Gaussian Naive Bayes, and found that
the Decision Tree model achieved an accuracy of approximately 75%, outperforming the Gaussian Naive Bayes model. This result was specific to the Steam Review dataset
used in the study. Overall, the findings of this study suggest that the Decision Tree
algorithm may be a more effective method for sentiment analysis tasks when applied
to the Steam Review dataset.
A study by Balakrishnan et al. (2020) investigated four supervised learning algorithms using Python for sentiment and emotion analysis. In particular, Support Vector Machine, Naive Bayes, Decision Trees, and Random Forest were compared. Accuracy and F1 scores indicate that the Random Forest classifier achieved the highest classification accuracy, at 75.62%, on a dataset of almost 1,000 reviews. According to the study's authors, a larger dataset could have produced more accurate results (Balakrishnan et al., 2020).
A study by Normah (2019) examined customer sentiment toward Windows Phone Store applications, based on the automated categorization of reviews into positive and negative sentiments. Because of its simplicity and level of performance, Naive Bayes has proven to be a reliable classification model for a wide range of textual domains and types of textual data. Tenfold cross-validation was used to validate the model, and a confusion matrix and ROC curve were used for measurement. The study showed an accuracy rate of 84.50%, which indicates that Naive Bayes is a suitable model for text classification, especially in the case of sentiment analysis (Normah, 2019).
In a study by Tripathi and S (2015), two classifier types, Naive Bayes and Support Vector Machines, were applied along with different feature selection methods in order to perform sentiment analysis on movie reviews. Different methods of feature selection and their effect on sentiment analysis were discussed. The classification results show that the linear SVM classifier provides a higher level of accuracy than the Naive Bayes classifier. SVM has been identified as a more effective method for sentiment analysis in many previous studies, and the results obtained from linear SVM here were also superior. Based on the model described in the paper, hybrid techniques can be beneficial for sentiment analysis: incorporating the corpus and selecting features in an effective manner can lead to significant improvements (Tripathi & S, 2015).
A study by Jeffrey et al. (2020) used Steam review datasets consisting of one million reviews of four video games. The Support Vector Machine algorithm and the Naive Bayes algorithm both achieved approximately 85% accuracy, but they differ greatly in many ways. In SVMs, certain interactions can occur between features, whereas NB treats features as independent, so interactions between them are not taken into consideration in the calculation. According to the findings of the authors, pretrained text transformers, such as BERT, can be used to increase performance by leveraging the learning process (Jeffrey et al., 2020).
Srivastava et al. (2021) conducted a study focused on feature generation, using a bag-of-words (BoW) based method as well as TF-IDF to generate sentiment analysis features, and on the use of machine learning to build sentiment analysis of customer reviews. An experiment was conducted using a dataset of 20k reviews, which were cleaned and pre-processed before TF-IDF and BoW were applied to extract features. The classifiers were then implemented, trained, and evaluated based on accuracy metrics. Among the three classifiers used, MultinomialNB achieved the highest accuracy for bag-of-words features, at 82%, while Random Forest performed better for TF-IDF, with an accuracy of 78% (Srivastava et al., 2021).
A study by Arief and Deris (2021) observed the impact of text pre-processing on a set of unstructured product reviews, using sentiment classifiers such as Decision Tree, Naive Bayes, and Support Vector Machine. In terms of performance, SVM performed best, with an accuracy of 88.13%, although the Naive Bayes classifier is faster, as it takes less time to execute. Furthermore, the experimental results suggest that using TF-IDF for feature extraction may improve classification accuracy. In light of these results, it can be concluded that a good text preprocessing sequence is critical to the classifier's ability to predict the outcome of unstructured data (Arief & Deris, 2021).
In a research study conducted by Alzami et al. (2020), the aim was to identify a combination of feature extraction and machine learning methods that could improve the accuracy of polarity sentiment analysis. To achieve this, the authors used a variety of feature extraction techniques such as bag-of-words, TF-IDF (term frequency-inverse document frequency), and Word2Vec, and applied machine learning algorithms including Random Forest, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Naive Bayes. The results of this study may provide insights into which combinations of feature extraction and machine learning methods are most effective for polarity sentiment analysis tasks (Saifullah et al., 2021).
In the same study, the use of Support Vector Machines (SVM) with TF-IDF
(term frequency-inverse document frequency) feature extraction resulted in an 87.3%
performance for classifying the polarity of customer reviews in unstructured sentiment
analysis tasks. This method involved the preprocessing of documents by removing
punctuation and special characters and applying stemming techniques to standardize
the words. The study also found that it is possible to achieve even better results in
sentiment analysis by using transformer methods such as BERT (a deep learning
transformer). Transformer models are a type of neural network architecture that are
particularly well-suited for natural language processing tasks and can be used to
effectively analyze the sentiment of text (Alzami et al., 2020).
In the study by Cahyanti et al. (2020), the authors explored the use of support vector
machine (SVM) classification for analyzing movie review data. They found that
using term frequency-inverse document frequency (TF-IDF) as a method of weighting
words was effective in improving the accuracy of the SVM model. Additionally, they
discovered that combining the extraction of latent features with TF-IDF using latent Dirichlet allocation (LDA) could further improve performance by modeling topics in the review data. The combination of TF-IDF and LDA resulted in the highest
performance, with an accuracy of 82.16%. This suggests that by combining these two
techniques, it is possible to overcome the limitations of SVM when applied to movie
review data.
According to a study by Dang et al. (2020), the combination of deep learning architectures with word embeddings (such as Word2Vec) can be more effective than traditional
term frequency-inverse document frequency (TF-IDF) models for sentiment analysis
tasks. In a separate study by Mohamed Ali et al. (2019), Word2Vec was used to create
numerical representations of words and was tested alongside various deep learning
and hybrid models. The hybrid model was found to be the most effective, achieving
an accuracy of 89.2%. Additionally, Beseiso and Alzahrani (2020) found that using
BERT word embeddings as features in their model led to better performance compared
to other feature combinations.
In a study conducted by Dong et al. (2020), the authors used BERT to analyze reviews
of online commodities. The BERT model was first trained, and then the review texts
were encoded using a representation layer. Next, CNN and BERT were used to extract
local features from the review text vectors, with a semantic connection layer being
applied to merge the information from these two complementary models. Finally, a
sentiment classification layer was used to classify the reviews based on their sentiment.
According to the experimental results, the F1 value of the BERT-CNN model (i.e., the combination of BERT and CNN) was 14.4% higher than the F1 values of BERT and CNN separately, suggesting that combining BERT and CNN may improve the accuracy of sentiment analysis in this specific context.
A study found that using BERT in combination with another classifier can increase
the accuracy of the classifier (Dong et al., 2020). However, the combination of
BERT with a convolutional neural network (CNN) did not result in improved accuracy
compared to using a support vector machine (SVM) with term frequency-inverse
document frequency (TF-IDF) (Huang et al., 2023). This raises the question of
whether using SVM with BERT or SBERT (Sentence-BERT) transformers could be
more accurate than using SVM with TF-IDF. It would be interesting to see if the
increased efficiency and representation learning capabilities of BERT and SBERT can
improve the performance of SVM in classification tasks.
A thorough literature review identified a significant research gap: the BERT (word embedding) and SBERT (sentence embedding) text transformers had never been used for sentiment analysis in the review domain, particularly in the gaming domain (the Steam platform). In addition, pretrained text transformer techniques can be used to effectively analyze the contextual meaning of words and sentences within a corpus of text, thereby providing a strong foundation for performing in-depth text analysis.
Chapter 3
Experiment Design and Methodology
In this chapter, the steps that will be taken in the project will be outlined and explained.
The project aims to develop a framework that can perform end-to-end conversions of
text into numerical form, which can then be used to train a text classifier for sentiment
analysis of user reviews. The research methods included in this framework are those
that are relevant to the project’s goals and have been thoroughly reviewed in the
literature. The details of the experimental process that will be used to collect and
analyze data will be described. This includes the research design, which will be
explained in terms of the type of study, the research question being addressed, and
the hypotheses being tested. The sample selection process will also be described,
including the criteria used to select participants and the size of the sample. The data
collection methods that will be used will be described, including information about
the instruments or tools used to gather data and the procedures followed to ensure the
accuracy and reliability of the data. The data analysis techniques that will be used
to analyze the data will be described, including any statistical techniques or machine
learning algorithms that will be used, as well as any other methods used to interpret
the results of the study.
Overall, this chapter will provide a clear and detailed explanation of the methods that
will be used to collect and analyze data, ensuring transparency and replicability in
the research process. The main purpose of this framework is to efficiently process
text data, transforming it into a format suitable for use in machine learning and text
classification for sentiment analysis of user reviews. Its ultimate goal is to use machine
learning techniques to analyze user reviews for sentiment. This framework is designed
to facilitate the efficient and effective performance of necessary computations, making
it a valuable tool for sentiment analysis of user reviews.
3.1 Methodology
Conducting sentiment analysis can be a complex and challenging process, as it involves
analyzing the emotions or opinions expressed in text data. One way to approach this
task is by using CRISP-DM (Cross-Industry Standard Process for Data Mining), as a recent study by Nabiha et al. (2021) demonstrated. CRISP-DM is
a widely used methodology for data mining that provides a structured approach to
understanding and addressing a research question.
By following the steps of CRISP-DM, as shown in Figure 3.1, and using the
Python programming language, it is possible to simplify and clarify the sentiment
analysis process, making it more understandable and easier to implement. This can
be especially helpful when deploying the project in a real-world setting. Figure 3.2
illustrates the methodology and steps that will be followed in this project using the
CRISP-DM method.
Fig. 3.2 Diagram of the experimental implementation with the associated phases of
CRISP-DM
This dataset is based on Kaggle1 data which contains over 21 million user reviews of approximately 300 different games on Steam. Based on the documentation provided by Steam, the reviews were obtained using the Steamworks API2. The dataset contains 23 features, some of which were selected specifically for this study; this will be detailed in the next chapter.
Data cleaning begins with the removal of unwanted features and observations from the dataset. In this experiment, the most popular games among users were analyzed, irrelevant observations were removed, and a target dataset was selected based on specific features. Irrelevant observations and features are those that do not address the specific problem the researcher is trying to solve. This step was conducted in two parts: first, a general cleaning before the deep cleaning of reviews for each specific text transformer; second, the cleaning of reviews for each text transformer, after which the new features were added to the dataset with their specific cleaning.
1 https://www.kaggle.com/najzeko/steam-reviews-2021
2 https://partner.steamgames.com/doc/store/getreviews
For both parts, all missing values were discarded completely from the reviews column, since text data cannot be treated in the same manner as numeric data in the event that a value is missing.
Conversion of type refers to changing the type of data in each column to the appropriate one. As the target variable of this experiment is a categorical boolean, it needs to be changed to a numeric boolean, and all reviews need to be changed to string format.
3.3.4.1 Lowercasing
All reviews were first converted to lowercase so that identical words with different casing are treated as the same token. In the text, multiple spaces were then replaced with a single space in order to make the texts more readable. After that, the contraction verbs in the text were reformed to their full forms, such as "n't" to "not". To prepare the text for TF-IDF, compound nouns that contained hyphens were divided into their parts and the numbers were eliminated.
The removal of punctuation from text data is another common technique for preprocessing text data in order to improve its readability. Punctuation can be added or removed according to the needs of the experiments. To prepare the text for TF-IDF, punctuation was replaced with spaces. On the other hand, when preparing the text for BERT and SBERT, because these two embeddings were trained on Wikipedia data, retaining the numbers, some of the punctuation, and the hyphenated compound nouns in the text may result in a more accurate embedding. The punctuation marks (?, !, &, .) were retained to allow SBERT to detect the end of a sentence.
3.3.4.5 Stemming
Stemming is the process of reducing words to their root form, typically by applying a set of rules to strip suffixes. While stemming can be a useful tool for natural language processing tasks, it does
have some limitations. Because it is based on a set of rules, it may not always produce
the correct stem for a given word. Additionally, this method does not consider the
context or parts of speech of the words, potentially leading to less accurate results in
some situations. For this reason, lemmatization, which takes into account the context
and parts of speech of the words, is often preferred over stemming in more complex
natural language processing tasks.
3.3.4.6 Lemmatization
Lemmatization is a process that involves reducing words to their base form, known
as lemmas. This is similar to stemming, which involves reducing words to their root
form. However, lemmatization takes into account the parts of speech and the context
in which the words are used, while stemming does not. It can be useful for natural
language processing tasks, such as information retrieval and text classification, because
it allows words with similar meanings to be treated as the same word, even if they
have different inflections or endings. This can help improve the accuracy of the model
and the relevance of the search results.
Lemmatization can be performed using POS tags, which provide additional context
about the word’s role in the sentence. This can help to determine the most appropriate
lemma for the word. For example, the word "jumps" might be tagged as a verb, while
the word "jump" might be tagged as a noun. The lemmatizer would then use this
information to determine the correct lemma for each word. Overall, it can be a useful
tool for natural language processing tasks, as it allows for the reduction of words to
their base form while taking into account their parts of speech and context. This can
help improve the accuracy and relevance of the results.
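For illustration, the sketch below shows one common way of combining POS tagging with lemmatization in NLTK. It is a minimal example rather than the exact code used in this study; the tag mapping is an illustrative convention for converting Penn Treebank tags to WordNet POS constants.

```python
# A minimal sketch (not this study's exact code) of POS-aware lemmatization
# with NLTK; the tag mapping is an illustrative convention.
import nltk
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
from nltk.stem.wordnet import WordNetLemmatizer

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()

def to_wordnet_pos(treebank_tag):
    # Map Penn Treebank tags (e.g. 'VBZ', 'NNS') to WordNet POS constants.
    if treebank_tag.startswith("V"):
        return wordnet.VERB
    if treebank_tag.startswith("J"):
        return wordnet.ADJ
    if treebank_tag.startswith("R"):
        return wordnet.ADV
    return wordnet.NOUN  # default to noun

tokens = word_tokenize("The player jumps over several obstacles")
lemmas = [lemmatizer.lemmatize(token, to_wordnet_pos(tag))
          for token, tag in pos_tag(tokens)]
print(lemmas)  # 'jumps' -> 'jump' (verb), 'obstacles' -> 'obstacle' (noun)
```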
3.4.1 TF–IDF
Term frequency (TF) is a measure of how often a word appears in a given document; it is commonly calculated by dividing the number of times the word occurs in the document by the total number of words in the document. Inverse document frequency is a measure
of how common or rare a word is across a collection of documents. To calculate the inverse document frequency (IDF), the logarithm is taken of the total number of documents in the collection divided by the number of documents containing the word. The resulting value is then multiplied by the term frequency of the word in the document in question.
The combination of term frequency and inverse document frequency allows for the
importance of words to be evaluated within the context of a specific document, as well
as relative to the rest of the collection of documents. Words that are more common
across the entire collection of documents will have a lower inverse document frequency,
and therefore a lower overall weight in the calculation. On the other hand, words
that are rare or specific to a particular document will have a higher inverse document
frequency, and therefore a higher overall weight in the calculation. TF-IDF is often
used in information retrieval and machine learning tasks, such as building search
engines, summarizing documents, and other natural language processing and text
analysis tasks. It is a simple and intuitive approach that can provide useful insights
into the importance and relevance of words in a given context.
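As a brief illustration of the technique, the following sketch computes TF-IDF features with scikit-learn, one of the libraries used in this project. The corpus is invented for demonstration, and the formula in the comment reflects scikit-learn's smoothed IDF rather than the textbook definition.

```python
# A minimal illustration of TF-IDF weighting with scikit-learn; the corpus
# is a placeholder, not the study's data.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "great game with a great story",
    "the multiplayer server is unstable",
    "great combat but the story is short",
]

# TfidfVectorizer computes tf-idf(t, d) = tf(t, d) * idf(t), where in its
# default smoothed form idf(t) = ln((1 + n) / (1 + df(t))) + 1.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)           # sparse document-term matrix

print(X.shape)                                 # (3 documents, vocabulary size)
print(vectorizer.get_feature_names_out()[:5])  # first few vocabulary terms
```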
3.4.2 BERT
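As an illustration of how the BERT word embeddings described in Chapter 1 can be produced in practice, the sketch below uses the bert_embedding library listed among this project's dependencies. The mean pooling over token vectors is an assumption made here for illustration, since the per-token embeddings must be aggregated into one fixed-length vector per review before they can be passed to an SVM.

```python
# A hedged sketch (not this study's exact code) of producing review-level
# features from BERT word embeddings with the bert_embedding library.
# Mean pooling over token vectors is an illustrative choice: the SVM needs
# one fixed-length vector per review.
import numpy as np
from bert_embedding import BertEmbedding

bert_embedding = BertEmbedding()  # loads the library's default pre-trained BERT model

reviews = ["great story and fluid combat", "the servers are unstable"]
results = bert_embedding(reviews)  # one (tokens, token_vectors) pair per review

features = np.array([
    np.mean(token_vectors, axis=0)  # average the 768-d token vectors
    for _tokens, token_vectors in results
])
print(features.shape)  # (2 reviews, 768 features)
```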
3.4.3 SBERT
The fact that SBERT is computationally efficient makes it suitable for real-time
search applications, where it is important to be able to process queries and return
results quickly. This is because SBERT requires fewer computational resources to run
compared to some other natural language processing models, such as BERT, which
is a very large model. In addition to being computationally efficient, SBERT is also
relatively easy to use. It can be fine-tuned on a variety of natural language processing
tasks with minimal task-specific modifications, which makes it a convenient choice
for researchers and practitioners who want to apply it to different types of problems.
This is because SBERT is designed to generate sentence embeddings, which can be compared directly using measures such as cosine similarity.
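For illustration, the following minimal sketch produces sentence embeddings with the sentence_transformers library used in this project; the model name 'all-MiniLM-L6-v2' is an assumption chosen for demonstration and is not necessarily the checkpoint used in the experiments.

```python
# A minimal sketch using the sentence_transformers library; the model name
# is an illustrative assumption, not necessarily the study's checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["This game has a great story.", "The storyline is excellent."]
embeddings = model.encode(sentences)  # one fixed-length vector per sentence

# Sentence similarity via cosine similarity of the embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))
```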
3.5 Modelling
Based on a review of previous literature, it was determined that the Support Vector
Machine (SVM) Classifier was the most appropriate model for use in this experiment
due to its high accuracy rating. The decision to use the SVM Classifier was based
on the results of previous research in this area, which demonstrated that this model
consistently performed well on a variety of tasks. As a result, it was considered the
most suitable model for use in the current experiment.
In a study by Alzami et al. (2020), the highest accuracy results in the field of sentiment
analysis of reviews were obtained by using the combination of the Term Frequency-
Inverse Document Frequency (TF-IDF) text vectorization technique and the Support
Vector Machine (SVM) classification technique. The accuracy rate of this approach
was 88.6%, which was the highest accuracy rate reported in the literature for this
specific area of research. These results demonstrate the effectiveness of using the
TF-IDF technique in combination with the SVM classifier for sentiment analysis of
reviews.
The Support Vector Machine (SVM) classifier has been shown to be capable of
achieving high performance in sentiment analysis of reviews in previous research
(Arief & Deris, 2021; Jeffrey et al., 2020). As a result, this current research builds
upon the findings of these previous studies by using the combination of the Term
Frequency-Inverse Document Frequency (TF-IDF) text vectorization technique and
the SVM classifier as the baseline model. This approach was chosen due to the
demonstrated effectiveness of the SVM classifier in sentiment analysis tasks, as well
as the versatility and effectiveness of the TF-IDF technique for text representation.
Based on a review of previous literature, it has been observed that transformer methods,
such as BERT (a deep learning transformer), may be utilized to improve results in
sentiment analysis. Therefore, this study proposes the construction of models using
BERT (a word embedding technique) for extracting text features in conjunction with
the Support Vector Machine (SVM) classifier, as well as SBERT (a sentence embedding
technique) for extracting text features in combination with the SVM classifier. The
accuracy results of all three models will be compared in the following section. This
approach was selected due to the demonstrated effectiveness of transformer methods,
such as BERT and SBERT, in various natural language processing tasks, including
sentiment analysis.
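A hedged sketch of the baseline configuration is shown below: TF-IDF features feeding a linear SVM. The data and parameters are placeholders; for the BERT and SBERT models, the vectorizer would be replaced by the corresponding precomputed embedding matrices.

```python
# A hedged sketch of the baseline model (TF-IDF features feeding an SVM);
# the reviews, labels, and parameters are placeholders for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

reviews = ["great game", "broken servers", "fun combat", "refund please"]
labels = [1, 0, 1, 0]  # 1 = recommended, 0 = not recommended

X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=42)

baseline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC(kernel="linear")),
])
baseline.fit(X_train, y_train)
print(baseline.score(X_test, y_test))  # accuracy on the held-out test set
```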
3.6.1 Accuracy
To determine the accuracy of a classification model, the total number of true positives
and true negatives is divided by the total number of predictions. Accuracy is a valuable
measurement for comparing a model’s predictions to the true labels and is often used
in conjunction with other evaluation metrics, such as precision, recall, and F1 score, to
gain a more thorough understanding of the model’s performance.
3.6.2 Precision
3.6.3 Recall
3.6.4 F1 Score
A confusion matrix permits visualization of a classification model's performance and can help to identify where the model may be committing errors. In addition to highlighting the strengths and weaknesses of a classification model, it is therefore a useful method for assessing its performance.
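As an illustration, the metrics discussed in this section can be computed with scikit-learn as follows; the label arrays are invented placeholders.

```python
# An illustrative computation of the evaluation metrics discussed above;
# y_true and y_pred are placeholder labels, not experimental results.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print("accuracy:", accuracy_score(y_true, y_pred))

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
print("precision:", precision)  # TP / (TP + FP)
print("recall:", recall)        # TP / (TP + FN)
print("f1:", f1)                # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```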
Fig. 3.11 Calculate the mean score of the model performance obtained by using the
K-fold method
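A minimal sketch of this procedure with scikit-learn is shown below, mirroring the mean-score computation of Figure 3.11; the feature matrix and labels are placeholders, and the choice of ten folds is an assumption for illustration.

```python
# A minimal sketch of K-fold cross-validation; data are placeholders and
# the fold count is an illustrative assumption.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(100, 20)        # placeholder feature matrix
y = np.random.randint(0, 2, 100)   # placeholder binary labels

kfold = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=kfold,
                         scoring="accuracy")
print(scores.mean(), scores.std()) # mean and spread across the 10 folds
```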
The Shapiro-Wilk test is a statistical test used to assess the normality of a dataset. This is important because normality can affect the choice of statistical tests and the reliability of the results. In Python, the scipy.stats.shapiro()3 function can be used to conduct the Shapiro-Wilk test. This function takes a dataset as input and returns two values: the test statistic and the p-value. The p-value is a measure of the probability that the data follows a normal distribution. If the p-value is below a certain threshold (usually 0.05), it can be concluded that the data is not normally distributed. On the other hand, if the p-value is above the threshold, it can be assumed that the data is normally distributed.
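For illustration, the test can be applied as follows; the scores array is a placeholder standing in for the per-fold accuracy values produced by cross-validation.

```python
# An illustrative use of scipy.stats.shapiro(); the scores are placeholders
# for per-fold accuracy values.
from scipy.stats import shapiro

scores = [0.86, 0.84, 0.88, 0.85, 0.87, 0.83, 0.86, 0.85, 0.84, 0.87]
statistic, p_value = shapiro(scores)

if p_value < 0.05:
    print("Reject normality: the data is not normally distributed.")
else:
    print("Normality cannot be rejected at the 0.05 level.")
```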
The Student’s t-test is a parametric test that is used when the data is normally distributed
and the variances of the two groups are equal. It compares the means of the two groups
and determines the probability that the observed difference between the means could
have occurred by chance. The test is named after William Sealy Gosset, who developed
it while working as a statistician at the Guinness Brewery in Dublin, Ireland. Gosset
published the test under the pseudonym "Student" in 1908 due to the company’s policy
3 https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html
at the time of not allowing publication of work done by its employees. There are
two main types of Student’s t-test: the one-sample t-test and the two-sample t-test. If
the data in this study were normally distributed, the two-sample t-test would be the
appropriate statistical test to use.
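A hedged sketch of the two-sample test with SciPy is shown below; the two arrays stand in for the per-fold accuracies of the two models being compared.

```python
# An illustrative two-sample Student's t-test; the arrays are placeholder
# per-fold accuracies of two models being compared.
from scipy.stats import ttest_ind

model_a = [0.86, 0.84, 0.88, 0.85, 0.87]
model_b = [0.81, 0.80, 0.83, 0.79, 0.82]

# equal_var=True is the classic Student's t-test (assumes equal variances).
statistic, p_value = ttest_ind(model_a, model_b, equal_var=True)
print(p_value < 0.05)  # True would indicate a statistically significant difference
```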
To perform the Mann-Whitney U test, the observations from both samples are first
ranked together. The test statistic, U, is then calculated based on the ranks of the
observations in the two samples. If U is large enough, it can be concluded that the
medians of the two samples are significantly different. The Mann-Whitney U test is
a two-tailed test, meaning that it can detect differences in either direction (i.e., one
sample having a median that is either higher or lower than the median of the other
sample). It is also known as a non-directional test because it does not assume that one
sample has a higher median than the other.
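For illustration, the corresponding SciPy call is sketched below with placeholder data.

```python
# An illustrative two-tailed Mann-Whitney U test, applicable when the
# per-fold scores are not normally distributed; data are placeholders.
from scipy.stats import mannwhitneyu

model_a = [0.86, 0.84, 0.88, 0.85, 0.87]
model_b = [0.81, 0.80, 0.83, 0.79, 0.82]

statistic, p_value = mannwhitneyu(model_a, model_b, alternative="two-sided")
print(p_value < 0.05)  # True would indicate significantly different medians
```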
The experiments were conducted using the Steam Review dataset from Kaggle as the source of review data. The entire
experiment was developed using the Python programming language.
The initial phase of the study was conducted using Jupyter Notebook, a widely-used tool for interactive computing and data visualization. Jupyter Notebook was selected for its convenience in local coding and its ability to facilitate easy sharing and collaboration. However, the majority of the project was ultimately completed using Google Colaboratory, due to issues that arose when attempting to run transformer-based language model embeddings (BERT and SBERT) in Jupyter Notebook. Google Colaboratory is a cloud-based platform that provides an interactive computing environment and enables the execution of code and the creation of documents that contain live code, equations, and visualizations.
To run the scripts in this project, several libraries and modules were required. These
included NumPy, a library for scientific computing with Python; Scikit-Learn, a library
for machine learning; Pandas, a library for data manipulation and analysis; MXNet, a
library for deep learning; NLTK, a library for natural language processing; matplotlib,
a library for data visualization; bert_embedding, a library for BERT embeddings; and
sentence_transformers, a library for SBERT embeddings. These libraries and modules
were used to transform the text data, apply SVM classifiers, and visualize the results
of the experiment.
A further limitation is that the baseline approach from the previous study (Alzami et al., 2020) was applied here to a different dataset. This means that the results of this research may not be directly comparable
to those of the previous study, as the different datasets and approaches used could
potentially impact the accuracy of the results.
The process of understanding the datasets was a crucial step in the data preparation
process, as it allowed for the identification of a specific target dataset and the tailoring of the process to its specific characteristics. By taking this approach, a customized data
preparation process was created that was tailored specifically to the target data, rather
than relying on a generic approach that may not be effective for all datasets. This
allowed for more effective preparation of the data and ensured that the process was
optimized for the specific characteristics of the target data. Overall, the time spent
on understanding the datasets and customizing the data preparation process was well
worth the effort, as it allowed for better results and more accurate analysis of the data.
Prior to this study, the application of BERT as a word embedding and SBERT as a sentence embedding in the sentiment analysis of game reviews had not been previously investigated. This gap in the research was identified through a comprehensive
review of the literature. The utilization of embedding techniques, which capture the
contextual meaning of words within a body of text, has the potential to offer valuable
insights for conducting effective text analysis. As such, exploring the use of these
text representation techniques in the sentiment analysis of game reviews represents a
promising avenue for future research.
Chapter 4
Results, evaluation and discussion
This chapter aims to provide a thorough understanding of the methods used to conduct
the research outlined in Chapter 3. This includes detailing the steps taken to design
and execute the experiments, as well as reporting the findings. To determine whether
the Null and Alternate Hypotheses can be accepted or rejected, a statistical difference
test will be applied to the results of the various models. This will allow for the
determination of whether there is a significant difference between the results of the
models and the Null Hypothesis, or if the results support the Alternate Hypothesis. By
analyzing the results in this way, conclusions can be drawn about the effectiveness
of the different models and their ability to accurately predict the outcomes of the
experiments.
The Steam data set contains 23 features, some of which were selected for the purpose of this study, as explained in the following section; the details come from the Steam documentation1:
1. Game
language: language of the review
1 https://partner.steamgames.com/doc/store/getreviews
2. Author
steamid: the user's SteamID
num_games_owned: number of games owned by the user
num_reviews: number of reviews written by the user
playtime_forever: lifetime playtime tracked in this app
playtime_at_review: playtime when the review was written
3. Game Recommendation
recommended: True or False
Four games with a high number of active users and a significant number of English reviews were selected from the dataset of all existing games, specifically from the 15 most popular games according to Steam Info2, as shown in Figure 4.2. The target data that was chosen had around 25,000 reviews in total. The selection of active users was based on criteria such as the following:
2 https://steamdb.info/graph/
2. The number of games owned by the user on Steam. After sorting and
calculating the number of games owned by each user, it was found that three
users had the most games. In order to calculate the mean number of games
owned by each user, these three users were removed from the calculation. The
resulting mean was 119.66 games. The data for users who owned more than
119.66 games, including the three users with the highest number of games, were
then collected.
3. Number of reviews that have been written by users on Steam. After finding
the mean number of games owned on Steam for each user, the mean number of
reviews written by these users was calculated. The mean number of reviews was
found to be 8.33. The data for users who had written more than 8.33 reviews
were then collected.
4. The amount of time a user has spent playing on Steam over their lifetime. The mean amount of time spent playing on Steam over a lifetime
was calculated for each user. The mean time spent was found to be 11107.61
minutes. The data for users who had played for more than 11107.61 minutes
were then collected.
5. Time spent playing when the review was written by the user on Steam. The
mean playtime when writing a review was calculated for each user. The mean
time spent was found to be 21638.03 minutes. The data for users who had played
for more than 21638.03 minutes were then collected.
Fig. 4.2 The 15 most popular games on the Steam platform (November 2022)
After considering all options, the final dataset chosen consisted of data from four games, as shown in Figure 4.3: "Grand Theft Auto V", "Rust", "Terraria", and "PAYDAY 2". These games were chosen for their relevance and popularity within the
gaming community and the selection was made based on factors such as relevance,
quality, and potential usefulness for the research. On the x-axis of Figure 4.3, the
column labeled "app_name" displays the name of each game. On the y-axis, the
column labeled "count" shows the number of reviews for each game.
First Part:
During the initial data preparation process, it was identified that there were several
instances of the string ’NA’ present in the text column. To maintain the accuracy and
reliability of the data, the dropna() method was used to eliminate all 'NA' values
from the dataset. This method effectively removes any rows in the dataset that contain
missing or null values, which is crucial for accurately analyzing and interpreting the
data.
Second Part:
Upon completion of the data preparation process prior to the text transformer phase,
it was discovered that reviews below a minimum length of 3 caused errors during the text transformation process. As a result, all reviews below this minimum length were removed.
The target data was split into train and test sets using Sklearn's train_test_split3 function, with a ratio of 80/20. This means that 80% of the dataset was used for the
train set and 20% was used for the test set. Finally, the following data cleaning steps
were applied to both the train and test sets.
Conversion of type refers to changing the type of data in each column to the appropriate one. As the target variable of this experiment is a categorical boolean, it needs to be changed to a numeric boolean, and all reviews need to be changed to string format.
3 https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
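A sketch of the split and the type conversions, assuming a hypothetical boolean label column "voted_up" and the text column "review"; the random_state is an assumption added here for reproducibility.

```python
from sklearn.model_selection import train_test_split

# 80/20 split of the cleaned dataframe from the previous step.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

for part in (train_df, test_df):
    part["voted_up"] = part["voted_up"].astype(int)  # categorical boolean -> 0/1
    part["review"] = part["review"].astype(str)      # ensure string type
```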
In the next steps, all reviews will be cleaned, reformatted, and added as three new columns to the train and test datasets; an example of the result is shown in Figure 4.4. These columns are:
1. "prep_review", the shared base column used by the TF-IDF, BERT and SBERT text transformers.
2. "prep_review_tfidf", for use with the TF-IDF text transformer, based on the "prep_review" column.
3. "prep_review_bert_sbert", for use with the BERT and SBERT text transformers, based on the "prep_review" column.
These columns will be used to prepare the reviews for the various text transformers mentioned: the TF-IDF transformer, the BERT transformer, and the SBERT transformer.
All reviews were converted to lowercase because machine learning models may treat lowercase and uppercase letters differently. This is especially important when performing text transformation techniques such as TF-IDF, BERT, and SBERT, as words with different casing may be treated as distinct entities. Then, to improve readability, multiple spaces in the reviews were replaced with a single space. After that, six contraction forms in the reviews were expanded to their full forms: "n't" to "not", "can't" to "can not", "'ve" to "have", "'re" to "are", "won't" to "will not", and "'ll" to "will".
Afterwards, to prepare the reviews for TF-IDF text transformation, numbers were removed and hyphenated compound nouns were split into their parts. Next, again for the TF-IDF column, all punctuation was replaced by spaces except for the marks (?, !, &, .); these marks were also retained in the reviews used with BERT and SBERT. Finally, all multiple spaces in the reviews for TF-IDF, BERT, and SBERT were replaced by a single space.
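The cleaning steps described above can be sketched as two small functions, one for the shared "prep_review" column and one for the TF-IDF-specific column. This is a sketch under the rules as described; the exact rules used in the study may differ slightly.

```python
import re

# Order matters: "won't"/"can't" must be expanded before the generic "n't".
CONTRACTIONS = [
    ("won't", "will not"), ("can't", "can not"), ("n't", " not"),
    ("'ve", " have"), ("'re", " are"), ("'ll", " will"),
]

def prep_review(text: str) -> str:
    """Shared cleaning for TF-IDF, BERT and SBERT."""
    text = text.lower()
    for short, full in CONTRACTIONS:
        text = text.replace(short, full)
    return re.sub(r"\s+", " ", text).strip()  # collapse multiple spaces

def prep_review_tfidf(text: str) -> str:
    """Extra TF-IDF-only cleaning, applied on top of prep_review."""
    text = re.sub(r"\d+", "", text)           # remove numbers
    text = text.replace("-", " ")             # split hyphenated compounds
    text = re.sub(r"[^\w\s?!&.]", " ", text)  # punctuation -> space, keep ? ! & .
    return re.sub(r"\s+", " ", text).strip()
```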
The "word_tokenize" function from the tokenize 4 package in the nltk.stem library
was utilized to divide each review in the dataset into a list of individual words, or
tokens. This is a crucial step in the text preprocessing process as it allows the analysis
to consider each word separately and analyze it in the context of the entire document.
After the review text was tokenized into a list of words, the next step was to perform
lemmatization and stemming.
Lemmatization refers to the reduction of words to their simplest form, also known as
their lemma. This is useful because it allows the analysis to group together related
words that may have different inflections. The WordNetLemmatizer() function from
nltk.stem.wordnet5 was used to perform lemmatization in this case.
The stem of a word is its base form, and stemming is the act of reducing a word
to its stem 6 . This is useful because it allows the analysis to group together related
words that may have different inflections or suffixes. The PorterStemmer() function
from nltk.stem.porter 7 was used to perform stemming in this case. The resulting
preprocessed review can then be used for further TF-IDF analysis.
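A sketch of this tokenization and normalization step for the TF-IDF column:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer, PorterStemmer

nltk.download("punkt", quiet=True)    # tokenizer models
nltk.download("wordnet", quiet=True)  # lemmatizer dictionary

lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

def normalize(review: str) -> str:
    # Tokenize, then lemmatize and stem each token, as described above.
    tokens = word_tokenize(review)
    return " ".join(stemmer.stem(lemmatizer.lemmatize(t)) for t in tokens)
```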
4 https://www.nltk.org/api/nltk.tokenize.html
5 https://www.nltk.org/_modules/nltk/stem/wordnet.html
6 https://michael-fuchs-python.netlify.app/2021/05/31/nlp-text-pre-processing-iii-pos-ner-and-normalization/
7 https://www.nltk.org/api/nltk.stem.porter.html
After preparing all of the reviews and collecting them in the columns named "prep_review_tfidf" and "prep_review_bert_sbert", the next step is to perform
vectorization. Vectorization refers to the process of converting the text into numerical
representations, or vectors, that can be used as input for machine learning models. In
this case, vectorization will be performed on the "prep_review_tfidf" for TF-IDF and
"prep_review_bert_sbert" for BERT and SBERT. These techniques each have their
own unique ways of representing the text as numerical vectors, and they can be used
to extract different types of features from the text that may be useful for classification.
4.1.6.1 TF–IDF
The TfidfVectorizer8 from the scikit-learn package will be used to convert each statement into its vector representation. This vectorization mechanism has several parameters that can be specified; the following parameters will be applied, and all other parameters will use the defaults specified by the package. The n-gram range will be set to (1, 2) to use unigrams and bigrams. Unigrams represent single words and bigrams represent word pairs; by including word pairs, each pair is encoded as its own feature column, so pairs are taken into account during weighting and modeling. The max_features parameter is set to 16000, which builds a vocabulary that only considers the top 16,000 features ordered by term frequency across the corpus.
8 https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.
html
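A sketch of this configuration with scikit-learn, using the feature names referred to later in the modeling section:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Unigrams and bigrams, vocabulary capped at the 16,000 most frequent terms;
# all other parameters keep the scikit-learn defaults.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=16000)
tfidf_features_train = vectorizer.fit_transform(train_df["prep_review_tfidf"])
tfidf_features_test = vectorizer.transform(test_df["prep_review_tfidf"])
```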
For the BERT (word embedding) method, the BertEmbedding class from the bert-embedding9 package, which is implemented with MXNet10, will be used to convert each statement into its vector representation in a 768-dimensional dense vector space. The BertEmbedding class converts a sequence of words or tokens into their corresponding BERT embeddings: dense numerical vectors that capture the semantic and syntactic characteristics of the input text. These embeddings can then be used as input to downstream tasks such as text classification for sentiment analysis and machine translation. The bert-embedding package provides a convenient and efficient way to compute these embeddings in Python.
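A sketch of computing a single 768-dimensional vector per review with the bert-embedding package. Mean-pooling the per-token vectors into one review vector is an assumption made here, as the study does not state its pooling strategy.

```python
import numpy as np
from bert_embedding import BertEmbedding

bert = BertEmbedding()  # default 12-layer, 768-dimensional BERT model

def bert_features(texts):
    # Each result is (tokens, list of per-token 768-d vectors);
    # mean-pool the token vectors into one vector per review.
    results = bert(list(texts))
    return np.vstack([np.mean(vectors, axis=0) for _tokens, vectors in results])

bert_features_train = bert_features(train_df["prep_review_bert_sbert"])
bert_features_test = bert_features(test_df["prep_review_bert_sbert"])
```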
4.2 Modeling
The purpose of this research was to compare the performance of using BERT and
SBERT as text representation techniques for sentiment analysis, versus using the
traditional TF-IDF method. The study involved vectorizing all of the reviews using
three different text transformation techniques: TF-IDF, BERT, and SBERT. The
resulting features were split into training and testing sets, with "tfidf_features_train"
and "tfidf_features_test" representing the TF-IDF features, "bert_features_train" and
9 https://pypi.org/project/bert-embedding/
10 https://mxnet.apache.org/versions/1.7/api/python/docs/api/mxnet/context/index.html
11 https://www.sbert.net/
12 https://huggingface.co/sentence-transformers/bert-base-nli-mean-tokens
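For SBERT, footnotes 11 and 12 point to the sentence-transformers package and the pre-trained bert-base-nli-mean-tokens model; a sketch of producing the sentence embeddings with that package:

```python
from sentence_transformers import SentenceTransformer

sbert = SentenceTransformer("bert-base-nli-mean-tokens")  # 768-d sentence vectors
sbert_features_train = sbert.encode(train_df["prep_review_bert_sbert"].tolist())
sbert_features_test = sbert.encode(test_df["prep_review_bert_sbert"].tolist())
```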
For this study, a support vector machine (SVM) classifier with a linear kernel and
a C value of 1 was used to perform sentiment analysis. The SVM classifier was
chosen because it has been found to produce the best results for sentiment analysis in
previous literature. The specific SVM implementation used in this study was the C-
Support Vector Classification provided by the scikit-learn library13 (sklearn.svm.SVC).
This implementation allows for the use of different kernel functions, and the linear
kernel was chosen for this study. The C parameter determines the strength of the
regularization, and a value of 1 was used in this study.
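A sketch of this classifier and the classification report used in the following subsections, assuming hypothetical label arrays train_labels and test_labels taken from the converted target column:

```python
from sklearn.svm import SVC
from sklearn.metrics import classification_report

clf = SVC(kernel="linear", C=1)  # C-Support Vector Classification, linear kernel
clf.fit(tfidf_features_train, train_labels)

# Precision, recall, f1-score per class plus overall accuracy (cf. Figure 4.5).
print(classification_report(test_labels, clf.predict(tfidf_features_test)))
```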
In this study, a baseline model using the term frequency-inverse document frequency
(TF-IDF) text representation technique and a support vector machine (SVM) classifier
was implemented in order to reproduce the results of previous research (Alzami et al.,
2020). That research found that using an SVM with a TF-IDF technique and selecting the top 16,000 features resulted in an accuracy of 87.30% for sentiment analysis. In the current study, a similar level of accuracy was achieved with the baseline model, resulting in an accuracy of 87.50% when applied to the Steam dataset using an SVM and TF-IDF.
The classification report for the Steam dataset when using a support vector machine
(SVM) model based on the term frequency-inverse document frequency (TF-IDF)
representation can be found in Figure 4.5. This report provides information on the
performance of the SVM model in classifying the data using the TF-IDF representation.
It includes metrics such as precision, recall, and f1-score for each class, as well as an
overall accuracy score.
13 https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC
The aim of this research was to evaluate the effectiveness of using BERT as a text
representation technique compared to using TF-IDF. The BERT model was trained on
data from the ’prep_review_bert_sbert’ column. BERT transforms a sequence of words
into a fixed-length vector representation, which captures the meaning and context of
the words. This representation can be used as a feature in machine learning models for
various NLP tasks. In the current study, it was found that using BERT-based features
with a support vector machine (SVM) model resulted in an accuracy of 84.06% for
sentiment analysis.
The classification report in Figure 4.6 shows the performance of the SVM model using BERT as the word embedding for the Steam dataset. The report includes evaluation metrics such as f1-score, accuracy, precision and recall, which provide a summary of the model's performance.
The purpose of this study was to examine the effectiveness of using SBERT as a sentence embedding in comparison to using TF-IDF. The SBERT model was trained using data from the 'prep_review_bert_sbert' column. As a sentence embedding technique, SBERT converts a sequence of words into a single fixed-length numerical vector, known as an embedding, that encapsulates the meaning and context of the sentence as a whole. This representation captures the meaning and context of the input text at the sentence level and can be utilized as a feature in machine learning models.
The classification report in Figure 4.7 shows the performance of an SVM model
using SBERT as the sentence embedding for the Steam dataset. The report includes evaluation metrics such as f1-score, accuracy, precision, and recall, which provide a
summary of the model’s performance.
4.3 Evaluation
In this section, the results of all evaluation metrics for the three models (TF-IDF with
SVM, BERT with SVM, and SBERT with SVM) will be presented, including the
confusion matrix. The accuracy score was chosen as the primary metric for comparison and final evaluation when applying 10-fold cross validation, because it accounts for both false positives and false negatives and is the most commonly reported metric in the related literature. The results of the experiments were reported in the
following sections, and a statistical test was performed to compare the results of the
different models in order to determine whether to accept or reject the null and alternate
hypotheses. The outcome of this test allowed for the conclusion of which model was
the most effective and could be used with confidence in future projects.
Table 4.1 presents the results of all evaluation metrics - including accuracy, F1 score,
precision, and recall - for the three models: TF-IDF with SVM, BERT with SVM,
and SBERT with SVM. As can be seen, the accuracy of the TF-IDF with SVM model
is higher than that of the two BERT and SBERT models with SVM. However, the
precision for the TF-IDF with SVM model is similar to that of the SBERT with SVM
model, and the recall of the TF-IDF with SVM model is similar to that of the BERT with SVM model. The F1 scores of the three models are also very close. It is not clear which of the three models (TF-IDF with SVM, BERT with
SVM, and SBERT with SVM) has the best performance. Therefore, a confusion matrix
was also generated for the three models, and further evaluation using 10-fold cross
validation with accuracy is necessary to determine which model is the most effective.
For the rest of the study evaluation, the three models were evaluated and compared using only the accuracy score, for both the 10-fold cross validation and the final step. Accuracy accounts for both false positives and false negatives, and it is the most commonly used metric in the literature on this topic for final evaluation. Moreover, as shown in Tables 4.1 and 4.8, the precision for the TF-IDF with SVM classifier is similar to that of the SBERT with SVM classifier, and the recall for the BERT with SVM classifier is similar to that of the SBERT with SVM classifier, so neither of those metrics alone could separate the models; accuracy was therefore chosen.
The results of the confusion matrix analysis, as presented in Figures 4.9, 4.10, and 4.11 and Table 4.2, show that there are differences in the performance of the three models
being compared in terms of their ability to accurately classify samples as either positive
or negative.
The TF-IDF model has the highest True Positive rate, meaning it is the most effective
at correctly identifying positive samples. On the other hand, the BERT model has the
lowest True Positive rate, indicating it is the least successful at accurately identifying
positive samples. In terms of True Negative rate, the SBERT model performs the best,
correctly identifying the highest number of negative samples, while the BERT model
has the lowest True Negative rate, accurately identifying the fewest number of negative
samples. The SBERT model also has the lowest False Positive rate, meaning it is the
least likely to incorrectly classify a negative sample as positive. The BERT model has
the highest False Positive rate, indicating it is the most likely to incorrectly classify
a negative sample as positive. The BERT model also has the highest False Negative
rate, meaning it is the most likely to incorrectly classify a positive sample as negative,
while the TF-IDF model has the lowest False Negative rate, indicating it is the least
likely to incorrectly classify a positive sample as negative.
This section presents the mean test accuracy scores obtained by applying the 10-fold cross-validation method to the Steam dataset using RepeatedStratifiedKFold14 from
sklearn.model_selection. This method, which is effective for classification problems
with severe class imbalances, allows for the estimated performance of a machine
learning model to be improved by repeating the cross-validation procedure multiple
times (as specified by the n_repeats15 parameter) and reporting the mean result across
all folds from all runs. In this study, n_repeats is set to 3, resulting in a total of 30
folds being used, as shown in Figures 4.12, 4.13, and 4.14. The mean result is expected
to be a more accurate estimate of the model’s performance. The cross_val_score16
method from the sklearn.model_selection library in Python was also used.
14 https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.
RepeatedStratifiedKFold.html
15 https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/
16 https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html
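A sketch of this repeated cross-validation setup, assuming clf and the feature/label arrays from the previous sections; the random_state is an assumption added for reproducibility.

```python
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# 10 folds repeated 3 times = 30 accuracy scores per model.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(clf, tfidf_features_train, train_labels,
                         scoring="accuracy", cv=cv, n_jobs=-1)
print(f"mean accuracy: {scores.mean():.4f} (std {scores.std():.4f})")
```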
According to the results presented in Table 4.3 and Figure 4.15, the SVM model
based on TF-IDF appears to have the highest mean accuracy when using 10-fold cross
validation.
The SBERT-based SVM model, which uses the transformer-based language model SBERT for its features together with a Support Vector Machine (SVM) classifier, is the second most accurate of the three models evaluated in this study. The BERT-based SVM model, which uses the similar transformer-based language model BERT with an SVM classifier, had the lowest accuracy of the three. Overall, the model that uses the term frequency-inverse document frequency (TF-IDF) feature representation with an SVM classifier appears to perform the best among the three models.
To determine whether the cross-validation accuracy scores follow a normal distribution, the Shapiro-Wilk test was conducted on the three models, and the results are presented in Table 4.4. It is important to determine the normality of the data in
order to choose the appropriate type of difference test to use. Parametric tests require
the data to follow a normal distribution, while non-parametric tests do not have this
requirement.
For the accuracy scores of the Steam dataset in three models, the Shapiro-Wilk test was
applied. The null hypothesis of this test is that the data follows a normal distribution,
while the alternative hypothesis is that the data does not follow a normal distribution.
If the p-value obtained from the test is lower than 0.05, it indicates that the data does
not follow a normal distribution. In the case of the Steam dataset, all three models had a p-value higher than 0.05, which means there is insufficient statistical evidence to reject the null hypothesis, so the data can be assumed to follow a normal distribution. Therefore, parametric tests, including the Student's t-test, can be used for statistical analysis, and the non-parametric Mann-Whitney U test is not required.
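A sketch of this normality check on the 30 cross-validation scores of one model:

```python
from scipy.stats import shapiro

stat, p_value = shapiro(scores)  # scores: 30 cross-validation accuracies
if p_value > 0.05:
    # Fail to reject normality: parametric tests such as the t-test apply.
    print(f"p = {p_value:.3f}: scores can be treated as normally distributed")
```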
The Student's t-test was selected as the statistical analysis method for this study because the data in all of the models conforms to a normal distribution, making the t-test, which is designed for normally distributed data, an appropriate choice. The Mann-Whitney U test, in contrast, is used when the data is not normally distributed, and so was not needed here.
The Student’s t-test is a statistical procedure that is used to determine whether there is
a significant difference between the means of two groups or samples. In the context of
machine learning, it can be used to compare the performance of different models or
techniques. In this study, the t-test is used to compare the performance of two models
that use BERT and SBERT text transformers with SVM classifiers, to a baseline model
that uses a TF-IDF base with an SVM classifier. To perform the t-test, it is assumed that the data is normally distributed; this assumption matters because the distribution of the t-statistic is derived under it. The t-test involves calculating the means and standard deviations
of the two samples and using this information to calculate a t-statistic. The t-statistic
is then used to determine a p-value, which is a measure of the probability that the
difference between the means is due to random chance. If the p-value is below a
predetermined level of significance (usually 0.05), it can be concluded that there is a
statistically significant difference between the means of the two samples.
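A sketch of such a comparison between the baseline and one transformer-based model, assuming hypothetical score arrays tfidf_scores and bert_scores collected from the cross-validation step:

```python
from scipy.stats import ttest_ind

t_stat, p_value = ttest_ind(tfidf_scores, bert_scores)
if p_value < 0.05:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: significant difference in means")
```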
The data in Table 4.5 suggests that the mean accuracy of the baseline TF-IDF based
SVM model is significantly higher than the mean accuracy of both the BERT and
SBERT based SVM models. This is indicated by the p-values for the comparisons
between these models, which are both less than 0.05, and the positive t-statistics for
both comparisons. These results suggest that the baseline model performs better in
terms of mean accuracy compared to the models that incorporate BERT and SBERT.
4.5 Discussion
In this study, a TF-IDF based SVM Classifier model was trained on the Steam dataset
for sentiment analysis classification of users’ reviews on the Steam game platform.
The model achieved an accuracy of 88.21%, which was the highest among the three
models tested, including two pre-trained text transformers (BERT and SBERT). It is
possible that the lower effectiveness of the BERT and SBERT models in this case was
due to their design for use with Deep Learning Neural Network models rather than
traditional Machine Learning models such as SVM.
The results of this research showed that the TF-IDF based SVM model was more
effective at predicting the sentiment of users’ reviews on the Steam platform compared
to the BERT and SBERT based SVM models. However, it would be worthwhile to
investigate in future studies whether the accuracy of BERT and SBERT based Machine
Learning models could be improved by using them with Deep Learning models, as
this would provide a more comprehensive comparison of their effectiveness. The
objective of this study was to leverage the ability of BERT and SBERT, as pre-trained
text transformers, to capture contextual meaning and accurately classify text articles
according to sentiment.
Chapter 5
Conclusion
Word embedding techniques such as BERT and sentence embedding techniques like SBERT are designed to capture
the meaning of words based on the words surrounding them.
The main focus of the research was to compare different text representation techniques for sentiment analysis on the Steam platform. A baseline model based on the TF-IDF technique was selected from existing research (Alzami et al., 2020), and this model was compared against SVM models using BERT and SBERT for word and sentence embedding. These techniques were chosen based on the research gap identified during the literature review. Based on the results of the experiments, the most effective technique for representing text for sentiment analysis on the Steam platform was determined.
After all methods were applied, classification reports were generated for the Steam dataset. To validate the results, a 10-fold cross-validation method was used. Shapiro-Wilk normality tests were then conducted to determine whether the 10-fold cross-validation scores were normally distributed; this test was performed in order to determine which statistical difference test would be most appropriate for comparing the results of the different models. As the normality test revealed that the scores for all three models were normal, a Student's t-test was performed and the non-parametric Mann-Whitney U test was not needed. These tests were used to determine whether there was a statistically significant difference in accuracy between the models, thereby determining whether the null hypothesis was accepted or rejected.
Finally, according to the results of this study, the SVM classifier based on the traditional TF-IDF text representation performed significantly better than the SVM classifiers based on BERT and SBERT, which use embeddings of words and sentences respectively. The null hypothesis was rejected, and the alternate hypothesis
was accepted. These results suggest that the traditional TF-IDF technique may be more
effective for sentiment analysis on the Steam platform than modern embedding-based
methods. Further research may be needed to confirm these findings and explore other
potential applications of these techniques.
BERT and SBERT take the context of words into account during classification, compared to the TF-IDF model, which only considers the frequency of
words that exist in the text.
Although the support vector machine (SVM) models that utilized BERT as word embeddings and SBERT as sentence embeddings did not outperform the term frequency-inverse document frequency (TF-IDF) model, which reached 89% accuracy, both the BERT-based
and SBERT-based SVM models still demonstrated reliable accuracy with scores of
86% and 87% on the Steam Kaggle dataset, respectively. These results indicate that
although the use of BERT and SBERT embeddings in a support vector machine (SVM)
model may not have significantly enhanced the overall performance compared to a
term frequency-inverse document frequency (TF-IDF) model, they were still able to
offer valuable insights and information about the contextual relationships within the
text data.
The BERT and SBERT embeddings capture the contextual information and relation-
ships between words and sentences, respectively, in the text data, and this information
can be useful for tasks such as text classification and sentiment analysis. While the
improvement in performance may not have been significant in this particular study, the
BERT and SBERT embeddings could potentially be more beneficial in other Natural
Language Processing (NLP) tasks or when used in combination with other techniques.
Additionally, it is worth noting that the use of these embeddings may still be beneficial
for certain specific applications or scenarios, even if they do not significantly improve
the overall performance of the model.
Deep learning algorithms are designed to work with techniques based on embeddings, and may be able to achieve higher accuracy compared to the current method of using TF-IDF and Support Vector Machines (SVMs).
Therefore, it may be worth considering further research into this area in the future
to determine the potential benefits of using transformer-based embeddings and deep
learning algorithms for sentiment analysis tasks.
It would be worthwhile for future research to explore the use of fake review detection
techniques to identify and collect both fake and genuine user reviews. Once these
reviews have been identified and separated, sentiment analysis could be performed
on both fake and real reviews. This approach would provide a more comprehensive
understanding of the sentiment present in user reviews, as it would allow for the
separate analysis of fake and real reviews. Additionally, this approach may result in
more accurate fake review detection, as it may enable the identification of patterns or
characteristics that distinguish fake reviews from real reviews. By analyzing both fake
and real reviews, it may be possible to identify features or indicators that are specific
to fake reviews and use this information to inform future fake review detection efforts.
A potential direction for future research could be to conduct a comprehensive and reli-
able sentiment analysis on fake and genuine user reviews using clustering algorithms.
Clustering algorithms are a type of unsupervised machine learning technique that can be used to group data points into clusters based on their similarity. By applying clustering
algorithms to fake and genuine user reviews, it may be possible to identify patterns
or trends in the sentiment of the reviews. This approach may provide more accurate
results and a deeper understanding of the sentiment present in the reviews, compared
to the current methods used. As such, exploring the use of clustering algorithms in
this context may yield valuable insights and could be a promising area of study.
Finally, continued research can support the development of new and innovative methods for addressing this task. Additionally, it may be
valuable to consider incorporating additional data sources or techniques, such as
natural language processing or data visualization, to enhance the performance of the
selected model. Overall, ongoing exploration and experimentation with different
approaches to sentiment analysis can help to advance the field and provide more
effective solutions for understanding and analyzing the sentiments of users.
References
Alantari, H. J., Currim, I. S., Deng, Y., & Singh, S. (2022). An empirical comparison of
machine learning methods for text-based sentiment analysis of online consumer
reviews. International Journal of Research in Marketing, 39(1), 1–19. https://doi.org/
10.1016/j.ijresmar.2021.10.011
Alzami, F., Udayanti, E. D., Prabowo, D. P., & Megantara, R. A. (2020). Document
preprocessing with TF-IDF to improve the polarity classification performance
of unstructured sentiment analysis. Kinetik: Game Technology, Information System,
Computer Network, Computing, Electronics, and Control, 235–242. https://doi.org/10.
22219/kinetik.v5i3.1066
Arief, M., & Deris, M. B. M. (2021). Text preprocessing impact for sentiment clas-
sification in product review. 2021 Sixth International Conference on Informatics and
Computing (ICIC), 1–7. https://doi.org/10.1109/ICIC54025.2021.9632884
Balakrishnan, V., Selvanayagam, P. K., & Yin, L. P. (2020). Sentiment and emotion
analyses for malaysian mobile digital payment applications. Proceedings of the
2020 the 4th International Conference on Compute and Data Analysis, 67–71. https :
//doi.org/10.1145/3388142.3388144
Beseiso, M., & Alzahrani, S. (2020). An empirical analysis of BERT embedding for
automated essay scoring. International Journal of Advanced Computer Science and
Applications, 11(10). https://doi.org/10.14569/IJACSA.2020.0111027
Cahyanti, F. E., Adiwijaya, & Faraby, S. A. (2020). On the feature extraction for senti-
ment analysis of movie reviews based on SVM. 2020 8th International Conference
on Information and Communication Technology (ICoICT), 1–5. https://doi.org/10.1109/
ICoICT49345.2020.9166397
Chouikhi, H., Chniter, H., & Jarray, F. (2020). On the feature extraction for sentiment
analysis of movie reviews based on SVM. 2020 8th International Conference on
Information and Communication Technology (ICoICT).
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of
deep bidirectional transformers for language understanding. Proceedings of the
2019 Conference of the North, 4171–4186. https://doi.org/10.18653/v1/N19-1423
Dong, J., He, F., Guo, Y., & Zhang, H. (2020). A commodity review sentiment analysis
based on BERT-CNN model. 2020 5th International Conference on Computer and
Communication Systems (ICCCS), 143–147. https://doi.org/10.1109/ICCCS49078.
2020.9118434
Fang, X., & Zhan, J. (2015). Sentiment analysis using product review data. Journal of
Big Data, 2(1), 5. https://doi.org/10.1186/s40537-015-0015-2
Gallagher, C., Furey, E., & Curran, K. (2019). The application of sentiment analysis
and text analytics to customer experience reviews to understand what customers
are really saying. International Journal of Data Warehousing and Mining, 15(4), 21–47.
https://doi.org/10.4018/IJDWM.2019100102
Hanusz, Z., Tarasinska, J., & Zielinski, W. (2016). Shapiro–Wilk test with known mean. REVSTAT-Statistical Journal, 14(1), 89–100. https://doi.org/10.57805/REVSTAT.V14I1.180
Huang, B., Zhang, J., Ju, J., Guo, R., Fujita, H., & Liu, J. (2023). CRF-GCN: An effec-
tive syntactic dependency model for aspect-level sentiment analysis. Knowledge-
Based Systems, 260, 110125. https://doi.org/10.1016/j.knosys.2022.110125
Iqbal, A., Amin, R., Iqbal, J., Alroobaea, R., Binmahfoudh, A., & Hussain, M. (2022).
Sentiment analysis of consumer reviews using deep learning. Sustainability,
14(17), 10844. https://doi.org/10.3390/su141710844
Jeffrey, R., Bian, P., Ji, F., & Sweetser, P. (2020). The wisdom of the gaming crowd.
Extended Abstracts of the 2020 Annual Symposium on Computer-Human Interaction in Play,
272–276. https://doi.org/10.1145/3383668.3419915
Kasuya, E. (2001). Mann–Whitney U test when variances are unequal. Animal Behaviour,
61(6), 1247–1249. https://doi.org/10.1006/anbe.2001.1691
Lu, K., & Wu, J. (2019). Sentiment analysis of film review texts based on sentiment
dictionary and SVM. Proceedings of the 2019 3rd International Conference on Innovation
in Artificial Intelligence - ICIAI 2019, 73–77. https://doi.org/10.1145/3319921.
3319966
Mishra, P., Singh, U., Pandey, C., Mishra, P., & Pandey, G. (2019). Application of
student’s t-test, analysis of variance, and covariance. Annals of Cardiac Anaesthesia,
22(4), 407. https://doi.org/10.4103/aca.ACA_94_19
Mohamed Ali, N., El Hamid, M. M. A., & Youssif, A. (2019). Sentiment analysis for movies reviews dataset using deep learning models. International Journal of Data Mining & Knowledge Management Process, 09(3), 19–27. https://doi.org/10.5121/ijdkp.2019.9302
Nabiha, A., Mutalib, S., & Malik, A. M. A. (2021). Sentiment analysis for informal
malay text in social commerce. 2021 2nd International Conference on Artificial Intelli-
gence and Data Sciences (AiDAS), 1–6. https://doi.org/10.1109/AiDAS53897.2021.
9574436
Normah, N. (2019). Naïve Bayes algorithm for sentiment analysis of Windows Phone Store application reviews. SinkrOn, 3(2), 13. https://doi.org/10.33395/sinkron.v3i2.242
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer,
L. (2018, March 22). Deep contextualized word representations. Retrieved
December 7, 2022, from http://arxiv.org/abs/1802.05365
Reimers, N., & Gurevych, I. (2019, August 27). Sentence-BERT: Sentence embeddings
using siamese BERT-networks. Retrieved December 26, 2022, from http :
//arxiv.org/abs/1908.10084
Saifullah, S., Fauziah, Y., & Aribowo, A. S. (2021). Comparison of machine learning for sentiment analysis in detecting anxiety based on social media data. arXiv. https://doi.org/10.48550/ARXIV.2101.06353
Sobkowicz, A., & Stokowiec, W. (2016, May 23). Steam review dataset - new, large scale
sentiment dataset.
Srivastava, R., Bharti, P., & Verma, P. (2021). Sentiment analysis using feature genera-
tion and machine learning approach. 2021 International Conference on Computing,
Communication, and Intelligent Systems (ICCCIS), 86–91. https://doi.org/10.1109/
ICCCIS51004.2021.9397135
Tripathi, G., & S, N. (2015). Feature selection and classification approach for sentiment
analysis. Machine Learning and Applications: An International Journal, 2(2), 01–16.
https://doi.org/10.5121/mlaij.2015.2201
Utz, S., Kerkhof, P., & van den Bos, J. (2012). Consumers rule: How consumer reviews
influence perceived trustworthiness of online stores. Electronic Commerce Research
and Applications, 11(1), 49–58. https://doi.org/10.1016/j.elerap.2011.07.010
Vieira, A., & Brandão, W. (2019). Evaluating acceptance of video games using convo-
lutional neural networks for sentiment analysis of user reviews. Proceedings of
the 30th ACM Conference on Hypertext and Social Media, 273–274. https://doi.org/10.
1145/3342220.3344924
Zuo, Z. (2018). Sentiment analysis of Steam review datasets using Naive Bayes and Decision Tree classifier, 7.
Appendix A
The code for this study can be found on GitHub under the repository "Msc_Diss_Cod"1.
The code for appendix A is in that repository under the filename "D02124995_Mina_Jmashidian_Disseratation_Main_Coding.ipynb"2.
1 https://github.com/minajm/Msc_Diss_Cod
2 https://github.com/minajm/Msc_Diss_Cod/blob/main/D02124995_Mina_Jmashidian_
Disseratation_Main_Coding.ipynb
Research Coding Part After Collection of Target Data
Appendix B
The code for appendix B can be found on GitHub under the repository "Msc_Diss_Cod" with the filename "D02124995_Mina_Jmashidian_Extract_Top4Gmaes.ipynb"1.
1 https://github.com/minajm/Msc_Diss_Cod/blob/main/D20124995_Mina_Jamshidian_Extract_
Top4Gmaes.ipynb
Collecting Target Data