Product Fake Reviews Detection With Sentiment Analysis Using Machine Learning
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue V May 2023- Available at www.ijraset.com
Abstract: Sentiment Analysis (SA) has recently become one of the most active topics in text analysis because of its promising
commercial benefits. Two of the main challenges facing SA are how to extract the emotions expressed in an opinion and how to detect
fake positive and fake negative reviews among opinion reviews. Opinion reviews obtained from users can be classified as positive or
negative, and consumers can use this classification when selecting a product. This paper aims to classify product reviews into
positive or negative polarity using machine learning algorithms. In this study, we analyse online product reviews using SA methods
in order to detect fake reviews. SA and text classification methods are applied to a dataset of product reviews. More specifically,
we compare five supervised machine learning algorithms, including Support Vector Machine (SVM), for sentiment classification of
reviews on two datasets: product reviews dataset V2.0 and product reviews dataset V1.0. Our experimental results show that the SVM
algorithm outperforms the other algorithms and reaches the highest accuracy not only in text classification but also in detecting
fake reviews.
Keywords: Sentiment Analysis; Fake Reviews; Naïve Bayes; Support Vector Machine.
I. INTRODUCTION
This project proposes a machine learning approach to identify fake reviews. In addition to extracting features from the review
text, this project applies feature engineering to capture various behaviors of the reviewers. Opinion Mining (OM), also
known as Sentiment Analysis (SA), is the field of study that analyzes people’s opinions, evaluations, sentiments, attitudes,
appraisals, and emotions towards entities such as services, individuals, issues, topics, and their attributes [1]. “The sentiment is
usually formulated as a two-class classification problem, positive and negative”. Because time is sometimes more precious than
money, automated Sentiment Analysis techniques can replace reading each review and working out whether it is positive or
negative.
The basis of SA is determining the polarity of a given text at the document, sentence, or aspect level, that is, whether the opinion
expressed in a document, a sentence, or an entity aspect is positive or negative [2]. More specifically, the goals of SA are to find
opinions in reviews and then classify these opinions by polarity. SA is commonly divided into three major levels, namely:
document level, sentence level, and aspect level. It is therefore important to identify the level of an analysis, because the level
determines the tasks that SA involves [3]. The document level treats a whole document as a single opinion on its subject and aims to
classify that document as a negative or positive opinion. The sentence level aims to determine the opinion stated in each
sentence.
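To make document-level polarity classification concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier written with the Python standard library only. It is not the Weka configuration used in our experiments; the class name `NaiveBayesPolarity`, the `tokenize` helper, and the four training reviews are assumptions invented for illustration.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [token for token in tokens if token]


class NaiveBayesPolarity:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for document, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(document))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_docs = len(labels)
        return self

    def predict(self, document):
        # Words never seen in training carry no evidence and are ignored.
        tokens = [t for t in tokenize(document) if t in self.vocab]
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / self.total_docs)
            class_total = sum(self.word_counts[c].values())
            for t in tokens:
                numerator = self.word_counts[c][t] + 1
                denominator = class_total + len(self.vocab)
                score += math.log(numerator / denominator)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy training data, invented for illustration only.
train_docs = [
    "great product, works perfectly",
    "excellent quality and fast delivery",
    "terrible product, broke after one day",
    "poor quality, very disappointed",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesPolarity().fit(train_docs, train_labels)
print(model.predict("excellent product, great quality"))
print(model.predict("terrible, very disappointed"))
```

Each review is scored as a whole, which corresponds to the document level described above. A sentence-level variant would apply the same scoring to each sentence separately.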
The documents used in this work are obtained from a previously collected dataset of product reviews [9]. An SA technique
is then applied to classify the documents as real positive and real negative reviews or fake positive and fake negative reviews.
Fraudsters who try to game existing systems by posting fake positive and fake negative reviews can obtain financial gains
[4].
This, unfortunately, gives strong incentives to write fake reviews that intentionally mislead readers by giving unfair
reviews to products in order to damage their reputation. Detecting such fake reviews is a significant challenge. For
example, fake consumer reviews in the e-commerce sector not only affect individual consumers but also erode purchasers’
confidence in online shopping. Our work is mainly directed at SA at the document level, more specifically on a product reviews
dataset.
Machine learning techniques and SA methods are expected to have a major positive effect, especially for the detection of
fake reviews among e-commerce product reviews. Our experiments measure the accuracy of several sentiment
classification algorithms. On both datasets (product reviews dataset V2.0 and product reviews dataset V1.0), we found that SVM is
more accurate than the other methods.
The rest of this paper is organized as follows. Section II presents the related works. Section III shows the methodology, Section IV
describes the implementation, Section V presents the results, and finally, Section VI presents the conclusion and future works.
A. Sentiment Analysis
Sentiment analysis, also referred to as opinion mining, is an approach to natural language processing (NLP) that identifies the
emotional tone behind a body of text. It is a popular way for organizations to determine and categorize opinions about a product,
service, or idea [10].
B. Textual reviews
Most of the available reputation models depend on numeric data available in different fields; an example is ratings in e-commerce.
Also, most reputation models focus only on the overall ratings of products without considering the reviews provided
by customers. On the other hand, most websites allow consumers to add textual reviews to give a detailed opinion about the
product [11].
These reviews are available for customers to read, and customers increasingly depend on reviews rather than on ratings.
Reputation models can use SA methods to extract users’ opinions and use this data in the reputation system. This information may
include consumers’ opinions about different features.
D. Classification Algorithms
Several comparative studies have evaluated classification algorithms to identify the best method for detecting fake reviews, using
datasets such as the News Group dataset, text documents, and product reviews datasets. One study shows that NB and distributed
keyword vectors (DKV) are accurate, but it does not detect fake reviews.
Another finds that NB is accurate and a better choice, but it is not oriented towards detecting fake reviews. Using the same
datasets, a further study finds that SVM is accurate with the stop words method, but it does not focus on detecting fake reviews,
while yet another finds that SVM is accurate only without the stop words method, and it also does not detect fake reviews.
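The stop words method mentioned above can be illustrated with a short Python sketch that builds a bag-of-words representation of a review with and without stop-word filtering. The stop-word list, the `bag_of_words` function, and the sample review are assumptions made for this example. They are not taken from the cited studies or from the Weka filters used in this work.

```python
from collections import Counter

# A tiny illustrative stop-word list (assumption; practical lists are longer).
STOP_WORDS = {"a", "an", "the", "is", "it", "and", "of", "to", "this", "was"}


def bag_of_words(text, remove_stop_words=False):
    """Count word occurrences, optionally dropping stop words first."""
    tokens = [word.strip(".,!?;:\"'()") for word in text.lower().split()]
    tokens = [token for token in tokens if token]
    if remove_stop_words:
        tokens = [token for token in tokens if token not in STOP_WORDS]
    return Counter(tokens)


# Sample review, invented for illustration only.
review = "This is a great phone and the battery is great."
with_stop = bag_of_words(review)
without_stop = bag_of_words(review, remove_stop_words=True)

print(sorted(with_stop.items()))
print(sorted(without_stop.items()))
```

Filtering shrinks the feature space to the content-bearing words. Whether this helps or hurts a given classifier is an empirical question, which is consistent with the mixed findings reported above.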
III. METHODOLOGY
To accomplish our goal, we analyze a dataset of product reviews using the Weka tool for text classification. The proposed
methodology, shown in Figure 1, follows the SA steps described below.
IV. IMPLEMENTATION
A. Step 1: Reviews collection
To provide a thorough study of machine learning algorithms, the experiment is based on analyzing the sentiment value of a
standard dataset. We have used the original dataset of product reviews to test our review classification methods. The dataset is
publicly available and has been used in prior work, and it is frequently regarded as the gold-standard dataset by researchers
working in the field of Sentiment Analysis. The first dataset, known as product reviews dataset V2.0, consists of 2000 product
reviews, of which 1000 are positive and 1000 are negative. The second dataset, known as product reviews dataset V1.0,
consists of 1400 product reviews in total, 700 of which are positive and 700 of which are negative. A summary of the two datasets
is given in Table II.
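The dataset sizes stated above can be checked with a small Python sketch. The `DatasetSummary` class is a hypothetical helper introduced only for this illustration; the counts are the ones given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSummary:
    """Hypothetical container for the per-class review counts of a dataset."""
    name: str
    positive: int
    negative: int

    @property
    def total(self):
        return self.positive + self.negative

    @property
    def positive_share(self):
        return self.positive / self.total


# Counts as stated in the text above.
datasets = [
    DatasetSummary("product reviews dataset V2.0", positive=1000, negative=1000),
    DatasetSummary("product reviews dataset V1.0", positive=700, negative=700),
]

for d in datasets:
    print(f"{d.name}: total={d.total}, positive share={d.positive_share:.0%}")
```

Because both datasets are balanced, a classifier that always predicts the same class would reach only 50% accuracy, which gives a simple baseline against which the reported accuracies can be read.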
True Negatives (TN) are events which are real and are correctly labeled as real; True Positives (TP) are events which are fake and
are correctly labeled as fake. Correspondingly, False Positives (FP) are real events incorrectly classified as fake; False Negatives
(FN) are fake events incorrectly classified as real. From the confusion matrix, equations (1)-(6) define numerical measures that can
be applied to evaluate the Detection Process (DP) performance. In Table III, the confusion matrix shows the counts
of real and fake predictions obtained on known data. Each algorithm used in this study has its own performance
evaluation and confusion matrix.
The confusion matrix is a very important part of our study because it lets us determine whether the reviews in the datasets are
fake or real. The confusion matrix is applied to each of the five algorithms discussed in Step 4.
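As an illustration of how such measures follow from the four counts, the Python sketch below builds a confusion matrix with "fake" as the positive class, as defined above, and derives accuracy, precision, recall, and F1. These are the standard definitions and are not claimed to coincide with equations (1)-(6). The function names and the eight toy labels are assumptions invented for the example.

```python
def confusion_counts(actual, predicted, positive="fake"):
    """Return (tp, tn, fp, fn), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # fake review correctly labeled as fake
        elif a != positive and p != positive:
            tn += 1  # real review correctly labeled as real
        elif a != positive and p == positive:
            fp += 1  # real review wrongly labeled as fake
        else:
            fn += 1  # fake review wrongly labeled as real
    return tp, tn, fp, fn


def detection_metrics(tp, tn, fp, fn):
    """Standard measures derived from the confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration only.
actual = ["fake", "fake", "fake", "real", "real", "real", "real", "real"]
predicted = ["fake", "fake", "real", "real", "real", "real", "fake", "real"]

counts = confusion_counts(actual, predicted)
metrics = detection_metrics(*counts)
print(counts)
print(metrics)
```

Reporting precision and recall alongside accuracy matters for fake review detection, because the costs of flagging a real review as fake and of missing a fake review are generally different.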
V. RESULTS
Fig (1)
Fig (2)
Fig (3)
Fig (4)
Fig (5)
VI. CONCLUSION
In this paper, we proposed several methods to analyze a dataset of reviews. We also presented sentiment classification algorithms
that apply supervised learning to the reviews in two different datasets. Our experiments studied the accuracy of all the
sentiment classification algorithms and how to determine which algorithm is more accurate. Furthermore, we were able to detect fake
positive reviews and fake negative reviews through detection processes. Five supervised learning algorithms, including SVM, have
been compared in this paper for classifying the sentiment of our datasets. Using the accuracy analysis for these five techniques, we
found that the SVM algorithm is the most accurate at correctly classifying the reviews datasets, i.e., V2.0 and V1.0. Also, the
detection processes for fake positive reviews and fake negative reviews depend on the best method used in this study.
VII. ACKNOWLEDGMENT
We take this opportunity to thank the teachers and senior authorities whose constant encouragement made it possible for us to take
up the challenge of doing this project. We express our deepest sense of gratitude towards our Hon’ble Head of Department, Dr. R. V.
Patil, for giving permission to use the college resources and for his constant encouragement for this work.
We are grateful to Prof. S. P. Gade for her technical support, valuable guidance, encouragement, and consistent help, without which
it would have been difficult for us to complete this project work. She is a constant source of information to us. We consider
ourselves fortunate to work under the guidance of such an eminent personality.
Last but not least, we are thankful to the entire staff of the Computer Engineering Department for their timely help and
guidance at various stages of the progress of the project work.
REFERENCES
[1] B. Liu, “Sentiment analysis and opinion mining,” Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, 2012, pp. 1–167.
[2] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Engineering Journal, vol. 5, no. 4, 2014, pp.
1093–1113.
[3] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proceedings of EMNLP, 2002, pp. 79–86.
[4] J. Malbon, “Taking fake online consumer reviews seriously,” Journal of Consumer Policy, vol. 36, no. 2, 2013, pp. 139–157.
[5] R. Xia, C. Zong, and S. Li, “Ensemble of feature sets and classification algorithms for sentiment classification,” Information Sciences, vol. 181, no. 6, 2011, pp.
1138–1152.
[6] T. Barbu, “SVM-based human cell detection technique using histograms of oriented gradients,” cell, vol. 4, 2012, p. 11.
[7] G. Esposito, LP-type methods for Optimal Transductive Support Vector Machines. Gennaro Esposito, PhD, 2014, vol
[8] P. Kalaivani and K. L. Shunmuganathan, “Sentiment classification of product reviews by supervised machine learning approaches,” Indian Journal of Computer
Science and Engineering, vol. 4, no. 4, 2013, pp. 285–292.
[9] B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” in Proceedings of the 42nd Annual
Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2004.
[10] S. Raschka and V. Mirjalili, Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2, 2019.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.