Detection of Fake Online Reviews Using Semi-Supervised and Supervised Learning
ABSTRACT
Online reviews have a great impact on today’s business and commerce. Purchase
decisions for online products depend largely on reviews written by other
users. Hence, opportunistic individuals or groups try to manipulate product reviews
for their own interests. This paper introduces several semi-supervised and supervised
text mining models to detect fake online reviews and compares the efficiency
of both techniques on a dataset containing hotel reviews.
EXISTING SYSTEM
Content-based methods focus on the content of the review, that is, the text of
the review or what it says. Heydari et al. [2] attempted to detect spam
reviews by analyzing the linguistic features of the review. Ott et al. [3]
used three techniques to perform classification: genre identification,
detection of psycholinguistic deception, and text categorization.
Behavior-feature-based studies focus on the reviewer, that is, the
characteristics of the person who writes the review. Lim et al. [7] addressed
the problem of review spammer detection, or finding users who are the source
of spam reviews. People who intentionally post fake reviews behave
significantly differently from normal users. Lim et al. identified a set of
deceptive rating and review behaviors.
Deceptive online review detection is generally treated as a classification
problem, and one popular approach is to use supervised text classification
techniques [5]. These techniques are robust if training is performed on
large datasets of labeled instances from both classes: deceptive opinions
(positive instances) and truthful opinions (negative instances) [8]. Some
researchers have also used semi-supervised classification techniques.
Disadvantages
The existing system relies only on semi-supervised learning.
It only classifies review text by sentiment and does not detect fake reviews.
PROPOSED SYSTEM
In the proposed system, each review first goes through a tokenization process.
Then, unnecessary words are removed and candidate feature words are
generated.
Each candidate feature word is checked against the dictionary; if an entry for
it exists, its frequency is counted and added to the column of the feature
vector that corresponds to the numeric map of the word.
Along with the frequency count, the length of the review is measured and
added to the feature vector.
Finally, the sentiment score, which is available in the dataset, is added to
the feature vector. Negative sentiment is assigned a value of zero and
positive sentiment a positive value in the feature vector.
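The feature extraction steps above can be sketched in Java roughly as follows.
This is a minimal illustration, not the authors' implementation. The class name
ReviewFeatureExtractor, the stop-word list, the tokenization rule, measuring
review length in characters, and using 1.0 as the positive sentiment value are
all assumptions made for the example; the source does not specify them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; names and constants are illustrative only.
final class ReviewFeatureExtractor {

    // Assumed stop-word list; the source does not say which words are removed.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("the", "a", "an", "and", "was", "is", "of", "to", "in"));

    // Maps each dictionary word to its column index (its "numeric map").
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    ReviewFeatureExtractor(List<String> dictionaryWords) {
        for (String word : dictionaryWords) {
            // Each new word gets the next free column; duplicates are ignored.
            dictionary.putIfAbsent(word.toLowerCase(Locale.ROOT), dictionary.size());
        }
    }

    // Tokenizes a review and drops stop words, yielding candidate feature words.
    static List<String> candidateWords(String review) {
        List<String> words = new ArrayList<>();
        for (String token : review.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                words.add(token);
            }
        }
        return words;
    }

    // Vector layout: [count per dictionary word..., review length, sentiment].
    double[] extract(String review, boolean positiveSentiment) {
        int wordColumns = dictionary.size();
        double[] features = new double[wordColumns + 2];
        for (String word : candidateWords(review)) {
            Integer column = dictionary.get(word);
            if (column != null) {
                features[column] += 1.0; // frequency of a dictionary word
            }
        }
        features[wordColumns] = review.length(); // assumed: length in characters
        features[wordColumns + 1] = positiveSentiment ? 1.0 : 0.0; // negative = 0
        return features;
    }

    public static void main(String[] args) {
        ReviewFeatureExtractor extractor =
                new ReviewFeatureExtractor(Arrays.asList("room", "clean", "staff"));
        double[] vector = extractor.extract("The room was clean, the room was quiet.", true);
        System.out.println(Arrays.toString(vector));
    }
}
```

Words that are not in the dictionary (such as "quiet" in the example) are simply
skipped, so the vector length depends only on the dictionary size. The resulting
vectors could then be passed to whichever supervised or semi-supervised
classifier is being evaluated.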
Advantages
The system is fast and effective because it combines semi-supervised and
supervised learning.
It focuses on content-based approaches, using word frequency count, sentiment
polarity, and review length as features.
SYSTEM REQUIREMENTS
➢ H/W System Configuration:-
Software Requirements:
Operating System - Windows XP
Coding Language - Java/J2EE(JSP,Servlet)
Front End - J2EE
Back End - MySQL