Detection of Fake Online Reviews Using Semi-Supervised and Supervised Learning

This paper introduces semi-supervised and supervised text mining models to detect fake online reviews. It compares the efficiency of these techniques on a dataset containing hotel reviews. The proposed system uses tokenization, feature extraction including word frequency, sentiment polarity, and review length to create feature vectors for semi-supervised and supervised classification of reviews as fake or real. The system aims to more accurately detect fake reviews compared to prior work using only semi-supervised learning or sentiment analysis alone.

Uploaded by

Websoft Tech-Hyd

Detection of fake online reviews using semi-supervised and supervised learning

ABSTRACT

Online reviews have a great impact on today’s business and commerce. Purchase
decisions for online products depend mostly on the reviews given by users.
Hence, opportunistic individuals or groups try to manipulate product reviews
for their own interests. This paper introduces some semi-supervised and supervised
text mining models to detect fake online reviews and compares the efficiency
of both techniques on a dataset containing hotel reviews.

EXISTING SYSTEM

 • Content-based methods focus on the content of the review, that is, its
text and what it says. Heydari et al. [2] attempted to detect spam reviews
by analyzing the linguistic features of the review. Ott et al. [3] used three
techniques to perform classification: genre identification, detection of
psycholinguistic deception, and text categorization.
 • Behavior-feature-based studies focus on the reviewer, including the
characteristics of the person giving the review. Lim et al. [7] addressed
the problem of review spammer detection, that is, finding the users who are
the source of spam reviews. People who intentionally post fake reviews behave
significantly differently from normal users. Lim et al. identified a number of
deceptive rating and review behaviors.
 • Deceptive online review detection is generally treated as a classification
problem, and one popular approach is to use supervised text classification
techniques [5]. These techniques are robust if training is performed on
large datasets of labeled instances from both classes: deceptive opinions
(positive instances) and truthful opinions (negative instances) [8]. Some
researchers have also used semi-supervised classification techniques.
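
The supervised approach described above can be illustrated with a small sketch. The source does not name a specific classifier at this point, so the multinomial Naive Bayes model below is an assumption chosen for brevity, and the four training reviews in `TRAINING` are invented toy data, not drawn from the hotel review dataset. The function names `train_naive_bayes` and `classify` are hypothetical.

```python
import math
from collections import Counter

# Invented toy data for illustration only (not from the paper's dataset).
TRAINING = [
    ("amazing amazing best hotel ever must stay", "deceptive"),
    ("best experience ever amazing luxury", "deceptive"),
    ("room was small but clean", "truthful"),
    ("staff was helpful and room was quiet", "truthful"),
]

def train_naive_bayes(labeled_reviews):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in labeled_reviews:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def classify(model, text):
    """Return the class with the highest log-probability for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: fraction of training reviews carrying this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(TRAINING)
print(classify(model, "amazing best luxury"))       # prints "deceptive"
print(classify(model, "room was clean and quiet"))  # prints "truthful"
```

With only four training reviews the model is far from robust, which is consistent with the point above that supervised techniques need large labeled datasets from both classes.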
Disadvantages
 • The existing system relies only on semi-supervised learning.
 • It performs only sentiment-based text classification and does not identify
fake reviews.

PROPOSED SYSTEM

 • In the proposed system, each review first goes through a tokenization
process. Then unnecessary words are removed and candidate feature words are
generated.
 • Each candidate feature word is checked against the dictionary. If it has an
entry in the dictionary, its frequency is counted and added to the column of
the feature vector that corresponds to the numeric map of the word. Along with
the frequency count, the length of the review is measured and added to the
feature vector.
 • Finally, the sentiment score, which is available in the dataset, is added to
the feature vector. We have assigned negative sentiment the value zero and
positive sentiment a positive value in the feature vector.
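
The feature construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list `STOP_WORDS`, the five-word `DICTIONARY` (word to column index), the use of token count as the review length, and the value 1.0 for positive sentiment are all hypothetical choices, since the source does not specify them.

```python
import re

# Hypothetical stop-word list; the source only says "unnecessary words".
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it"}

# Hypothetical dictionary mapping each feature word to its column index.
DICTIONARY = {"hotel": 0, "room": 1, "clean": 2, "amazing": 3, "staff": 4}

def tokenize(review):
    """Lowercase the review and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", review.lower())

def build_feature_vector(review, sentiment_positive, positive_value=1.0):
    """Return [word frequencies..., review length, sentiment value]."""
    tokens = tokenize(review)
    # Remove unnecessary words to obtain the candidate feature words.
    candidates = [t for t in tokens if t not in STOP_WORDS]

    vector = [0.0] * (len(DICTIONARY) + 2)
    for word in candidates:
        index = DICTIONARY.get(word)
        if index is not None:
            vector[index] += 1.0  # count frequency in the word's column

    # Assumption: review length is measured as the number of tokens.
    vector[len(DICTIONARY)] = float(len(tokens))
    # Negative sentiment maps to zero, positive sentiment to a positive value.
    vector[len(DICTIONARY) + 1] = positive_value if sentiment_positive else 0.0
    return vector

print(build_feature_vector("The room was clean and the staff was amazing", True))
# prints [0.0, 1.0, 1.0, 1.0, 1.0, 9.0, 1.0]
```

The last two columns hold the review length and the sentiment value, so a classifier trained on these vectors sees all three feature types named in this section.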

Advantages

 • The system combines semi-supervised and supervised learning, which makes
it fast and effective.
 • The system follows a content-based approach. The features used are word
frequency count, sentiment polarity, and review length.

SYSTEM REQUIREMENTS
H/W System Configuration:
➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

Software Requirements:
➢ Operating System - Windows XP
➢ Coding Language - Java/J2EE (JSP, Servlet)
➢ Front End - J2EE
➢ Back End - MySQL
