Batch5 3rdreview
• Year: 2022
• Internet use has surged in recent years as many more people have come online.
E-commerce websites such as Amazon and Flipkart are now flooded with paid and fake
reviews. Many customers base their purchasing decisions on these reviews or comments,
believing they were written by others who had similar experiences.
LITERATURE REVIEW
•2.2 Fake Review Detection Of E-Commerce Electronic Products Using Machine Learning Techniques
• Year:2021
• Abstract: The rapid growth of internet access has given rise to a digital era. The availability of internet access has pushed
almost 70% of the population to switch to the internet for their daily needs and accessories. E-commerce platforms in particular
are being used at a much higher rate than ever before. People who buy from these platforms decide whether to buy a product
solely on the basis of the ratings and reviews that the platforms provide. Because this review system is so simple, sellers
and even individuals tend to exploit it by writing dishonest reviews, intending either to boost a product's rating or to
sabotage it. These fake reviews aim to deceive customers and convince them to buy, or deter them from buying, a certain
product. In the absence of a robust system for telling real reviews from fake ones, this spam manages to show up at the top.
To avoid this problem and provide a more efficient way to filter reviews, this work focuses on designing a machine learning
model for fake review detection and compares the performance of three different algorithms. In this research, the random
forest algorithm outperformed the other two. A web-based user interface (UI) was designed to remove fake reviews and
display trusted reviews based on ranking.
Boosting Accuracy of Fake Review Prediction Using Minority Oversampling Technique
• Bhawna Saxena et al. (2022): Nowadays the vast majority of people read reviews of a product before making a
purchase, and their decision is largely driven by those reviews. Deceitful online sellers often gather fake or
spam reviews for their products or services, thereby reducing the effectiveness of online reviews. The review
data is often imbalanced, such that the fake reviews greatly outnumber the genuine reviews. An imbalance leads
to bias, as the model tends to predict mostly the majority class. To attain a high-quality classification outcome,
the imbalance should be resolved before the classification algorithms are applied. This paper studies
the performance of supervised machine learning classifiers for fake review detection. The approach aims to
improve the prediction accuracy of popular supervised learning classifiers (Random Forest, LightGBM, XGBoost,
Naive Bayes, and Decision Tree) on an imbalanced review dataset. To boost the accuracy of these classifiers,
the Synthetic Minority Oversampling Technique (SMOTE) is used to address the class imbalance problem. The
performance of the classifiers was studied while varying the oversampling parameters. Applying SMOTE
significantly improved the classifiers' prediction accuracy.
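The core idea of SMOTE mentioned above is to create new minority-class points by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The following is a simplified, standard-library-only sketch of that idea, not the implementation used in the cited paper; the function name `smote`, its parameters, and the sample feature vectors are all hypothetical.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Simplified SMOTE: interpolate between minority samples and their
    k nearest minority-class neighbours to create synthetic samples."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D feature vectors for the minority class.
minority_class = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority_class, n_synthetic=6)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the original samples occupy. The number of neighbours `k` and the number of synthetic samples are the kind of oversampling parameters the paper reports varying.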
A Novel Semi-supervised Algorithm to Find Accuracy in Fake Review Detection using Comparing with K-neighbours Algorithm
• SrujanaSree et al. (2022): This study centres on semi-supervised and K-neighbors algorithms for optimising
fake review identification, with the aim of measuring the accuracy of real-time fake review detection. The
NSS parameter and the random forest parameter are both adjusted in order to simulate the N-neighbors algorithm
(N=32) and the K-neighbors algorithm (N=32). This is done in order to optimise the pH. A total of 20 samples
were used, with the sample size determined by applying the Gpower 80% formula to each of the two groups.
According to the findings, the accuracy achieved by the NSS algorithm (75.60%) is higher than that achieved
by the Random forest method (74.50%). The statistical significance of the difference between the
semi-supervised algorithm and K-neighbors was 0.916 (p>0.05). The study concludes that the semi-supervised
algorithm detects fraudulent reviews more accurately than the K-neighbors approach.
Detection of fake online reviews using semi-supervised and supervised learning
Mujahed Abdulqader et al. (2022): Online reviews influence consumers’ purchasing decisions. However, identifying
fake online reviews automatically remains a complex problem, and current detection approaches are inefficient in
preventing the spread of fake reviews. The literature on fake reviews detection lacks a comprehensive and interpretable
theory-based model with high performance, which enables us to understand the phenomenon from a psychological
perspective and analyze reviews based on user-generated content as well as consumer behavior. In this research, we
synthesized ten well-founded deception theories from psychology, namely leakage theory, four-factor theory,
interpersonal deception theory, self-presentational theory, reality monitoring theory, criteria-based content analysis,
scientific content analysis, verifiability approach, truth-default theory, and information manipulation theory, and
selected nine relevant constructs to develop a unified model for detecting fake online reviews. These constructs include
specificity, quantity, nonimmediacy, affect, uncertainty, informality, consistency, source credibility, and deviation in
behavior. We characterized the selected constructs using verbal and non-verbal features to validate the proposed model
empirically.
Parametric Analysis for Fake Reviews Identification
Vikas Attri et al. (2021): Online reviews are one of the most important factors in a buyer's decision to buy a
new product or use a service. They are therefore a helpful source of data for gauging public opinion
about these products and services, and they give companies an indication of what changes
to make to improve their products. Reviews also give competitors and product-based
organizations an opportunity to create fake reviews that promote or degrade a product according
to their interests. Hence, it is vital that genuine reviews reach customers, and for this, fake ones
must be detected effectively. To reduce the time taken by fake review detection,
automated techniques are now being used. Another concern is how to differentiate between
original and fake reviews. This paper discusses the various factors that can help identify
fake reviews. They are broadly classified into two types: behavioral and feature-based. The challenges that
remain in fake review identification methods are described, and open research areas where
further work can be carried out are highlighted. The factors discussed in the paper can prove
useful for improving the performance of any fake review detection system applied to a real data set.
Fake Reviews Detection using Support Vector Machine
R. Poonguzhali et al. (2022): Internet shopping is one of the fastest-expanding business categories in the world
today, and people now buy many things from online shopping sites. Customers can choose better-quality
products based on the reviews given by previous buyers. Reviews include text, ratings,
and smileys. Among the hundreds of reviews a product may receive, some will be fake.
Opinion mining from natural language is a difficult way to evaluate customers' sentiments, but
sentiment analysis provides the best answer and supplies crucial data for decision-making in a variety of fields.
We therefore propose a fake review detection system that uses a support vector machine to detect fake
product reviews. The primary goal is to suggest higher-quality products to the user. The support vector
machine algorithm classifies the reviews into positive and negative groups, and the fake reviews posted by
users are then predicted. The reviews are grouped as negative, positive, and neutral. In this system, only
users who have made a purchase can post reviews, and duplicates are checked using the user id and booking id.
Genuine reviews are considered for product recommendation.
Fake Review Detection on Yelp Dataset Using Classification Techniques in
Machine Learning
Dini Adni Navastara et al. (2019): A product review can influence a buyer's decision to buy the
product. Fake reviews can also confuse buyers who are
looking for product information from honest and genuine reviews. We need a system that can filter
spam to reduce its negative influence on product sales and on review writing. The spam to be
detected is of the brand-only and non-review types. These types receive their initial labels through
manual labeling, which requires a lot of time and effort. Therefore, in this paper, we
proposed a self-training semi-supervised learning approach. This method labels spam using the
predictions of a model trained on the labeled data. The best results were obtained without
stemming, with review-centric features merged with bigrams, borderline-1 SMOTE oversampling, and a
polynomial SVM kernel, reaching an accuracy of 86.33%.
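Self-training, as summarised above, repeatedly trains on the labeled data, labels the unlabeled examples it is most confident about, and adds them to the training set. The sketch below illustrates that loop with a nearest-centroid classifier standing in for the paper's SVM, purely to keep the example dependency-free; the function names, the confidence margin, and the 2-D feature vectors are hypothetical.

```python
def centroid(points):
    """Mean vector of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def self_train(labeled, unlabeled, threshold=1.0, max_rounds=10):
    """labeled: list of (features, label) covering at least two classes.
    Returns the grown labeled set and the points left unlabeled."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        # "Train": one centroid per class from the current labeled set.
        centroids = {
            lab: centroid([x for x, y in labeled if y == lab])
            for lab in {y for _, y in labeled}
        }
        confident, remaining = [], []
        for x in pool:
            ranked = sorted(centroids, key=lambda lab: dist(x, centroids[lab]))
            best, second = ranked[0], ranked[1]
            # Confidence = how much closer the best centroid is than the runner-up.
            margin = dist(x, centroids[second]) - dist(x, centroids[best])
            (confident if margin >= threshold else remaining).append((x, best))
        if not confident:
            break  # nothing confident enough; stop instead of guessing
        labeled.extend(confident)
        pool = [x for x, _ in remaining]
    return labeled, pool


# Hypothetical seed labels and unlabeled review feature vectors.
seed = [((0.0, 0.0), "genuine"), ((4.0, 4.0), "spam")]
unlabeled_reviews = [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0), (1.0, 0.8)]
final_labeled, still_unlabeled = self_train(seed, unlabeled_reviews)
print(len(final_labeled), "labeled;", len(still_unlabeled), "left unlabeled")
```

The ambiguous point halfway between the two classes is left unlabeled rather than forced into a class, which is the behaviour that keeps self-training from amplifying its own mistakes.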
PROBLEM DEFINITION
• The problem statement of the project is to develop a machine learning model that can
accurately detect fake online reviews.
• With the growing popularity of online platforms for purchasing products and services, fake
reviews have become a widespread problem that can mislead consumers and affect
businesses' reputation.
• Detecting fake reviews manually can be time-consuming and challenging, especially for
businesses with a large number of reviews.
• Therefore, there is a need for an automated solution that can detect fake reviews with high
accuracy.
• The aim of this project is to develop a machine learning model that can effectively
distinguish between genuine and fake reviews, thus helping businesses and consumers
make informed decisions based on trustworthy information.
PROPOSED SYSTEM
[Figure: proposed system architecture. Blocks shown: Training Samples, Testing Samples, Analysis Stage, Naïve Bayes Algorithm, Model Building, Evaluation, Web Application Based Prediction, New Review, Text Pre-processing, Feature Selection, Prediction Output (Fake/Genuine Review).]
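The Naïve Bayes stage named in the proposed system can be illustrated with a small multinomial Naïve Bayes text classifier using Laplace smoothing. This is a minimal standard-library sketch under stated assumptions, not the project's actual model: the class name `NaiveBayesTextClassifier`, the smoothing value, and the toy training reviews are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength
        self.class_doc_counts = Counter()       # documents seen per class
        self.word_counts = defaultdict(Counter) # token counts per class
        self.vocabulary = set()

    def fit(self, documents, labels):
        for tokens, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, hand-written training reviews (already tokenised); labels are illustrative.
train_docs = [
    ["best", "product", "ever", "buy", "now"],
    ["amazing", "best", "deal", "buy", "now"],
    ["battery", "lasted", "two", "days"],
    ["screen", "cracked", "after", "a", "week"],
]
train_labels = ["fake", "fake", "genuine", "genuine"]

classifier = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(classifier.predict(["best", "deal", "ever"]))
print(classifier.predict(["battery", "lasted", "a", "week"]))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero.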
FUNCTIONAL MODULES DESIGN
[Figure: functional modules design. Components shown: input fields (Business Id, Date, Review Id, User Id, Text, Useful, Funny), Feature Set, Classifier (Naïve Bayes, Decision Tree), Class 1: Fake Review, Class 2: Genuine Review.]
HARDWARE REQUIREMENTS:
• System : Pentium i3 Processor.
• Hard Disk : 500 GB.
• Monitor : 15" LED
• Input Devices : Keyboard, Mouse
• RAM : 4 GB
SOFTWARE REQUIREMENTS:
• Operating system : Windows 10.
• Coding Language : Python 3.9
• Web framework : Flask
• Software : Jupyter Notebook, PyCharm IDE
MODULES
1. Data Collection
– The dataset has been taken from
https://www.kaggle.com/datasets/omkarsabnis/yelp-reviews-dataset-.
– The Kaggle Yelp dataset is a publicly available dataset of user reviews
and ratings for businesses listed on Yelp, a popular online review
platform.
– The dataset contains over 6 million reviews of businesses across 10
cities in North America, including Las Vegas, Phoenix, and Toronto.
– The reviews are labeled with star ratings ranging from 1 to 5, where 1
represents a negative review and 5 represents a positive review.
MODULE DESCRIPTION
2. Text Preprocessing
– Lowercasing: Convert all words to lowercase to avoid issues with case sensitivity.
– Stop Word Removal: Remove stop words such as "the," "and," "a," etc., that do
not contribute much to the meaning of a sentence.
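The two preprocessing steps listed above can be sketched in a few lines. This is an illustrative example only: the stop-word set is a small hypothetical sample (a real system would use a fuller list), and the tokenising regular expression is an assumption.

```python
import re

# Small illustrative stop-word set; an assumption, not the project's actual list.
STOP_WORDS = {"the", "and", "a", "an", "is", "it", "of", "to", "in", "this", "was"}


def preprocess(review_text):
    """Lowercase the review, split it into word tokens, and drop stop words."""
    lowered = review_text.lower()
    tokens = re.findall(r"[a-z0-9']+", lowered)
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("The food was GREAT and the service is a delight!"))
# ['food', 'great', 'service', 'delight']
```

Lowercasing comes first so that "The" and "the" are matched by the same stop-word entry.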
3. Feature Extraction
– In the context of detecting fake online reviews, feature extraction involves
extracting meaningful features from the preprocessed text data to
distinguish between genuine and fake reviews.
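The slides do not say which features are extracted, so the sketch below assumes one common choice, TF-IDF weights over the preprocessed tokens. The function name `tf_idf` and the sample token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Turn tokenised documents into TF-IDF vectors over a shared vocabulary."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    vocabulary = sorted(doc_freq)

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical preprocessed reviews.
docs = [["great", "food", "great", "service"], ["terrible", "food"]]
vocabulary, vectors = tf_idf(docs)
print(vocabulary)
print(vectors[0])
```

A term that appears in every review, such as "food" here, gets weight zero because it does not help tell reviews apart, while terms concentrated in a few reviews get higher weights.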
[Screenshots: Home Page, Admin Login, Admin Dashboard, Add Hotel, List of Hotels, Customer Signup, Customer Login, Customer Dashboard, Review Page, Customer Review, Result Page]