Volume 9, Issue 2, February 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

A Multilingual Spam Review Detection


Dr. Kavitha. C¹, Arun Kumar K Y², Dayananda J V³, Harsha Vardhan⁴, Gopa Sailesh⁵
¹Head of Computer Science Engineering, ²,³,⁴,⁵Student of Computer Science Engineering
Dayananda Sagar Academy of Technology and Management

Abstract:- The usage of internet services and the World Wide Web has become very common these days, particularly during the Covid-19 pandemic that led to the nationwide imposition of lockdowns, social isolation, and other precautionary measures. Online platforms facilitate the provision of vast quantities of goods and services, which in turn generates a substantial amount of information. On online purchasing sites, customers have the ability to provide reviews for goods or services they have purchased. These reviews help both the company and its customers make decisions about business strategies and enhancements to the product or service. Conversely, some companies hire writers to submit false positive reviews of their own goods or services, or deceptive negative remarks about those of their competitors.

I. INTRODUCTION

A. Fake Review Detection: A Brief Introduction
In the era of online commerce and information overload, consumer trust is paramount. However, this trust is increasingly jeopardized by the proliferation of fake reviews: manipulative testimonials designed to mislead potential customers. While existing methods often focus on plagiarism detection, our approach seeks to uncover the subtleties of deception without relying on copied content.

Fake reviews pose a significant challenge due to their potential to influence consumer decisions, tarnish brand reputations, and create an atmosphere of distrust in online platforms.

Rather than looking for copied text, we delve into the intricacies of linguistic patterns, sentiment analysis, and user behavior to identify the underlying markers of deception. By understanding the psychology behind fake reviews, our method aims to distinguish between authentic and manipulated content without relying on the presence of plagiarized material.

As online platforms continue to be battlegrounds for consumer trust, our approach to fake review detection without plagiarism offers a robust solution. By combining linguistic analysis, sentiment assessment, user behavior scrutiny, and contextual understanding, our system aims to provide a more accurate and comprehensive means of identifying deceptive reviews. As we delve into the intricate layers of deception, we contribute to the ongoing effort to foster transparency and reliability in the digital marketplace.

Our methodology embraces a holistic analysis, intertwining linguistic forensics, sentiment dissection, and behavioral scrutiny to create a comprehensive fake review detection framework. By understanding the mosaic of linguistic subtleties, the emotional undertones, and the behavioral signatures, our approach transcends the confines of traditional detection methods, offering a more robust and adaptive solution to the burgeoning challenge of deceptive reviews. In an era where trust is a fragile commodity, safeguarding the integrity of online platforms requires a dynamic and advanced approach to fake review detection.

II. BRIEF OVERVIEW OF A MULTILINGUAL SPAM REVIEW DETECTION USING MACHINE LEARNING TECHNIQUES

A. Machine Learning-Based Fake Reviews Detection
This study aims to find and evaluate existing techniques for detecting fraudulent reviews. An effective technique for detecting phoney reviews evaluates a review's integrity, the reviewer's reputation, and the dependability of the product or service.

B. Description of the Fake Reviews Data Set
A number of approaches, most notably machine learning techniques, have previously been established for the detection of bogus reviews. Supervised, unsupervised, and semi-supervised learning approaches in machine learning make it easy to analyse several types of data, including partially labelled, tagged, and unlabelled data.

C. Top 10 Machine Learning Algorithms for Fake Reviews Detection
Commonly used algorithms include Support Vector Machines, K-Nearest Neighbours (KNN), Neural Networks (Deep Learning), Random Forest, Gradient Boosting Machines, Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM), Naive Bayes, and Ensemble Methods. The type of data, the amount of data available, and the particular traits of the phoney reviews to be identified all play a role in the algorithm selection.

D. Confusion Matrix for Models
The confusion matrix is a tabular summary of a classification model that shows how the labels predicted by the model compare with the actual labels. The confusion matrix may be visualised by rendering this table as a heatmap.
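To make the confusion matrix of Section II.D concrete, the following Python sketch builds the table for a two-class (genuine versus fake) problem, derives accuracy, precision, recall and F1 from it, and renders a simple text heatmap. The function names, the label names and the toy predictions are assumptions made for illustration; they are not taken from any of the reviewed papers.

```python
def confusion_matrix(y_true, y_pred, labels=("genuine", "fake")):
    """Return counts[actual][predicted] for every pair of labels."""
    counts = {actual: {predicted: 0 for predicted in labels} for actual in labels}
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1
    return counts


def binary_metrics(counts, positive="fake", negative="genuine"):
    """Derive the usual metrics, treating `positive` as the class of interest."""
    tp = counts[positive][positive]
    fn = counts[positive][negative]
    fp = counts[negative][positive]
    tn = counts[negative][negative]
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


SHADES = " .:*#"  # lightest to darkest cell marker


def render_heatmap(counts, labels=("genuine", "fake")):
    """Render the matrix as text, shading each cell relative to the largest count."""
    peak = max(counts[a][p] for a in labels for p in labels) or 1
    lines = ["actual / predicted  " + "  ".join(f"{p:>8}" for p in labels)]
    for a in labels:
        cells = []
        for p in labels:
            n = counts[a][p]
            shade = SHADES[round(n / peak * (len(SHADES) - 1))]
            cells.append(f"{n:>6} {shade}")
        lines.append(f"{a:<19}" + "  ".join(cells))
    return "\n".join(lines)


# Toy labels, invented for this sketch.
y_true = ["fake", "fake", "fake", "genuine", "genuine", "genuine", "genuine", "fake"]
y_pred = ["fake", "fake", "genuine", "genuine", "genuine", "fake", "genuine", "fake"]

matrix = confusion_matrix(y_true, y_pred)
print(render_heatmap(matrix))
print(binary_metrics(matrix))
```

In practice a plotting library would normally draw the heatmap; the text rendering is used here only to keep the sketch free of dependencies.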

IJISRT24FEB1396 www.ijisrt.com 1494


E. Accuracy of Machine Learning Algorithms
Popular models include K-Nearest Neighbours (KNN) and Neural Networks (Deep Learning). The accuracy of the classifiers is shown by the reported results of the applied machine learning techniques.

F. Distribution of the Data
Data distribution analysis in a fake review detection system examines how various attributes and features are distributed across reviews. This entails examining textual and metadata characteristics, making sure that real and fraudulent reviews are distributed fairly, taking into account time fluctuations, and comprehending variances in sources, user behaviors, and contextual variables. The efficacy of the system hinges on a thorough comprehension of these distributions, which directs the building of machine learning models to precisely detect fraudulent reviews amidst heterogeneous data patterns.

G. Comparative Analysis of Machine Learning Algorithms
Accuracy and related evaluation metrics are used in a comparative examination of the applied machine learning algorithms for fake review identification, offering insights into the effectiveness of the various models.

III. REVIEW OF PAPER 1

The paper explores the impact of fake reviews on e-commerce during and after the Covid-19 pandemic and presents an SKL-based fake review detection model[1]. It is organized into sections, with a literature review covering the challenges of identifying machine-generated or user-generated spam reviews and the increasing sophistication of fraudulent comments in the e-commerce sector[1]. The proposed methodology involves using Text Classification and Machine Learning techniques, including the bigram probability model, sentiment analysis, and part of speech tagging, to detect fake online reviews. The document also discusses the dataset collection, experimentation design, and statistical analysis, highlighting the effectiveness of the proposed model in detecting fraudulent reviews on platforms such as Yelp and Trip Advisor.

The outbreak of Covid-19 and the subsequent surge in online shopping due to lockdown and social distancing measures have intensified the competition between companies in the e-commerce sector[1]. The significance of online reviews in influencing consumer decisions and the challenges posed by fraudulent or fake reviews are also emphasized. The proposed SKL-based fake review detection model outperforms other state-of-the-art techniques, achieving 95% accuracy on the Yelp dataset and 89.03% accuracy on the Trip Advisor dataset. The document also provides a comprehensive literature review, statistical analysis, and details of the dataset collection and experimentation design.

A. Merits:

• The experimental results, including precision, recall, f-score, and accuracy, show that the Support Vector Machine (SVM) outperforms K Nearest Neighbor (KNN) and Logistic Regression (LR) in detecting fake reviews[1].
• The proposed methodology utilizes feature selection based on relationship words, sentiment word count, and part of speech tagging, contributing to the overall classification accuracy and authenticity of the results.
• The document highlights the significance of the 80-20 dataset split as the best dataset split for training and testing, leading to improved accuracy results in detecting fake reviews.
• The proposed SKL-based fake review detection model achieves 95% accuracy on the Yelp dataset and 89.03% accuracy on the Trip Advisor dataset, outperforming other state-of-the-art techniques[1].
• The document emphasizes the novelty of the research, particularly in the multi-level feature extraction system and feature engineering techniques, which contribute to the overall classification accuracy and authenticity of the results.

B. Demerits:

• Imbalanced Dataset: The Yelp dataset used for experimentation is imbalanced and biased towards positive reviews, which may lead to challenges in effectively detecting negative fake reviews.
• Limited Comparison: The document compares the proposed model with state-of-the-art methodologies using a similar dataset, but it does not provide a comprehensive comparison with a wide range of existing models and techniques in the field of fake review detection.
• Limited Generalization: The document does not extensively discuss the generalization of the proposed model to different types of datasets or platforms, which may limit its applicability in diverse e-commerce settings.
• Lack of Robustness Testing: The document does not explicitly mention robustness testing of the proposed model under various scenarios or against different types of fake reviews, which is crucial for assessing its reliability in real-world applications.
• Limited Discussion on False Positives: The document does not thoroughly address the potential issue of false positives in the detection process, which is essential for understanding the model's limitations in accurately identifying fake reviews.
• Limited Scalability Discussion: The document does not provide detailed insights into the scalability of the proposed model, especially in handling large volumes of reviews and real-time detection requirements.

IV. REVIEW OF PAPER 2

The paper "Detecting Fake Reviews through Sentiment Analysis Using Machine Learning Techniques" presents a study conducted by Elshrif Elmurngi and Abdelouahed Gherbi from École de Technologie Supérieure in Montreal, Canada[2]. The study also covers statistical analysis and details of the dataset collection and experimentation design.
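Both the sentiment word count feature reported for Paper 1 and the sentiment-analysis focus of Paper 2 rest on turning review text into simple numeric features. The Python sketch below illustrates one possible way to do this: it counts lexicon hits for positive and negative words and collects bigram counts, which are used here as a simpler stand-in for the bigram probability model mentioned above. The word lists, function names and sample review are hypothetical and are not taken from [1] or [2]; a real system would rely on a full sentiment lexicon or a trained model.

```python
import re
from collections import Counter

# Tiny illustrative lexicons: assumptions for this sketch, not taken from [1] or [2].
POSITIVE_WORDS = {"good", "great", "excellent", "amazing", "love", "perfect"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "awful", "hate", "broken"}


def tokenize(text):
    """Lowercase the review and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_counts(tokens):
    """Count lexicon hits and derive a polarity score in [-1, 1]."""
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    hits = positive + negative
    polarity = (positive - negative) / hits if hits else 0.0
    return {"positive": positive, "negative": negative, "polarity": polarity}


def bigram_counts(tokens):
    """Count adjacent token pairs (a bag of bigrams)."""
    return Counter(zip(tokens, tokens[1:]))


def review_features(text):
    """Combine sentiment counts, token count and bigram counts for one review."""
    tokens = tokenize(text)
    features = sentiment_counts(tokens)
    features["n_tokens"] = len(tokens)
    features["bigrams"] = bigram_counts(tokens)
    return features


example = review_features("Great product, great price, I love it!")
print(example["positive"], example["negative"], example["polarity"])
```

Feature dictionaries of this kind could then be passed to any of the classifiers listed in Section II; that step is omitted here because the reviewed papers do not specify it at this level of detail.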

A. Introduction:

• Overview of the Issue: Begin by introducing the prevalence and impact of fake reviews in online platforms.
• Importance of Sentiment Analysis: Highlight the role of sentiment analysis in discerning the authenticity of reviews.
• Application to Drill Bits: Establish the relevance of the study specifically to the domain of drill bit reviews.

B. Data Collection:

• Dataset Sources: Clearly state where the drill bit review dataset was collected, emphasizing the need for diversity.
• Data Validation: Discuss steps taken to validate the authenticity and diversity of the collected data[2].

C. Preprocessing:

• Cleaning Steps: Describe the preprocessing steps undertaken to clean and prepare the drill bit reviews for analysis.
• Domain-specific Considerations: Address any challenges unique to the domain of drill bits and how they were handled during preprocessing.

D. Feature Extraction:

• Numerical Representation: Explain the chosen method for converting textual reviews into numerical features.
• Incorporation of Domain-specific Features[2]: Discuss any unique features relevant to drill bit reviews that were included in the analysis.

E. Annotation Process:

• Detail how the dataset was annotated, specifying the criteria used to label reviews as genuine or fake.
• Challenges in Labeling: Discuss any difficulties faced in distinguishing fake reviews within the context of drill bits.

F. Model Selection:

• Algorithm Choice: Provide rationale for selecting a particular machine learning algorithm for sentiment analysis.
• Customization for Drill Bits: Explain any adjustments made to the chosen algorithm to tailor it specifically for drill bit reviews.

G. Model Training:

• Detail how the dataset was split into training and testing sets.
• Share insights into the training phase, including parameters tuned to optimize performance for drill bit sentiment analysis.

H. Evaluation:

• Performance Metrics: Present the results of the model's performance using accuracy, precision, recall, and F1 score.
• Effectiveness in Detecting Fake Reviews: Discuss how well the model performs in identifying fake reviews within the domain of drill bits.

I. Optimization:

• Highlight any fine-tuning or optimization steps taken to enhance the model's precision, recall, or overall performance.

J. Deployment:

• Implementation Plan: Outline how the trained model can be deployed to analyze and classify new drill bit reviews.
• Real-world Applications: Discuss potential applications and benefits of the system in real-world scenarios.

K. Monitoring and Updating:

• Continuous Improvement: Emphasize the importance of ongoing monitoring to ensure the model's effectiveness over time.
• Adaptation to Changes: Discuss strategies for updating the model to adapt to evolving patterns of fake reviews in the drill bit domain.

V. REVIEW OF PAPER 3

The paper "Fake Reviews Detection: Survey" emphasizes the significance of online customer reviews in the digital age[3]. These reviews serve as a form of social proof, influencing consumer purchasing decisions and shaping the reputation of businesses[3]. The authors highlight the potential financial implications of both positive and negative reviews, noting that customer feedback can lead to product improvements and impact marketing strategies. The introduction also touches on the darker side of online reviews, where fake reviews are posted with the intent to mislead consumers[3]. These deceptive opinions, often posted by individuals or groups with vested interests, can unfairly promote or criticize products, leading to an imbalance in the marketplace. The authors argue that the detection of fake reviews is crucial to maintain the integrity of online review systems and to protect consumers from false information. The document outlines the structure of the survey, which includes a review of feature extraction techniques, an examination of existing datasets, and an analysis of machine learning models applied to fake review detection[3]. The authors aim to provide a comprehensive overview of the state of the art in fake review detection, identify gaps in the current research, and suggest directions for future studies.

A. Merits:

• Combination of Features: Using a combination of features to train the classifier has been found to achieve better performance than using a single type of feature[3].
• Behavioral and Text Features: Using a combination of behavioral and text features has been shown to significantly improve fake review detection model performance.
• N-gram Features: BoW features, such as unigram, bigram, and trigram, have been used in various fake review detection methods[3], providing different results on multiple datasets.

• Semantic Features: Semantic features represent the concepts or underlying meaning of words, and have been found to be better than other features such as LIWC, POS, and n-gram in cross-domain detection.
• Ensemble Learning Model: An ensemble learning model consisting of multiple classifiers and feature selections has been proposed to detect fake reviews, achieving high accuracy on real-life and semi-real datasets.
• Deep Learning Methods: Hierarchical CNN-GRN deep learning methods and Multi-Instance Learning (MIL) methods have been proposed to handle variable lengths of reviews, outperforming classical CNN and RNN on multiple benchmark datasets.
• Evaluation and Performance: Various models and methods have been evaluated on real-life datasets, showing improved performance in fake review detection with high accuracy.

B. Demerits:

• Imbalanced Dataset Performance: Some proposed models did not perform well with imbalanced datasets, leading to reduced accuracy and effectiveness in detecting fake reviews.
• High Computational Resources: Certain models require high computational resources, making them less efficient and scalable for practical use.
• Limitations in Short Text Detection: Some models are not effective in handling short texts, with reduced performance for reviews containing fewer than 20 words.
• Semantic Information Capture: Certain models failed to capture the semantic information of sentences, limiting their ability to effectively identify deceptive reviews.
• Ineffective Cross-Domain Detection: Some models did not achieve the best results in cross-domain detection, indicating limitations in adapting to different review contexts and domains.
• Ignoring Reviewer Information: Some models ignored reviewer information, which could potentially improve the classification model performance, indicating a limitation in leveraging all available data for detection.

VI. REVIEW OF PAPER 4

The paper provides a comprehensive literature survey, including various methods and models[4] used for fake review detection, such as Word2Vec-LSTM, BERT, and ELMo. It also outlines future research directions, including the development of text-enriched columns and ensemble modeling for improved performance.

A. Merits:

• Improved Accuracy: Deep learning hybrid models can enhance the accuracy of fake review classification compared to traditional methods. The ability of these models to automatically learn intricate patterns and representations in data contributes to more precise predictions.
• Feature Learning: Deep learning excels at feature learning, enabling the model to autonomously identify relevant features and representations from the input data[4]. This can be advantageous for capturing nuanced patterns indicative of fake reviews.
• Handling Non-linearity: Deep learning models, by nature, can capture non-linear relationships in data, which may be crucial for distinguishing between genuine and fake reviews that might exhibit complex patterns.
• Scalability: Deep learning models are often scalable, allowing them to handle large datasets efficiently[4]. This scalability is beneficial when dealing with vast amounts of review data in real-world scenarios.

B. Demerits:

• Data Requirements: Deep learning models typically demand large amounts of labeled data for training. Obtaining a comprehensive and diverse dataset for fake reviews may pose a challenge, especially in niche domains.
• Computational Complexity: Training deep learning models can be computationally intensive and may require significant resources, both in terms of hardware and time[4]. This complexity can limit the accessibility of these models in certain environments.
• Interpretability: Deep learning models, particularly complex ones, often lack interpretability. Understanding the inner workings of the model and the rationale behind specific predictions can be challenging, raising concerns about transparency.
• Overfitting: Deep learning models are susceptible to overfitting. This can result in the model performing well on the training data but failing to generalize effectively to new, unseen data.
• Complexity for Small Datasets: In scenarios where the dataset is relatively small, the complexity of deep learning models might lead to overfitting, diminishing their performance on unseen data.
• Resource Intensiveness: Deploying and maintaining deep learning models may require substantial computational resources and expertise, making them less accessible for smaller organizations or those with limited technical capabilities.

VII. REVIEW OF PAPER 5

The paper introduces the growing importance of fake news detection in the context of online media and its impact on social and political movements[5]. It highlights the challenges associated with fake news detection, emphasizing the need for models to not only understand natural language but also incorporate world knowledge into their computations[5].

A. Merits:

• Adversarial Benchmark: The paper introduces an adversarial benchmark designed to test the reasoning capabilities of fake news detection models, addressing the limitations of current techniques in this field.

• Adversarial Attacks: The document presents three specific adversarial attacks (negation, party reversal[5], and adverb intensity) to evaluate the models' understanding of text and real-world facts.
• Experimental Setup: The authors fine-tune BERT classifiers on the LIAR and Kaggle Fake-News datasets and apply the adversarial attacks to test the models' performance.
• Vulnerability Analysis: The results reveal that the BERT-based models are vulnerable to negation and party reversal attacks, while being robust to the adverb intensity attack[5]. The models struggle to respond to changes in compositional and lexical meaning, highlighting the need for improvement in their reasoning capabilities.
• Implications and Future Work: The findings emphasize the need for fake news classification models to be used in conjunction with other fact-checking methods. The document also discusses the limitations of the study and suggests future directions, such as exploring deeper model architectures and using more complex adversarial attacks for a more robust evaluation of fake news models.

B. Demerits:

• Limited Generalization: The models were trained on only two datasets, and the results may not generalize to statements unrelated to general US politics, limiting the broader applicability of the findings.
• Computational Limitations: The exploration of shallow neural network architectures due to computational limitations may have restricted the depth and complexity of the models, potentially impacting the robustness of the evaluation.
• Simplistic Adversarial Attacks: The adversarial attacks employed in the study were relatively simple, and it is acknowledged that real humans may be able to negate or change the intensity of a sentence in more complex ways, suggesting the need for more sophisticated adversarial testing.

VIII. CONCLUSION

In conclusion, the reviewed work introduces an adversarial benchmark for fake news detection models, aiming to evaluate the reasoning capabilities of these models. It highlights the vulnerability of BERT-based models to specific adversarial attacks, indicating the need for improvement in their reasoning capabilities. The findings emphasize the importance of using fake news classification models in conjunction with other fact-checking methods. Additionally, the document discusses the impact of data quality on the models' ability to learn facts and understand text, suggesting that future work should employ more datasets, explore deeper model architectures, and use more complex adversarial attacks for a more robust evaluation of fake news models.

REFERENCES

[1]. Raj S.B.E., Portia A.A., An Analysis of Credit Card Fraud Detection Methods, International Conference on Computer, Communication and Electrical Technology (ICCCET) (2011), 152-156.
[2]. Jain R., Gour B., Dubey S., A hybrid approach to credit card fraud detection using rough set and decision tree techniques, International Journal of Computer Applications 139(10) (2016).
[3]. Dermala N., Agrawal A.N., Credit card fraud detection using SVM and false alarm reduction, International Journal of Innovations in Engineering and Technology (IJIET) 7(2) (2016).
[4]. Hafiz K.T., Aghili S., Zavarsky P., Using predictive analytics technology to detect credit card fraud, 11th Iberian Conference on Information Systems and Technologies (CISTI) (2016), 1-6.
[5]. Sonepat H.C.E., Survey Paper on Credit Card Fraud Detection, International Journal of Advanced Research in Computer Engineering & Technology (2014).
