
A Review On Various Techniques For Spam Detection: Nitesh J. Kawale Dr. Saad Yunus Sait

The document discusses techniques for detecting spam reviews. It provides an overview of conventional and machine learning methods for spam filtering. Three techniques are applied in the proposed analysis: review-based, review-linguistic, and user-based behavioral analysis. The spam detection problem is converted to a heterogeneous network analysis to identify spam reviews and users.


Proceedings of the International Conference on Artificial Intelligence and Smart Systems (ICAIS-2021)

IEEE Xplore Part Number: CFP21OAB-ART; ISBN: 978-1-7281-9537-7

A Review on Various Techniques for Spam Detection

Nitesh J. Kawale
Department of Computer Science and Engineering,
SRM Institute Of Science & Technology,
Kattankulathur, TamilNadu 603203, India
Email id:- [email protected]

Dr. Saad Yunus Sait
(Research Associate Professor)
Department of Computer Science and Engineering,
SRM Institute Of Science & Technology,
Kattankulathur, TamilNadu 603203, India
Email id:- [email protected]

2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS) | 978-1-7281-9537-7/20/$31.00 ©2021 IEEE | DOI: 10.1109/ICAIS50930.2021.9395979

Abstract: Spam has become a major problem for the Internet and electronic communication in recent years, and many techniques have been developed to combat it. This paper provides an overview of current spam filtering methods. Conventional and learning-based methods are classified, evaluated and compared. Some personal enemies are tested and compared for spam articles. The case for a new spam filtering methodology is also considered.

Keywords: Spam Filtering, Machine Learning, Learning Base Methods, Classification, Fake Review Detection.

I. INTRODUCTION
The number and effect of online reviews are growing as the Internet increases in both scale and significance. Reviews can influence people across a number of industries, but they are particularly important for e-commerce, where comments and reviews are often the most convenient way for a buyer to decide whether to purchase an item. Online reviews can be created for many reasons. To improve and grow their business, online retailers and service providers often ask their customers to give feedback on their experience with the goods or services. Consumers may also be inclined to review a product or service after an extremely positive or negative experience. Online feedback can help, although misplaced confidence in it can harm both seller and buyer. Many people consult online reviews before ordering online, but reviews can be poisoned, faked or inflated, so decisions based on online reviews should be made with caution. Moreover, business owners may recruit others to review their own goods favourably or to write damaging reviews of their competitors' products or services. Such inaccurate comments are regarded as spam reviews and, because of the popularity of reviews, can have a strong impact on the online market.

Spam reviews can also harm companies by eroding consumers' trust. The problem is serious enough to have drawn the attention of governments and the mainstream media. For example, the New York Times reported that "fake reviews are becoming a common Internet issue and a photographic organisation has recently been reporting hundreds and hundreds of defamatory consumer reviews." In 2014 the Government of Canada issued an advisory "encouraging consumers to pay close attention to fakes online approvals that suggest that standard consumers were making them." Since review spam is common and damaging, developing ways to help companies and users tell honest reviews from fake ones is a significant but difficult problem.

In the proposed analysis, three types of features are applied: review-based (RB), review-linguistic (RL), and user-based behavioural (UB) features. The spam detection problem is converted into a heterogeneous information network analysis. Users and their reviews form a network in which users, reviews and spam reviews are identified as nodes. Every feature type is assigned a weight, and the spam detection technique determines the importance of a feature from its weight.

II. REVIEW OF LITERATURE
Dixit et al. [2] divide the review spam found in the literature into three groups:
(1) Reviews that comment only on the brand or the vendor and neglect the product.
(2) Untruthful reviews, which are the key issue of this paper.
(3) Non-reviews – reviews of unrelated text or advertising.
Untruthful reviews are the main category because they undermine the integrity of the online review system.

Detecting untruthful reviews is difficult because it is hard, if not impossible, to tell fake reviews from genuine ones by reading them manually. To illustrate the difficulty of this task, we take an example from the data set generated by Ott et al. [3]. Even for a human judge, it is hard to tell which review is fake and which is genuine.

Review 1: Nice hotel; this facility has been turned into suites/studios. We just had a splendid studio, and we could not imagine the suite improving. The fully equipped kitchen includes a microwave and freezer. The washroom was big and had everything again including high-quality bathroom items. The hotel features a good exercise facility, swimming pool


and good clothing. Free breakfast every morning was also great and a great choice. Every morning. The hotel stops were safe and cost-effective. It was centrally located and easily reached by metro. I'm sure I'd stay here again.

Review 2: On my latest business trip, my wife and I stayed at Hotel Omni Chicago in Chicago, Illinois, in one of their deluxe suites. Sadly, we were not entirely satisfied with the hotel and I believe I am speaking for both of us. The hotel advertises facilities at the extravagance level, and the rooms are definitely below average as you can see on the photos. When you check in, you expect service that goes beyond fresh towels in the toilet if you intend a stay at such a facility. The first thing that seemed to be important was to cool up the room with a new filter, and the air turned out smelly when it was first activated.

Secondly, the gym is only open until 10:30 p.m. This can certainly be a problem for people who want to practise after dinner. Especially given that the fitness centre is not available until late or without stopping. For these reasons I would not recommend this hotel, just like for other comparable reasons, if you would look for extravagance services.

Of the two reviews, the first is genuine while the second is spam, but no clear signs or flags make this obvious to the reader. However, the Consumerist [18] and MoneyTalksNews [19] websites provide guidance that helps users to spot fake reviews. A computer scientist may try to use this guidance when preparing data mining and machine learning algorithms that look for such features in a review in order to estimate the chance that it is real or fake.

There were more than 18 million reviews on Yelp in 2014 and over 200 million reviews on TripAdvisor. Online reviews are generated continually on various websites, so big data techniques are necessary to address the issue of review spam. Big data, an overused buzzword with an elusive definition, is often characterised by the four Vs:
(1) Volume – the sheer size and scale of the information;
(2) Velocity – the rate at which processing engines generate and consume new information;
(3) Variety – the various formats in which information can be stored; and
(4) Veracity – the quality level of the information.

The volume and velocity of online reviews are apparent simply from visiting e-commerce and customer rating sites, e.g. Amazon and Yelp. Variety arises because reviews are written in many languages and cover many different business sectors (such as hotels, restaurants, e-commerce and homemakers, etc.). Veracity is a problem for online reviews because the vast majority of reviews are unlabelled, so it is not easy to know whether they are false or not. When dealing with data of such scale, standard machine learning algorithms often break down and become unreliable, making it difficult to use them to detect spam [4]. Review spam detection is thus a big data problem, because several difficulties arise in breaking down and organising changing reviews from disparate sources.

Data mining and machine learning techniques are a promising contribution to the detection of fraudulent reviews, mainly through web mining and text mining. As Liu [5] points out, "the web mining process is the way to find useful information and relationships with the web content through the use of the technology and the methods available for machine learning." Web mining covers three types of tasks: structure mining, usage mining and content mining. Content mining deals with the extraction of knowledge and data and the categorisation of content using data mining and machine learning. Opinion mining is a clear example of content mining. Sentiment mining consists of trying to determine the sentiment of a text passage (i.e. positive or negative polarity) by examining the characteristics of that passage. By analysing the textual features associated with different sentiments, a classifier can be trained to classify new instances. Review spam detection, like sentiment mining, falls within the content mining category, but it also uses features that are not linked directly to the content [6]. Text mining and Natural Language Processing (NLP) are used to develop features from the review text. In addition, features may relate to the author of the review, to the posting, and to how the review differs from other reviews of the same item or service.

Although most current machine learning methods are not sufficiently powerful to detect review spam reliably, they are considered to be more effective than manual detection. The principal issue, identified by Abbasi et al. [7], is the lack of distinguishing features that show whether a review is true or false. A typical text mining approach is the bag-of-words model, where single words or small groups of words are used as features, but several studies have shown that this is not enough to train a spam detection classifier that performs adequately. As a consequence, additional feature engineering (extraction) approaches that yield a more informative feature set for spam identification need to be investigated. Numerous studies in the literature examine a number of machine learning techniques for the detection of spam.

Features of the review text, including syntactic and lexical features, were used by Jindal et al. [8], Li et al. [9], Mukherjee et al. [10] and Shojaee et al. [15]. In addition to unigram and bigram term frequencies, Ott et al. [12] used further review features.

Further investigation is also necessary in connection with the behaviour of the reviewer. Studying the authors of review spam differs from studying the review spam itself, because features that represent the characteristics and behaviours of reviewers cannot be extracted from a single review text. An example is the identification of several IDs for the same


creator [13] and of spammer groups, as well as their behavioural footprints [14-16]. Graph theory-based approaches can also be used to define relationships between the reviews and their respective creators, and they show promising results [17, 18]. Combining spam detection based on review features with the study of spammers through their behaviour could be more effective than either approach alone.

In the first place, we should deal with data collection before addressing the challenges of improving spam detection. Data form an important part of every machine learning model. Although a large number of reviews are available on the Web, it is difficult to collect and label enough of them to train a review spam classifier. An alternative to collecting and labelling data is synthetic review spamming, which creates review spam data sets by taking honest reviews from the data and turning them into false reviews. Sun et al. [19] used this approach to create a review spam dataset.

In this article [20] we discuss the machine learning techniques proposed for online review spam detection, with a focus on feature engineering and its effect on spam detector performance. The existing research findings are analysed and compared, including a comparative analysis of supervised, unsupervised, and semi-supervised learning approaches. Finally, we suggest aspects of review spam detection that require further study as well as best practices for future research. To the best of our knowledge, this paper includes almost all of the data sets used or generated in the literature reviewed.

III. ANALYSIS AND PROBLEM FORMULATION
Online reviews on social media sites affect users' buying behaviour towards a product. Sales of a product depend on how the product is reviewed. A spammer generates spam reviews of a product to improve its sales or to reduce the sales of a competitor's product. The field of work is the programmatic analysis of online reviews and the identification of spam reviews.

Spam detection over a heterogeneous information network can be performed using different features, such as linguistic examination, time and threshold limits, stiffness of reviews by one user, average negative user ratio, etc., so that spam reviews and spammer communities may both be detected. The approach reviews and weights features simultaneously in order to analyse their importance for spam detection and for the identification of spammer communities.

IV. PROPOSED SYSTEM

A. System Design:
Figure 1 presents the steps involved in the finding of spam reviews and spammers:

Fig 1. System Design

B. System Description:
The first step is to generate prior knowledge when reports come from other training sets, and the input evaluation data set is handled in three stages according to this prior knowledge.

B.1 Training File Generation:

The first step is to compute the prior knowledge, i.e. the initial probability that a review n is spam, denoted $y_n$. Two versions are used: semi-supervised and unsupervised learning.

In the semi-supervised method,

$y_n = 1$, if review $n$ is labelled as spam in the pre-labelled reviews.

$y_n = 0$, if the label of review $n$ is unknown.

In the unsupervised method, prior knowledge is obtained using

$$y_n = \frac{1}{L} \sum_{l=1}^{L} f(x_{ln}) \qquad (1)$$

where $f(x_{ln})$ is the probability of review $n$ being spam.

B.2 Network Schema:

Based on a list of spam features, the next step is to define the network schema, which determines the features involved in the detection of spam. For example, Fig 2 shows the schema produced when the list of features includes ACS, NR, ETF and PP1.
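The prior knowledge of step B.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the label encoding (1 for pre-labelled spam, `None` for unknown) and the sample probabilities are hypothetical, and L is read as the number of spam features whose probabilities $f(x_{ln})$ are averaged in Eq. (1).

```python
from typing import Optional, Sequence


def semi_supervised_prior(label: Optional[int]) -> float:
    """Semi-supervised prior y_n: 1 if review n is pre-labelled as spam, else 0.

    The encoding (1 = labelled spam, None = unknown) is an assumption
    made for this sketch.
    """
    return 1.0 if label == 1 else 0.0


def unsupervised_prior(feature_probs: Sequence[float]) -> float:
    """Unsupervised prior y_n of Eq. (1): the mean of f(x_ln) over L features.

    `feature_probs` holds one spam probability per feature for a single
    review; L is taken to be len(feature_probs) (an assumption).
    """
    if not feature_probs:
        raise ValueError("at least one feature probability is required")
    if any(not 0.0 <= p <= 1.0 for p in feature_probs):
        raise ValueError("each f(x_ln) must lie in [0, 1]")
    return sum(feature_probs) / len(feature_probs)


if __name__ == "__main__":
    # Made-up feature probabilities for one review, for illustration only.
    example_probs = [0.25, 0.5, 0.75]
    print(semi_supervised_prior(1), semi_supervised_prior(None))
    print(unsupervised_prior(example_probs))
```

In the semi-supervised case the prior carries information only for the pre-labelled spam reviews, whereas the unsupervised prior gives every review a value between 0 and 1 derived from its features.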


Fig 2. Network schema generated based on a given spam features list

B.3 Metapath Definition and Creation:

A metapath is described by a pattern of relationships in the network schema. The metapaths used in the whole methodology are presented in Table 1. The metapath length is four, except for the review metapaths, where it is two. Spam certainty is expressed in discrete levels. To compute these levels, a step function is used. For a review u, the level of spam certainty along the metapath $p_l$ is calculated as

$$m_u^{p_l} = \frac{\lfloor s \times f(x_{lu}) \rfloor}{s} \qquad (2)$$

where s is the number of levels.

After this calculation, two reviews u and v with the same metapath value are connected to each other through the metapath $p_l$, and a link in the network is formed.

The proposed framework uses s = 20 levels.

V. CONCLUSION
After reviewing the above-mentioned literature we come to the following conclusion. Spammers constantly change the external features of e-mail in order to evade spam filters. A suitable filter system needs to be put in place that allows quick reaction to changes and fast, high-quality tuning to a new collection of features.

The filters in current client and server filter systems are of extremely poor quality, as they are trained on a very small number of messages, namely only those received by a single user or mail provider. The hybrid filter method, i.e. the dynamic hierarchical and multi-agent filter system, can however be enhanced by helping users recognise the filtering error on each level (user level, mail provider level, organisational level) and the required filter setup. Therefore, the combination of two broad approaches, such as using a model for personal review classification in a server-side solution, is quite likely to solve this problem. The development of personalised server-side filtering systems using classification algorithms that learn with data mining methods is a very forward-looking approach.

This statement is supported by the following:

• Personalised server-side filtering systems are preferable to client-based solutions, as they offer universal access to email and cut costs, which is very important for corporate users;
• Personalised server-side filtering systems are more precise and make fewer errors than a general model;
• A customised server-side filtering system can be applied in all countries on the basis of the Universal Declaration of Human Rights, as argued in a different article of the author.

In custom server-side filters, learning-based algorithms go beyond conventional ones because of a range of fundamental qualities (filtering quality, lack of updates, autonomy and independence from external knowledge).

REFERENCES

[1] Lau RY, Liao SY, Kwok RCW, Xu K, Xia Y, Li Y (2011) Text mining and probabilistic language modeling for online review spam detecting. ACM Trans Manage Inf Syst 2(4):1–30
[2] Dixit, S and Agrawal, A.J. (2016) "SURVEY ON REVIEW SPAM DETECTION," International Journal of Computer and Communication Technology: Vol. 7: Iss. 1, Article 9
[3] Ott M, Choi Y, Cardie C, Hancock JT (2011) Finding deceptive opinion spam by any stretch of the imagination. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 309–319). Association for Computational Linguistics
[4] López V, del Río S, Benítez JM, Herrera F (2015) Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Sets Syst 258:5–38
[5] Bing L (2008) Web Data Mining. Book. Springer, Berlin Heidelberg New York
[6] Bandakkanavar RV, Ramesh M, Geeta H (2014) A survey on detection of reviews using sentiment classification of methods. IJRITCC 2(2):310–314
[7] Abbasi A, Zhang Z, Zimbra D, Chen H, Nunamaker JF Jr (2010) Detecting fake websites: the contribution of statistical learning theory. MIS Q 34(3):435–461
[8] Jindal N, Liu B, Lim EP (2010) Finding unusual review patterns using unexpected rules. In: Proceedings of the 19th ACM international conference on Information and knowledge management (pp. 1549–1552). ACM, Toronto, ON, Canada
[9] Li F, Huang M, Yang Y, Zhu X (2011) Learning to identify review spam. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol 22, No. 3, p 2488
[10] Mukherjee A, Liu B, Glance N (2012) Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st international conference on World Wide Web (pp. 191–200). ACM, Lyon, France
[11] Shojaee S, Murad MAA, Bin Azman A, Sharef NM, Nadali S (2013) Detecting deceptive reviews using lexical and syntactic features. In: Intelligent Systems Design and Applications (ISDA), 2013 13th International Conference on (pp. 53–58). IEEE, Serdang, Malaysia
[12] Ott M, Cardie C, Hancock JT (2013) Negative Deceptive Opinion Spam. In: HLT-NAACL, pp 497–501
[13] Qian T, Liu B (2013) Identifying Multiple Userids of the Same Author. In: EMNLP, pp 1124–1135
[14] Mukherjee A, Kumar A, Liu B, Wang J, Hsu M, Castellanos M, Ghosh R (2013) Spotting opinion spammers using behavioral footprints. In:


Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 632–640). Chicago, ACM.
[15] Jindal N, Liu B (2007) Review spam detection. In: Proceedings of the 16th international conference on World Wide Web (pp. 1189–1190). ACM, Lyon, France.
[16] Feng S, Xing L, Gogar A, Choi Y (2012) Distributional footprints of deceptive product reviews. ICWSM 12:98–105
[17] Xie S, Wang G, Lin S, Yu PS (2012) Review spam detection via temporal pattern discovery. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 823–831). ACM, Beijing, China
[18] Popken B, "30 Ways You Can Spot Fake Online Reviews", Consumerist, 14 Apr. 2010, https://consumerist.com/2010/04/14/how-you-spot-fakeonline-reviews. Accessed 20 Mar. 2021.
[19] Morales A, Sun H, Yan X (2013) Synthetic review spamming and defense. In: Proceedings of the 22nd international conference on World Wide Web companion (pp. 155–156). International World Wide Web Conferences Steering Committee, Rio de Janeiro, Brazil
[20] Wang G, Xie S, Liu B, Yu PS (2012) Identify online store review spammers via social review graph. ACM Transactions on Intelligent Systems and Technology (TIST) 3(4):61
