
International Journal of Cognitive Computing in Engineering 5 (2024) 250–258


Fake review detection using transformer-based enhanced LSTM and RoBERTa

Rami Mohawesh a,*, Haythem Bany Salameh a,b, Yaser Jararweh c, Mohannad Alkhalaileh d, Sumbal Maqsood e

a College of Engineering, Al Ain University, Al Ain, UAE
b Hijjawi Faculty of Engineering Technology, Yarmouk University, Irbid, Jordan
c Department of Computer Science, Jordan University of Science and Technology, Jordan
d College of Education, Al Ain University, Al Ain, UAE
e School of Information Technology, University of Tasmania, Tasmania, Australia

ARTICLE INFO

Keywords: Deep Learning; Fake Reviews; Transformer; LSTM; Explainability; Foundation models

ABSTRACT

Internet reviews significantly influence consumer purchase decisions across all types of goods and services. However, fake reviews can mislead both customers and businesses. Many machine learning (ML) techniques have been proposed to detect fake reviews, but they often suffer from poor accuracy due to their focus on linguistic features rather than semantic content. This paper presents a novel semantic- and linguistic-aware model for fake review detection that improves accuracy by leveraging an advanced transformer architecture. Our model integrates RoBERTa with an LSTM layer, enabling it to capture intricate patterns within fake reviews. Unlike previous methods, our approach enhances the robustness of fake review detection and authentic behavior profiling. Experimental results on semi-real benchmark datasets show that our model significantly outperforms state-of-the-art methods, achieving 96.03 % accuracy on the OpSpam dataset and 93.15 % on the Deception dataset. To further enhance transparency and credibility, we utilize Shapley Additive Explanations (SHAP) and attention techniques to clarify our model's classifications. The empirical findings indicate that our proposed model can offer rational explanations for classifying specific reviews as fake.

1. Introduction

In recent years, the Internet has greatly impacted people's lives. Consumers often spend considerable time browsing for product information, reading reviews, and interacting with others. In addition, the Internet allows users to write evaluation remarks or reviews on various subjects according to their experience with products or services. Those users can provide their opinions and reviews to criticize or promote a variety of goods and services (Saumya, Singh, Baabdullah, Rana, & Dwivedi, 2018; Singh, et al., 2017). Customers usually purchase goods with several positive reviews, which can increase the vendor's income (Saini, Saumya, & Singh, 2017). Moreover, unfavorable ratings can result in financial damages for the involved companies (Ho-Dac, Carson, & Moore, 2013; Hakak, et al., 2021; Khan, et al., 2021). Since anybody can submit reviews, unfair positive or negative comments about items, services, and businesses are possible (Mohawesh, Liu, Arini, Wu, & Yin, 2023; Maqsood, Xu, Springer, & Mohawesh, 2021; Mohawesh, Tran, Ollington, & Xu, 2020; Mohawesh, Xu, Springer, Al-Hawawreh, & Maqsood, 2021; Mohawesh, Al-Hawawreh, Maqsood, & Alqudah, 2023; Mohawesh, Maqsood, Jararweh, & Salameh, 2023; Mohawesh, Maqsood, & Althebyan, 2023; Myers, et al., 2023). To prevent individuals from being misled by incorrect information, internet content and reviews must be verified.

There are two main methods to verify internet opinions and reviews: human and automated detection. The work in Ott, Choi, Cardie, and Hancock (2011) stated that, around 57 % of the time, human judges could not detect fake reviews; manual identification is therefore not practical. It has been established that automated detection using intelligent technology can be quicker and more accurate than human experts. Several research efforts have been conducted to create automatic fake review detection models using traditional ML techniques, some of which have performed relatively well.

* Corresponding author.
E-mail addresses: [email protected] (R. Mohawesh), [email protected] (H.B. Salameh).

https://doi.org/10.1016/j.ijcce.2024.06.001
Received 21 September 2023; Received in revised form 11 June 2024; Accepted 11 June 2024
Available online 13 June 2024
2666-3074/© 2024 The Authors. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC
BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Specifically, to detect fake reviews, existing models were designed based on language characteristics and have failed to capture the reviews' semantic content. Hence, new semantic-based methods that effectively detect fake reviews using ML are still necessary. Lately, transformers and other sophisticated pre-trained architectures have garnered significant interest in text classification tasks and have achieved improved outcomes compared to the existing state-of-the-art methods (Lan, et al., 2019; Liu, et al., 2019; Yang, et al., 2019). A novel semantic-based model for detecting fake reviews can be developed with the help of transformer architectures due to their effectiveness in building deep contextualized embeddings for different types of texts. Nevertheless, the best fit for the complete training data could not be achieved with a single model due to the variability and dynamism of fake review attributes (Mohawesh, Liu, et al., 2023; Mohawesh, et al., 2020; Mohawesh, Xu, Springer, et al., 2021; Mohawesh, Xu, et al., 2023; Mohawesh, Xu, Tran, et al., 2021; W. Khan & Haroon, 2022). An individual transformer could be overly sensitive or biased towards some input features, leading to sub-optimal performance.

In summary, the key contributions of this paper can be summarised as follows:

1. Introducing a hybrid strategy that employs the RoBERTa transformer and LSTM architectures. The proposed approach categorizes reviews as either genuine or fake according to their syntactical, grammatical, and semantic characteristics.
2. Integrating the RoBERTa model with an LSTM to identify real or fake reviews. The proposed technique utilizes accuracy, precision, recall, and F1 score as assessment criteria for the proposed model.
3. Conducting extensive experiments to investigate the performance of the proposed model compared to several state-of-the-art methods. The results reveal that the proposed model obtains superior accuracy compared with the state-of-the-art approaches and is more accurate than ten recent models developed to identify fake reviews.

The rest of the paper is organized as follows: Section 2 provides the related work. Section 3 elaborates on the proposed methodology. Section 4 describes the experimental setting, datasets, pre-processing, and evaluation metrics. Section 5 presents the results and analysis. Section 6 concludes the paper with the main findings.

2. Related works

Several previous studies have attempted to identify fake reviews by employing classic ML algorithms such as Naive Bayes (NB) and Support Vector Machine (SVM), which can learn discriminant features from reviews (Cardoso, Silva, & Almeida, 2018). A good instance of this is the use of the Linguistic Inquiry and Word Count (LIWC) instrument (Krishnamoorthy, Sathiyanarayanan, & Proença, 2024; Pennebaker, Boyd, Jordan, & Blackburn, 2015). Ott, et al. (2011) suggested a methodology to automatically detect fraudulent reviews by combining psychological factors extracted from reviews with n-grams and feeding the resulting data to support vector machines. The method showed an accuracy of 90 % in classifying false reviews, significantly higher than human evaluators, who achieved 57 % accuracy on the Op-Spam dataset. The approach proposed in Feng, Banerjee, and Choi (2012) detected fake reviews using part-of-speech characteristics and context-free grammar. The reported findings demonstrated that using these features in combination with the existing system considerably improved the accuracy of identifying fake reviews. Later, Li, Ott, Cardie, and Hovy (2014) proposed a model called the Sparse Additive Generative Model (SAGE) to detect false reviews, which combines topic modeling (DePaulo, et al., 2003) with a generalized additive model (Hastie & Tibshirani, 1990). The accuracy of the proposed model on the Deception and Op-Spam datasets was 83.10 % and 81.82 %, respectively.

Cagnina and Rosso (2015) used character n-grams, emotion traits, and LIWC features to detect fake reviews; an SVM algorithm was then fed these extracted features to classify the reviews. A model based on deep linguistic features for identifying bogus reviews was proposed by Xu and Zhao (2012), who then classified the reviews using an SVM classifier. Fusilier, Montes-y-Gómez, Rosso, and Cabrera (2015) suggested a model for detecting fake reviews based on character n-gram content attributes, with NB then utilized to detect fraudulent reviews. Recently, a wide variety of deep learning models, including convolutional neural networks (CNN), recurrent neural networks (RNN), and gated recurrent units (GRU), have been widely used in different applications and have produced favorable outcomes (Mohawesh, Maqsood, Jararweh, et al., 2023; Rifai, et al., 2023; Yin, Liu, Wu, Arini, & Mohawesh, 2023). These models focus on extracting the semantic representation from high-dimensional data. Ren and Zhang (2016) presented a recurrent convolutional neural network approach with an attention mechanism to learn document representations; as expected, the suggested methodology successfully identified fake reviews. Moreover, Ren and Ji (2017) proposed a hybrid fake review detection algorithm using deep neural networks that incorporates phrase context information (GRNN–CNN), using both CNN and GRU; a test of their model on the Deception dataset yielded 83.34 % accuracy. Afterward, Zhang, Du, Yoshida, and Wang (2018) created a recurrent convolutional deep neural network model (DRI-RCNN) to detect false reviews by analyzing the contexts in which words appear; DRI-RCNN demonstrated accuracy levels of 80.8 % and 82.9 % on the Deception and spam datasets, respectively. The authors of Mohawesh, Xu, Tran, et al. (2021) examined several promising ML approaches to detect fake reviews. Their experimental findings showed that the RoBERTa transformer outperformed state-of-the-art approaches with a Deception dataset accuracy of 91.02 %. In addition, they illustrated that RoBERTa, DistilBERT, and BERT achieved better results with small datasets. Although several ML algorithms have been designed to detect false reviews and distinguish between real and fake reviews, contextualized text representation models have not been well investigated. This paper provides a novel model that could be utilized with any contextualized text representation and neural classifier.

3. Methodology

The proposed model incorporates pre-processing and blends state-of-the-art transformer designs with an LSTM strategy, as shown in Fig. 1. The following subsections elaborate on these phases and the structure of the proposed model.

Fig. 1. The architecture of the RoBERTa model.


3.1. Pre-processing stage

Several procedures are carried out to clean the incoming review data and prepare it for use as input to the transformers. Throughout this process, we filter out unnecessary information and tokens such as emojis and URLs. Finally, we take each piece of feedback and separate it into individual words.
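For illustration, a minimal sketch of this cleaning step is given below. It is our own hypothetical helper, not the authors' released code; the URL and emoji patterns are simplifying assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: emojis are approximated by characters in the common
# emoji/symbol Unicode ranges; the paper does not specify its filter.
EMOJI_RE = re.compile(r"[\U0001F000-\U0001FAFF\u2600-\u27BF]")

def preprocess_review(text: str) -> list[str]:
    """Clean one review and separate it into individual words."""
    text = URL_RE.sub(" ", text)    # drop URLs
    text = EMOJI_RE.sub(" ", text)  # drop emojis
    return text.lower().split()     # tokenize on whitespace

print(preprocess_review("Great hotel!! \U0001F600 see http://spam.example"))
# ['great', 'hotel!!', 'see']
```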

3.2. Robustly optimised BERT approach (RoBERTa) ht = ot tanh ⊗ (ct )

RoBERTa is an improved BERT that is more powerful than the where ft , it , ct , ot reflect the values of f, i, c, o at the time t, respectively.
original BERT (Alippi et al., 2011). It trains the machine on longer se­ The term W describes the hidden layer updating weights and b repre­
quences and skips sentence prediction. RoBERTa is a pre-trained model sents the bias. The tanh and σ represent the hyperbolic tangent and
consisting of almost 63 million news items in English, English Wikipe­ sigmoid functions, respectively.
dia, and book corpora. It is employed because of its high-quality
masking patterns and flexibility in adapting to each review. 3.4. The proposed model

3.3. Long short-term memory (LSTM) The proposed model employs a feed-forward network with 768
hidden sizes trained with the RoBERTa- base-uncased model. The input
LSTM is a type of RNN architecture and is currently a mainstream IDs and attention masks of the sentence being modeled are used in the
RNN structure (Hochreiter & Schmidhuber, 1997). LSTM solves the RoBERTa model. The RoBERTa tokenizer is employed for this purpose; it
vanishing gradient by exchanging memory blocks with hidden accepts the input sequence as [CLS] and [SEP] appended to the begin­
self-connected units. The memory block is used to store information and ning and end of a sentence, respectively. Then, it returns the input IDS
learn long-range text sequences. The memory unit tells the network and attention masks for each sentence. These data are then used as input
what to learn and what to forget. LSTM consists of four components (the to the RoBERTa model, which generates an embedding vector for each
input gate, the forget gate, the output gate, and the cell activation token of a hidden size of 768. To improve LSTM’s grasp of sentence
vector). First, the input gate manages the size of the extra memory semantics, RoBERTa offers contextualized representations at the sen­
content. Second, the forget gate f, w specifies a certain amount of the tence level. In recent works, researchers have demonstrated how
memory that must be erased. Third, the output gate o, w modulates the combining LSTM with word embedding models can significantly
memory content output amount. Fourth, the cell activation vector c, improve detection performance (Goyal, Du, Ott, Anantharaman, &
contains two components, modulated new memory c̃t and forgotten Conneau, 2021). Since combining LSTM with RoBERTa can improve
previous memory ct− 1 . t identifies the t th time. With 100-dimensional prediction accuracy, it follows that the suggested model has a deeper
GloVe embeddings, our LSTM model was trained. The dimensions and grasp of semantic meaning. As a "collective representation" for classifi­
time steps of the output were both set to 300. An ADAM optimizer with a cation tasks, [CLS] is encoded via a multi-layer encoding procedure that
learning rate of 0.001 was used to reduce binary cross-entropy loss. The incorporates all representative information of all tokens. This means
activation function for the final output layer was sigmoid. The model that the [CLS] token’s embeddings can be used to represent the entire
was trained across ten epochs with 64 and 512-element batches. Several sentence and then fed into a classifier to perform a classification task.
runs were repeated with different numbers of layers for the LSTM The classifier is coded from the ground up. The review text has been
approach. classified using a classification model composed of a RoBERTa and an
The LSTM architecture is given in Fig. 2, where the mathematical LSTM layer. The classification model used for this purpose is RoBERTa
equations of LSTM operations are shown as follows: supplemented with an LSTM layer. It learns contextualized word rep­
resentations from a vast number of unlabelled text datasets. In the NLP
it = σ (Wxi xt + Whi ht− 1 + bi )
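As a worked illustration of these equations, the sketch below implements a single LSTM step in NumPy. The dimensions and random weights are arbitrary placeholders, not the trained values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the equations above.
    W[g] has shape (hidden, input + hidden) for each gate g in i, f, o, c."""
    z = np.concatenate([x_t, h_prev])        # stacked [x_t; h_{t-1}]
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate (new) memory
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 100, 300                       # e.g. GloVe dim 100, hidden 300
W = {g: rng.normal(0, 0.1, (n_hid, n_in + n_hid)) for g in "ifoc"}
b = {g: np.zeros(n_hid) for g in "ifoc"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```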
3.4. The proposed model

The proposed model employs a feed-forward network with a hidden size of 768 trained with the RoBERTa-base model. The input IDs and attention masks of the sentence being modeled are used in the RoBERTa model. The RoBERTa tokenizer is employed for this purpose; it appends [CLS] and [SEP] tokens to the beginning and end of each sentence, respectively, and returns the input IDs and attention masks for each sentence. These data are then used as input to the RoBERTa model, which generates an embedding vector with a hidden size of 768 for each token. To improve the LSTM's grasp of sentence semantics, RoBERTa offers contextualized representations at the sentence level. In recent works, researchers have demonstrated how combining LSTM with word embedding models can significantly improve detection performance (Goyal, Du, Ott, Anantharaman, & Conneau, 2021). Since combining LSTM with RoBERTa can improve prediction accuracy, the suggested model gains a deeper grasp of semantic meaning. As a "collective representation" for classification tasks, the [CLS] token is encoded via a multi-layer encoding procedure that incorporates the representative information of all tokens. This means that the [CLS] token's embedding can represent the entire sentence and then be fed into a classifier to perform a classification task. The classifier is coded from the ground up. The review text is classified using a model composed of RoBERTa and an LSTM layer. RoBERTa learns contextualized word representations from a vast amount of unlabelled text and performs well in NLP evaluations because of its complex structure and excellent nonlinear representation learning capability, while the LSTM can effectively boost performance since it can memorize and discover patterns of crucial information. For classification purposes, we include a 128-node feed-forward linear layer. Inputs are normalized using a batch normalization layer, followed by a dropout layer with a rate of 0.6 to mitigate the potential for overfitting. Two final feed-forward layers, each with a single-unit output, label the input reviews as either fake or real, with one class prioritized over the other by a decision threshold of 0.9. Fig. 3 depicts the overall architecture of the proposed model.
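A condensed PyTorch sketch of this architecture is given below. It follows the description above (768-dimensional RoBERTa token embeddings, an LSTM, batch normalization, dropout of 0.6, and a small feed-forward head), but the exact wiring and sizes not stated in the paper are our own assumptions, and the two-layer head is collapsed to one for brevity.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RobertaLSTMClassifier(nn.Module):
    def __init__(self, lstm_hidden: int = 128):
        super().__init__()
        self.roberta = AutoModel.from_pretrained("roberta-base")
        # LSTM over the 768-dim token embeddings produced by RoBERTa.
        self.lstm = nn.LSTM(768, lstm_hidden, batch_first=True)
        self.bn = nn.BatchNorm1d(lstm_hidden)
        self.dropout = nn.Dropout(0.6)          # rate from Section 3.4
        self.head = nn.Linear(lstm_hidden, 1)   # fake-vs-real logit

    def forward(self, input_ids, attention_mask):
        out = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        _, (h_n, _) = self.lstm(out.last_hidden_state)
        z = self.dropout(self.bn(h_n[-1]))      # final LSTM hidden state
        return self.head(z).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch = tokenizer(["The room was spotless and the staff friendly."],
                  padding=True, truncation=True, return_tensors="pt")
model = RobertaLSTMClassifier().eval()
prob_fake = torch.sigmoid(model(batch["input_ids"], batch["attention_mask"]))
# prob_fake can then be compared against the 0.9 threshold described above.
```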

4. Performance evaluation

This section outlines a detailed analysis of the performed experiments, including the description of the datasets, the evaluation metrics, and the proposed model's performance. It also provides a performance comparison of the proposed model with respect to a number of state-of-the-art models.

4.1. Experimental setup

Fig. 3. The proposed model: RoBERTa with LSTM.
Training is performed for ten iterations across all datasets with a mini-batch size of 32. We use the early stopping method based on validation loss, with delta = 0, to avoid overfitting the model (Prechelt, 1998). AdamW is employed to optimize the effectiveness of the proposed model (Loshchilov & Hutter, 2017). Finally, we adopt binary cross-entropy to calculate the loss (Rosasco, De Vito, Caponnetto, Piana, & Verri, 2004). We program our model in Python using the Google Colab interface (Halyal, 2019) and utilize the Transformers library (Wolf, et al., 2020). Several iterations of the tests are conducted, each with a unique set of parameters (Table 1).

4.2. Dataset description

Two publicly accessible benchmark datasets are utilized in this study: the Deception dataset (Li, et al., 2014) and OpSpam (Ott, et al., 2011). Both datasets solely contain review text, with no corresponding metadata. The Deception dataset was constructed using crowdsourcing platforms and includes 3032 reviews covering three different sectors (hotels, doctors, and restaurants). The OpSpam dataset was developed from reviews of the most prestigious hotels in Chicago and depicts a semi-real world: it includes sixteen hundred text reviews for 20 hotels in the Chicago, United States metropolitan region, of which eight hundred are fraudulent and the remaining eight hundred are authentic. A label of "1" denotes that a review is fake, while a label of "0" specifies that it is authentic. The reviews come from a number of distinct sources: the fake reviews were created on the Amazon Mechanical Turk (AMT) platform, while the remaining reviews were obtained from online review sites such as Yelp, TripAdvisor, and Expedia. In the experiments, we use 80 % of the Deception and OpSpam datasets for training and 20 % of each dataset for testing. Table 2 presents the statistical data for both datasets.

4.3. Evaluation metrics

In related work on detecting fake reviews, recall, accuracy, and the F-measure are the most popular metrics for evaluating detection performance, given that the number of reviews in both classes is equal. We use these metrics to evaluate the performance of the proposed model.

• Accuracy: This metric comprehensively estimates the number of cases that have been accurately identified, which can be determined as:

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

where TP represents the "true positive", TN indicates the "true negative", FP is the "false positive", and FN is the "false negative".

• Precision: This metric describes the proportion of successfully predicted reviews to the total number of reviews for a given class, which can be calculated as:

$\text{Precision (P)} = \frac{\text{number of correct predictions of each class}}{\text{total number of predictions of each class}} = \frac{TP}{TP + FP}$  (1)

• Recall: This metric shows the proportion of relevant reviews achieved from the total number of reviews and is calculated as:

$\text{Recall (R)} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} = \frac{TP}{TP + FN}$  (2)

• F1 score: This metric is the harmonic mean of precision and recall and can be calculated as:

$F\text{-measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$  (3)
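These four metrics can be computed directly from the confusion-matrix counts. A small self-contained sketch follows; the counts are made-up values for illustration, not our experimental results.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 per Eqs. (1)-(3)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only.
print(classification_metrics(tp=145, tn=140, fp=15, fn=20))
```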
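Tying the setup of Sections 4.1 and 4.2 together, a hedged sketch of the training loop is shown below. The paper does not publish its exact loop; `train_loader` and `val_loader` are assumed to yield (input IDs, attention masks, labels) built from the 80/20 split, and the learning rate is our own placeholder.

```python
import copy
import torch

def train(model, train_loader, val_loader, epochs=10, device="cpu"):
    """AdamW + binary cross-entropy with early stopping on validation loss
    (min delta = 0), mirroring the setup described in Section 4.1."""
    model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is assumed
    loss_fn = torch.nn.BCEWithLogitsLoss()
    best_val, best_state = float("inf"), None
    for epoch in range(epochs):
        model.train()
        for ids, mask, y in train_loader:            # mini-batches of 32
            opt.zero_grad()
            loss = loss_fn(model(ids.to(device), mask.to(device)),
                           y.float().to(device))
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(i.to(device), m.to(device)),
                              y.float().to(device)).item()
                      for i, m, y in val_loader) / len(val_loader)
        if val < best_val:                           # early stopping, delta = 0
            best_val, best_state = val, copy.deepcopy(model.state_dict())
        else:
            break                                    # stop when val loss worsens
    model.load_state_dict(best_state)
    return model
```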

Table 1
Summary of existing works.

(Shushkevich & Cardiff, 2021) | BiLSTM ensemble | Pros: by combining the assets of multiple models, the ensemble method mitigates the effect of individual flaws and can increase the model's overall precision. | Cons: the computational intensity and resource requirements of an ensemble may surpass those of a single model.
(Bandyopadhyay & Dutta, 2020) | KNN | Pros: the KNN algorithm is straightforward and intuitive, making it simple to comprehend and implement. | Cons: KNN is a lazy learner, resulting in a notably lengthy prediction time in comparison to alternative algorithms.
(Felber, 2021) | SVM | Pros: SVMs have demonstrated their ability to accurately represent intricate decision boundaries and flourish when applied to high-dimensional datasets. | Cons: SVMs may incur significant computational costs when applied to vast datasets.
(Koloski, Perdih, Robnik-Šikonja, Pollak, & Škrlj, 2022) | BERT | Pros: BERT is capable of processing language structures such as words with context-dependent meanings, idiomatic expressions, and long-range dependencies. | Cons: due to the substantial computational resources needed for both training and inference, BERT is unsuitable for deployment on systems with limited resources.
(Shifath, Khan, & Islam, 2021) | Foundation ensemble model | Pros: by combining the merits of numerous pre-trained transformer models, ensembles of transformers can outperform them on subsequent tasks. | Cons: large quantities of memory and computational expense make ensemble models challenging to implement on devices with limited resources.
(Duma, et al., 2024) | Hybrid approach | Pros: integrates multi-data features (reviewer- and review-centric). | Cons: large quantities of memory and computational expense are required.
(Thuy, et al., 2024) | DenyBERT | Pros: combines the BERT model with knowledge distillation techniques. | Cons: the proposed model focuses only on the English language.
(Cheng, Wu, Chao, & Wang, 2024) | Graph neural network | Pros: combines different review text and reviewer features. | Cons: large quantities of memory and computational expense are required.

Table 2
Deception and OpSpam statistical information.

Dataset | Domain | Type of reviews | Number of reviews | Number of words | Number of sentences
Deception (Li, et al., 2014) | Hotel | Legitimate | 1080 | 17,328 | 9258
Deception (Li, et al., 2014) | Hotel | Fake | 1080 | 16,635 | 8463
Deception (Li, et al., 2014) | Doctor | Legitimate | 200 | 5098 | 1151
Deception (Li, et al., 2014) | Doctor | Fake | 356 | 5128 | 2369
Deception (Li, et al., 2014) | Restaurant | Legitimate | 201 | 5126 | 1892
Deception (Li, et al., 2014) | Restaurant | Fake | 201 | 5136 | 1827
OpSpam (Ott, et al., 2011) | Hotels | Legitimate | 800 | 14,812 | 7963
OpSpam (Ott, et al., 2011) | Hotels | Fake | 800 | 14,427 | 7192

4.4. Results and discussion

4.4.1. Results for the Op-Spam and Deception datasets
Table 3 shows that the accuracy, precision, recall, and F1-score of the proposed model on the Op-Spam dataset are 96.03 %, 91.48 %, 99.31 %, and 95.36 %, respectively, whereas on the Deception dataset they are 93.15 %, 85.39 %, 97.40 %, and 91.20 %. The experimental results reveal that the proposed model performs well on both the OpSpam and Deception datasets. This is due to the model's capability to effectively capture the complete contextualized depiction of each review, as well as its most pertinent characteristics, by employing a variety of models and perspectives.

4.4.2. Comparison with state-of-the-art models
To determine the efficacy of the proposed model, we compare its performance to a set of reference approaches, including SVM (Ott, et al., 2011), SVM (Cagnina & Rosso, 2015), SVM (Feng, et al., 2012), SAGE (Li, et al., 2014), RCNN (Lai, Xu, Liu, & Zhao, 2015), GRNN–CNN (Y. Ren & D. Ji, 2017), DRI-RCNN (Zhang, et al., 2018), LDA with TextCNN (Cao, Ji, Chiu, He, & Sun, 2020), MFNN (Jiang, Zhang, & Jin, 2020), SOM-CNN (Neisari, Rueda, & Saad, 2021), and BERT (Thuy, et al., 2024). The various reference models are described as follows:

• SVM: a classification model that combines bigram and LIWC characteristics using SVM.
• SVM: a classification model that combines four-gram and LIWC characteristics using SVM.
• SVM: a classification model that uses unigram features with SVM.
• SAGE: a classification model that combines topic modeling with a generalized additive model.
• RCNN: a classification model that combines recurrent and convolutional neural networks.
• GRNN–CNN: a hybrid classification model for detecting false reviews that combines a GRU and a CNN.
• DRI-RCNN: a classification model built on recurrent convolutional deep neural networks for identifying false reviews by analyzing the contexts in which words appear.
• BERT-Base Case: a classification model based on a pre-trained deep bidirectional text representation. This model can handle unlabelled input by attending to both left and right context at all levels.
• LDA with TextCNN: in this model, coarse-grained features are extracted with LDA and a backpropagation (BP) neural network, while fine-grained features are extracted with the help of neural networks. Based on the two fine-grained characteristics, it trains a support vector machine classifier to identify phony testimonials. This model consists of one fully connected layer and four progressively smaller filter layers (sizes 2, 3, 4, and 5).
• MFNN: a multivariate feature-based multilayer perceptron (MLP) that learns both local and global information to identify false reviews. This model employs three-, four-, and five-width multi-kernel CNNs.
• SOM-CNN: this model uses self-organizing maps (SOM) to visualise reviews in order to spot false ones. It consists of a 32-filter 3 × 3 convolutional layer and a 2 × 2 max-pooling layer.

Table 4 and Figs. 4 and 5 show that the proposed model outperforms the state-of-the-art approaches on the Deception and OpSpam datasets. Compared to conventional ML and deep learning models, the approach attained a high level of accuracy with a limited dataset. For instance, the proposed model obtains more than 90 % accuracy with limited training data, a level that deep learning models could not reach; both classic ML and deep learning models require huge training datasets. Nevertheless, large-scale data collection is not always feasible. Hence, the transformer model is a pragmatic solution for small datasets.

Table 3
Classification reports for the OpSpam and Deception datasets.

Model | OpSpam: Acc, P, R, F1-score | Deception: Acc, P, R, F1-score
Proposed model | 96.03 %, 91.48 %, 99.31 %, 95.36 % | 93.15 %, 85.39 %, 97.40 %, 91.20 %

Table 4
Comparison of the proposed model with the state-of-the-art methods.

Machine learning algorithm | Deception: Accuracy, F1-score | Op-Spam: Accuracy, F1-score
SVM (bigram and LIWC features) | 79.33 %, 82.83 % | 82.89 %, 82.09 %
SVM (unigram feature) | 83.33 %, 79.53 % | 86.09 %, 83.12 %
SVM (LIWC and four-gram features) | 81.67 %, 81.03 % | 84.34 %, 83.21 %
SAGE | 81.82 %, 79.38 % | 83.10 %, 82.23 %
RCNN | 82.16 %, 82.00 % | 83.21 %, 81.23 %
GRNN-CNN | 83.34 %, 82.86 % | 84.15 %, 84.17 %
DRI-RCNN | 85.24 %, 83.56 % | 87.24 %, 85.36 %
BERT Base Case | 86.20 %, 85.50 % | 90.31 %, 89.56 %
LDA with TextCNN | 86.36 %, 85.90 % | 90.39 %, 90.90 %
MFNN | 85.20 %, 84.70 % | 88.72 %, 89.10 %
SOM-CNN | 85.90 %, 85.01 % | 89.23 %, 90.01 %
RoBERTa + LSTM (proposed) | 93.15 %, 91.10 % | 96.03 %, 95.36 %

5. Model explanation

Using ML or deep learning-based models can enhance the decision-making process for customers and businesses. However, determining confidence in the individual predictions of these models is a challenging problem. For example, ML models cannot be trusted in sensitive areas like medicine, defence, and finance due to their level of complexity. Consequently, there is a need to explain ML models to make them more reliable and responsible. Both customers and businesses can then have more visibility into how machine and deep learning-based fake review detection models compute their final output, and they can therefore trust the models and their decisions. In this work, we accordingly provide an explainable fake review detection model. We use a post-hoc explainable technique to explain the results obtained by our model and why it achieves the best performance in detecting fake reviews; specifically, we apply this technique to example_1 and example_2. LIME also provides explanations by learning an interpretable model locally around a prediction and assigning a specific weight value to each feature. In this paper, we analyze the interpreter's prediction results to identify key aspects of fraudulent reviews in the most widely used dataset. Fig. 6 shows how common keywords such as "luxury", "hotel", "Chicago", and "room" are used as features by the neural network models to determine whether reviews are genuine. When the model detects a fraudulent review, it assigns that review a larger weight term, as shown in Fig. 6. On the other hand, as demonstrated in Fig. 7, if the reviews are genuine, the model can correctly classify them based on these terms. In Fig. 7, the top 10 most relevant terms from the reviews are shown by bars of increasing size; these terms are also highlighted in the original document, in orange for fake and blue for genuine, as shown in Figs. 6 and 7.
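As an indication of how such a post-hoc, locally interpretable explanation can be produced in practice, the sketch below applies LIME's text explainer to a trained classifier. The `predict_proba` wrapper and the `model_predict` helper are our own hypothetical stand-ins for the trained RoBERTa-LSTM model, not code from the paper.

```python
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Hypothetical wrapper around the trained model: returns an array of
    shape (len(texts), 2) with [P(genuine), P(fake)] for each raw review."""
    probs_fake = np.array([model_predict(t) for t in texts])  # assumed helper
    return np.column_stack([1 - probs_fake, probs_fake])

explainer = LimeTextExplainer(class_names=["genuine", "fake"])
explanation = explainer.explain_instance(
    "Luxury hotel in Chicago, the room was amazing!",
    predict_proba,
    num_features=10,   # top 10 most relevant terms, as in Fig. 7
)
print(explanation.as_list())  # (term, weight) pairs per feature
```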

Fig. 4. Performance comparison under the Op-Spam dataset.

Fig. 5. Performance comparison under the Deception dataset.

Fig. 6. SHAP explanation on the Op-Spam dataset.

6. Complexity analysis and limitations

Detecting fake reviews is also challenging for machines, since they must be able to differentiate between "genuine reviews" and "fake reviews". We therefore employ a range of characteristics (e.g., review content and emotions) to accurately detect fake reviews: the greater the number of characteristics incorporated into the models during training, the better they detect fraudulent reviews. In this paper, we develop an approach focused on deriving textual characteristics from review text to analyze the distinctions between fake and genuine review material. The experimental results revealed that contextual characteristics are crucial for distinguishing between bogus and genuine reviews, and that LSTM and pre-trained models considerably enhance the detection of brief textual fake reviews.

We note that the success of pre-trained language models relies heavily on two factors: (1) the scope and breadth of the data used in the training process, and (2) the design of the model itself. RoBERTa's training included BookCorpus, Wikipedia, and CC-News to expand its coverage. In the hybrid method, combining RoBERTa with an LSTM can further enhance detection performance, as it helps manage the variance in fake review features and context. When compared to the most recent methodologies, the performance of the proposed hybrid model (RoBERTa with LSTM) unequivocally demonstrates that it surpasses the other proposed models.

Fig. 7. SHAP explanation on the Deception dataset.

6.1. Complexity

The proposed model has an extremely low computational complexity, due to the fact that it compares only pertinent terms extracted from the text, namely aspects and sentiments, with the associated aspects and sentiments of other reviews. In our methodology, the computational complexity of processing a review is $O(\log_2 n)$. Furthermore, the computational complexity of examining the other reviews is $O(\log_2 n \cdot \log_2 m)$. This procedure is iterated for $k$ reviews, resulting in a total computational complexity of $O(k \cdot \log_2 n \cdot \log_2 m)$. It is apparent from this analysis that the proposed model significantly reduces the amount of time required to analyze a large number of reviews.

6.2. Limitations of this study

The findings presented in this research offer significant contributions to the understanding of the efficacy of different downstream neural network methodologies for detecting fake reviews using transformer-based models. However, certain limitations may be resolved in future research. For instance, the investigation did not examine the effects of various optimization techniques and hyperparameters on the efficacy of the model. Furthermore, the research employed a comparatively limited fake review dataset, which may restrict the generalizability of the results. The impact of larger datasets and the applicability of the proposed models to related tasks could be the subject of additional study. Furthermore, an assessment of the models' interpretability may yield significant insights into the mechanisms that underlie them, thereby enhancing transparency and credibility. In addition, this study concludes that RoBERTa with LSTM performs better than alternative models; however, whether this result is attributable exclusively to the pre-training task or to other factors as well remains unknown. Hence, additional investigation is required to ascertain the precise elements that contribute to the exceptional performance of the proposed model. Further research may examine the effects of various pre-training tasks and datasets in order to ascertain the proposed models' robustness.

7. Conclusion

This paper proposed a novel semantic- and linguistic-aware model for fake review detection that integrates a RoBERTa model with an LSTM layer. This approach leverages the transformer architecture to accurately identify patterns in fake reviews, enhancing both fake and authentic behavior profiling. The proposed model significantly outperformed state-of-the-art models on semi-real benchmark datasets. This paper provides substantial contributions to deep learning, natural language processing, and online commerce security. The reported findings shed light on textual data processing, feature extraction techniques, and the detection of fraudulent online reviews. Reliable methods for detecting fake reviews strengthen the credibility of e-commerce, making this a critical objective for online commerce platforms. Moreover, the proposed work serves as a benchmark for future model developers using machine learning and deep learning techniques to identify fraudulent reviews. Despite the comprehensive experiments presented, further improvement is possible. Future research will evaluate the proposed method on cross-domain datasets and explore applying SHAP and deep learning insights to develop an interpretable lexicon-based classifier. We also plan to investigate combining multiple classifiers and feature extraction techniques to improve accuracy on larger datasets. Additionally, we will apply advanced textual analyses, such as named-entity recognition, to extract more valuable information and examine the effects of adjusting RoBERTa's hyperparameters and subsequent layers.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References

Alippi, C., Boracchi, G., & Roveri, M. (2011). A just-in-time adaptive classification system based on the intersection of confidence intervals rule. Neural Networks, 24, 791–800.
Bandyopadhyay, S., & Dutta, S. (2020). Analysis of fake news in social medias for four months during lockdown in COVID-19.
Cagnina, L., & Rosso, P. (2015). Classification of deceptive opinions using a low dimensionality representation. In Proceedings of the 6th workshop on computational approaches to subjectivity, sentiment and social media analysis (pp. 58–66).
Cao, N., Ji, S., Chiu, D. K., He, M., & Sun, X. (2020). A deceptive review detection framework: Combination of coarse and fine-grained features. Expert Systems with Applications, Article 113465.
Cardoso, E. F., Silva, R. M., & Almeida, T. A. (2018). Towards automatic filtering of fake reviews. Neurocomputing, 309, 106–116.
Cheng, L.-C., Wu, Y. T., Chao, C.-T., & Wang, J.-H. (2024). Detecting fake reviewers from the social context with a graph neural network method. Decision Support Systems, 179, Article 114150.
DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129, 74.
Duma, R. A., Niu, Z., Nyamawe, A., Tchaye-Kondi, J., Chambua, J., & Yusuf, A. A. (2024). DHMFRD–TER: A deep hybrid model for fake review detection incorporating review texts, emotions, and ratings. Multimedia Tools and Applications, 83, 4533–4549.
Felber, T. (2021). Constraint 2021: Machine learning models for COVID-19 fake news detection shared task. arXiv preprint arXiv:2101.03717.
Feng, S., Banerjee, R., & Choi, Y. (2012). Syntactic stylometry for deception detection. In Proceedings of the 50th annual meeting of the Association for Computational Linguistics: Short papers – Volume 2 (pp. 171–175). Association for Computational Linguistics.
Fusilier, D. H., Montes-y-Gómez, M., Rosso, P., & Cabrera, R. G. (2015). Detecting positive and negative deceptive opinions using PU-learning. Information Processing & Management, 51, 433–443.
Goyal, N., Du, J., Ott, M., Anantharaman, G., & Conneau, A. (2021). Larger-scale transformers for multilingual masked language modeling. In REPL4NLP.
Hakak, S., Alazab, M., Khan, S., Gadekallu, T. R., Maddikunta, P. K. R., & Khan, W. Z. (2021). An ensemble machine learning approach through effective feature extraction to classify fake news. Future Generation Computer Systems, 117, 47–58.
Halyal, S. V. (2019). Running Google Colaboratory as a server – transferring dynamic data in and out of Colabs. International Journal of Education and Management Engineering, 9, 35.
Ho-Dac, N. N., Carson, S. J., & Moore, W. L. (2013). The effects of positive and negative online customer reviews: Do brand strength and category maturity matter? Journal of Marketing, 77, 37–53.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.
Jiang, C., Zhang, X., & Jin, A. (2020). Detecting online fake reviews via hierarchical neural networks and multivariate features. In Neural information processing: 27th international conference, ICONIP 2020, Bangkok, Thailand, November 23–27, 2020, proceedings, part I (pp. 730–742). Springer.
Khan, H., Asghar, M. U., Asghar, M. Z., Srivastava, G., Maddikunta, P. K. R., & Gadekallu, T. R. (2021). Fake review classification using supervised machine learning. In Pattern recognition. ICPR international workshops and challenges: Virtual event, January 10–15, 2021, proceedings, part IV (pp. 269–288). Springer.
Khan, W., & Haroon, M. (2022). An unsupervised deep learning ensemble model for anomaly detection in static attributed social networks. International Journal of Cognitive Computing in Engineering, 3, 153–160.
Koloski, B., Perdih, T. S., Robnik-Šikonja, M., Pollak, S., & Škrlj, B. (2022). Knowledge graph informed fake news classification via heterogeneous representation ensembles. Neurocomputing, 496, 208–226.
Krishnamoorthy, P., Sathiyanarayanan, M., & Proença, H. P. (2024). A novel and secured email classification and emotion detection using hybrid deep neural network. International Journal of Cognitive Computing in Engineering, 5, 44–57.
Lai, S., Xu, L., Liu, K., & Zhao, J. (2015). Recurrent convolutional neural networks for text classification. In Twenty-ninth AAAI conference on artificial intelligence.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.
Li, J., Ott, M., Cardie, C., & Hovy, E. (2014). Towards a general rule for identifying deceptive opinion spam. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (Volume 1: Long papers) (pp. 1566–1576).
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Loshchilov, I., & Hutter, F. (2017). Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.
Maqsood, S., Xu, S., Springer, M., & Mohawesh, R. (2021). A benchmark study of machine learning for analysis of signal feature extraction techniques for blood pressure estimation using photoplethysmography (PPG). IEEE Access.
Mohawesh, R., Al-Hawawreh, M., Maqsood, S., & Alqudah, O. (2023a). Factitious or fact? Learning textual representations for fake online review detection. Cluster Computing, 1–16.
Mohawesh, R., Liu, X., Arini, H. M., Wu, Y., & Yin, H. (2023b). Semantic graph based topic modelling framework for multilingual fake news detection. AI Open, 4, 33–41.
Mohawesh, R., Maqsood, S., & Althebyan, Q. (2023c). Multilingual deep learning framework for fake news detection using capsule neural network. Journal of Intelligent Information Systems, 1–17.
Mohawesh, R., Maqsood, S., Jararweh, Y., & Salameh, H. B. (2023d). Federated learning support for cybersecurity: Fundamentals, applications, and opportunities. In 2023 international conference on intelligent computing, communication, networking and services (ICCNS) (pp. 50–56). IEEE.
Mohawesh, R., Tran, S., Ollington, R., & Xu, S. (2020). Analysis of concept drift in fake reviews detection. Expert Systems with Applications, Article 114318.
Mohawesh, R., Xu, S., Springer, M., Al-Hawawreh, M., & Maqsood, S. (2021). Fake or genuine? Contextualised text representation for fake review detection. arXiv preprint arXiv:2112.14343.
Mohawesh, R., Xu, S., Springer, M., Jararweh, Y., Al-Hawawreh, M., & Maqsood, S. (2023e). An explainable ensemble of multi-view deep learning model for fake review detection. Journal of King Saud University – Computer and Information Sciences, 35, Article 101644.
Mohawesh, R., Xu, S., Tran, S. N., Ollington, R., Springer, M., Jararweh, Y., & Maqsood, S. (2021b). Fake reviews detection: A survey. IEEE Access, 9, 65771–65802.
Myers, D., Mohawesh, R., Chellaboina, V. I., Sathvik, A. L., Venkatesh, P., Ho, Y.-H., Henshaw, H., Alhawawreh, M., Berdik, D., & Jararweh, Y. (2023). Foundation and large language models: Fundamentals, challenges, opportunities, and social impacts. Cluster Computing, 1–26.
Neisari, A., Rueda, L., & Saad, S. (2021). Spam review detection using self-organizing maps and convolutional neural networks. Computers & Security, 106, Article 102274.
Ott, M., Choi, Y., Cardie, C., & Hancock, J. T. (2011). Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th annual meeting of the Association for Computational Linguistics: Human language technologies – Volume 1 (pp. 309–319). Association for Computational Linguistics.
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015.
Prechelt, L. (1998). Early stopping – but when? Neural Networks: Tricks of the Trade, 55–69.
Ren, Y., & Ji, D. (2017a). Neural networks for deceptive opinion spam detection: An empirical study. Information Sciences, 385, 213–224.
Ren, Y., & Ji, D. (2017b). Neural networks for deceptive opinion spam detection: An empirical study. Journal of Information Science, 385.
Ren, Y., & Zhang, Y. (2016). Deceptive opinion spam detection using neural network. In Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers (pp. 140–150).
Rifai, A. P., Mulyani, Y. P., Febrianto, R., Arini, H. M., Wijayanto, T., Lathifah, N., Liu, X., Li, J., Yin, H., & Wu, Y. (2023). Detection model for fake news on COVID-19 in Indonesia. ASEAN Engineering Journal, 13, 119–126.
Rosasco, L., De Vito, E., Caponnetto, A., Piana, M., & Verri, A. (2004). Are loss functions all the same? Neural Computation, 16, 1063–1076.
Saini, S., Saumya, S., & Singh, J. P. (2017). Sequential purchase recommendation system for e-commerce sites. In IFIP international conference on computer information systems and industrial management (pp. 366–375). Springer.
Saumya, S., Singh, J. P., Baabdullah, A. M., Rana, N. P., & Dwivedi, Y. K. (2018). Ranking online consumer reviews. Electronic Commerce Research and Applications, 29, 78–89.
Shifath, S., Khan, M. F., & Islam, M. S. (2021). A transformer based approach for fighting COVID-19 fake news. arXiv preprint arXiv:2101.12027.
Shushkevich, E., & Cardiff, J. (2021). TUDublin team at Constraint@AAAI2021 – COVID19 fake news detection. arXiv preprint arXiv:2101.05701.
Singh, J. P., Irani, S., Rana, N. P., Dwivedi, Y. K., Saumya, S., & Roy, P. K. (2017). Predicting the "helpfulness" of online consumer reviews. Journal of Business Research, 70, 346–355.
Thuy, D. T. T., Thuy, L. T. M., Bach, N. C., Duc, T. T., Bach, H. G., & Cuong, D. D. (2024). Designing a deep learning-based application for detecting fake online reviews. Engineering Applications of Artificial Intelligence, 134, Article 108708.
Wolf, T., Chaumond, J., Debut, L., Sanh, V., Delangue, C., Moi, A., Cistac, P., Funtowicz, M., Davison, J., & Shleifer, S. (2020). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 conference on empirical methods in natural language processing: System demonstrations (pp. 38–45).
Xu, Q., & Zhao, H. (2012). Using deep linguistic features for finding deceptive opinion spam. In Proceedings of COLING 2012: Posters (pp. 1341–1350).
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., & Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 32.
Yin, H., Liu, X., Wu, Y., Arini, H. M., & Mohawesh, R. (2023). A BERT-based semantic enhanced model for COVID-19 fake news detection. In Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) joint international conference on web and big data (pp. 1–15). Springer.
Zhang, W., Du, Y., Yoshida, T., & Wang, Q. (2018). DRI-RCNN: An approach to deceptive review identification using recurrent convolutional neural network. Information Processing & Management, 54, 576–592.
