Fake News Documentation

This document discusses the detection of fake news using machine learning and natural language processing techniques, highlighting the impact of social media on news dissemination. It outlines various methodologies, including supervised learning algorithms such as Naive Bayes and Passive-Aggressive classifiers, to classify news articles as real or fake, with the aim of improving the accuracy of fake news detection systems and mitigating the spread of misinformation online.


RESEARCH PAPER-1

|| Volume 6 || Issue 6 || June 2021 || ISSN (Online) 2456-0774


INTERNATIONAL JOURNAL OF ADVANCE SCIENTIFIC RESEARCH
AND ENGINEERING TRENDS
A RESEARCH PAPER ON FAKE NEWS DETECTION
Mayur Bhogade, Bhushan Deore, Abhishek Sharma, Omkar Sonawane, Prof. Manisha Singh
Department of Computer Engineering, Dhole Patil College of Engineering, Pune, India
[email protected], [email protected], [email protected], [email protected], [email protected]
------------------------------------------------------ ***--------------------------------------------------
Abstract: With the growing popularity of mobile technology and social media, information is readily available. Mobile apps and social media platforms have overtaken traditional media in the distribution of news. With the increased use of social media platforms such as Facebook and Twitter, news spreads quickly among a large number of users who have a very limited attention span. Machine learning and knowledge-based approaches are the two techniques used for investigating the truthfulness of content. Public and private opinions on a wide variety of subjects are expressed and spread continually through various online media. The methodologies used are mostly supervised machine learning. The spread of fake news has far-reaching consequences, from the creation of biased opinions to swaying election outcomes in favour of certain candidates. Additionally, spammers use appealing news headlines to generate advertising revenue through clickbait. In this paper, we aim to perform a binary classification of various news articles available online with the help of concepts from Artificial Intelligence, Natural Language Processing, and Machine Learning. The outcome of the project is a fake news detector for social networks based on machine learning that also checks the authenticity of the publishing news website.
Keywords: Fake News, News articles, Internet, Social media, Classification, Artificial Intelligence, Machine Learning.
---------------------------------------------------------------------***---------------------------------------------------------------------
I INTRODUCTION
With the growing popularity of social media and mobile technology, information is accessible at one's fingertips. Mobile apps and social media such as Facebook and Twitter have overtaken traditional media in the field of information and news. With the convenience and speed that digital media offers, people express a preference for social media. Not only has it empowered consumers with faster access, but it has also given benefit-seeking parties a solid platform to reach a wider audience.

With so much information and news available, one question arises: is a given piece of news or information true or fake? Fake news is commonly distributed with an intent to mislead or to create a bias for political or monetary gain. Consider, for example, the recent elections in India, where there has been much discussion regarding the credibility of different news reports favouring certain candidates and the political motives behind them. Given this growing concern, exposing fake news is paramount in preventing its negative impact on people and society.

The World Wide Web contains data in varied formats such as documents, videos, and audio. News distributed online in an unstructured format (news articles, videos, audio) is relatively hard to identify and classify, as this strictly requires human expertise. However, computational techniques such as natural language processing (NLP) can be used to identify irregularities that differentiate an article that is misleading in nature from articles grounded in fact. Other strategies investigate how fake news spreads in contrast with real news; specifically, this approach analyses how fake news articles propagate across the internet relative to true articles. The reaction that an article receives can be examined at an abstract level to classify the article as real or fake. A hybrid approach can also investigate the social response to an article alongside its text-based features to examine whether the article is deceptive.

The algorithms used by fake news detection systems include machine learning algorithms such as Logistic Regression, Random Forests, Decision Trees, Support Vector Machines, Stochastic Gradient Descent, and so on. A simple method of fake news detection based on one such algorithm, the Naive Bayes classifier, helps to examine how this particular method works on the problem with a manually labeled (fake or real) dataset and supports the idea of using machine learning to detect fake news.
II LITERATURE REVIEW
[1] Paper Name: Evaluating Machine Learning Algorithms for Fake News Detection.
Author: Shloka Gilda.
In this article, the author introduced the importance of NLP in detecting incorrect information. They used term frequency-inverse document frequency (TF-IDF) of bigrams and probabilistic context-free grammar (PCFG) detection. They examined the dataset with more than one class of algorithms to find the better model. A count vectorizer of bigrams fed directly into a stochastic gradient descent model identified non-credible sources with an accuracy of 71.2%.

[2] Paper Name: Fake News Detection on Social Media: A Data Mining Perspective.
Author: Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang and Huan Liu.
To detect fake news on social media, this paper presents a data mining perspective that includes the characterization of fake news in psychology and social theories. The article examines two main factors responsible for the widespread acceptance of fake messages by users: naive realism and confirmation bias. It proposes a general two-phase data mining framework that includes 1) feature extraction and 2) model construction, along with analysis of data sets and a confusion matrix for detecting fake news.
[3] Paper Name: Media Rich Fake News Detection: A Survey.
Author: Shivam B. Parikh and Pradeep K. Atrey.
Social networking sites present news mainly in three ways. Text: the (multilingual) text is analyzed with the help of computational linguistics, which focuses semantically and syntactically on how the text is constructed; since most publications are in the form of text, a great deal of work has been done on analyzing them. Multimedia: several forms of media are integrated into a single post, which can include audio, video, images, and graphics; this is very attractive and draws the viewer's attention without them having to read the text. Hyperlinks: these allow the author of the post to refer to various sources and thus gain the trust of viewers; in practice, references are made to other social media websites, and screenshots are inserted.
[4] Paper Name: Fake News Detection using Naive Bayes Classifier.
Author: Mykhailo Granik and Volodymyr Mesyura.
This article describes a simple method of fake news detection based on one of the artificial intelligence algorithms, the Naive Bayes classifier. The goal of the research is to examine how this particular method works for this particular problem given a manually labeled (fake or real) dataset, and to support the idea of using machine learning to detect fake news. The difference between this article and articles on similar topics is that it relies extensively on a Naive Bayes classifier for the classification of fake and real news; in addition, the developed system was tested on a relatively new dataset, which provided the opportunity to evaluate its performance against the most recent data.

III PROPOSED METHODOLOGY
This project explores how to utilize Natural Language Processing (NLP) to identify and classify fake news articles. The main objective is to detect fake news, which is a classic text classification problem. We gather our data, preprocess the text, and convert the articles into features for use in supervised models. We use a Passive-Aggressive classifier, training it on a training set and testing it on held-out news articles.
In this project, we use Python and the scikit-learn library. Python has a great set of libraries and plugins for machine learning, and scikit-learn provides almost all of the common machine learning algorithm types, so a simple and quick evaluation of ML algorithms is possible. We used Flask to deploy the model, along with HTML, CSS, and JavaScript for the front end.

IV SYSTEM DESIGN

V IMPLEMENTATION
1. Data Collection:
The first step is data collection. The machine learning approach used in this project is supervised learning: learning is said to be supervised when the model is trained on a dataset that contains both input and output parameters. To train the model we have taken the dataset from kaggle.com. The size of the dataset is 20000 x 5, which means it has 20,000 news articles and 5 attributes.
The names of the attributes are 'id', 'title', 'author', 'text' and 'label'. Four of these are input parameters or independent variables: 'id', 'title', 'author', and 'text'. The attribute 'label' is the dependent variable or output parameter; it denotes whether the news article is 'real' or 'fake'.
2. Preprocessing the text:
The second step is preprocessing the text. The performance of a text classification model depends heavily on the words in the corpus and on the features created from those words. In preprocessing we remove the stop words from the news articles. Stop words are words that are common to all types of articles, such as 'is', 'a', 'an', 'the', 'am', and 'are'. These words are so common that removing them does not disturb the correctness of the information in the article.
After this, we apply lemmatization, which strips common morphological variation and generates the root form of inflected words. For example, words like win, winning, and won carry the same meaning and are treated as the same token after this process. This helps to reduce the feature dimensionality and increases the efficiency of the model.
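A short sketch of this step, assuming NLTK is used for the stop-word list and the lemmatizer (the paper does not name a specific library):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads of the resources used below.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    """Lowercase the text, drop stop words, and lemmatize the remaining tokens."""
    tokens = nltk.word_tokenize(text.lower())
    kept = [
        lemmatizer.lemmatize(tok)
        for tok in tokens
        if tok.isalpha() and tok not in stop_words
    ]
    return " ".join(kept)

# df is the DataFrame loaded in the data collection step.
df["text"] = df["text"].apply(preprocess)
```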
3. Feature Extraction:
The next step is feature extraction. Machine learning algorithms operate on numeric values, so to transform the text into something a machine can understand we use natural language processing to turn the text into a meaningful vector of numbers. In natural language processing there are two common techniques for feature extraction: the count vectorizer and TF-IDF (term frequency-inverse document frequency). In this project, we have used the TF-IDF technique.
TF (Term Frequency): the frequency with which a word appears in a document is its term frequency. A higher value means that a term occurs more often than others, so the document is a good match when that term is part of the search terms.
IDF (Inverse Document Frequency): words that occur many times in a document, but also occur many times in many other documents, may be irrelevant. IDF is a measure of how important a term is across the entire corpus.
The TF-IDF vectorizer therefore produces a numerical statistic designed to reflect how important a word is to a document in a collection or corpus.
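A brief sketch of TF-IDF feature extraction with scikit-learn (the stop_words and max_df settings are illustrative choices, not values stated in the paper):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Build TF-IDF features over the preprocessed article texts.
tfidf = TfidfVectorizer(stop_words="english", max_df=0.7)
X_tfidf = tfidf.fit_transform(df["text"])   # sparse matrix: articles x vocabulary terms

print(X_tfidf.shape)
```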

4. Classification:
To train the model, we first split the input dataset into two parts, a training set and a testing set: 80% of the data is used for training and the remaining 20% for testing. For the text classification we have used a passive-aggressive classifier, whose input is the matrix of TF-IDF features. The passive-aggressive algorithm is generally used for large-scale learning.
The passive-aggressive algorithm is an online learning algorithm. With online machine learning, the input data arrives in sequential order and the model is updated step by step, in contrast to batch learning, which uses the entire training set at once. This makes the algorithm very useful when there is a huge amount of data and it is computationally infeasible to train on the entire dataset at once.
Passive: if the prediction is correct, keep the model and make no changes; the example carries too little information to change the model.
Aggressive: if the prediction is incorrect, make a change to the model, i.e. an update that may correct it.
After this, the model trained on the training set is applied to the testing set to evaluate the performance of the classifier.
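The split and classifier described above can be sketched with scikit-learn as follows (max_iter and random_state are illustrative defaults, not values reported in the paper):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PassiveAggressiveClassifier

# 80/20 split of the TF-IDF features and labels built in the previous steps.
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y, test_size=0.2, random_state=42
)

# Online, margin-based linear classifier: passive on correct predictions,
# aggressive (it updates the weights) on mistakes.
pac = PassiveAggressiveClassifier(max_iter=50, random_state=42)
pac.fit(X_train, y_train)

y_pred = pac.predict(X_test)
```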
VI EVALUATION METRICS
To examine the effectiveness of the algorithm for detecting fake news, a dedicated evaluation of the results is used. In this section we describe the most commonly used metrics for fake news detection. Most existing techniques treat fake news detection as a classification problem that predicts whether an article is fake or not:
True Positive (TP): an article predicted as fake is actually labeled as fake.
True Negative (TN): an article predicted as real is actually labeled as real.
False Negative (FN): an article predicted as real is actually labeled as fake.
False Positive (FP): an article predicted as fake is actually labeled as real.
Confusion matrix:
The confusion matrix lets us visualize how the algorithm performs. It tallies the number of correct and incorrect predictions, broken down by class. The confusion matrix shows the ways in which the classification model gets confused when it makes predictions, giving insight not only into how many errors the classifier makes but, more importantly, into the types of errors being made.
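A short sketch of computing these quantities for the classifier above with scikit-learn (the label strings 'FAKE' and 'REAL' are assumptions; use whatever values the 'label' column actually contains):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

# Rows are true classes and columns are predicted classes.
# With the label order ['FAKE', 'REAL'] and 'FAKE' treated as the positive
# class, the cells are [[TP, FN], [FP, TN]].
cm = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
print(cm)
```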

VII GUI SCREENSHOTS
Fig - Home Page
Fig - Real News
Fig - Fake News

VIII ADVANTAGES
A fake news detection system will help control the spread of fake news over social media. In this way, we can help people make more informed decisions and keep them from being manipulated into believing what others want them to believe. A fake news detection system also reduces the burden of checking the authenticity of news manually and saves a lot of time.

IX DISADVANTAGES
The accuracy of detecting fake news will not be 100%; therefore, some articles may be misclassified.

X RESULTS
In fake news detection research, there have been multiple instances where both unsupervised and supervised learning algorithms are used to classify text. Most of the literature focuses on specific domains, most importantly the domain of politics. An algorithm trained this way therefore works best on a particular article domain and does not give optimal results when presented with articles from different areas. We therefore set out to solve the fake news detection problem using a machine learning approach. We used news.csv with a passive-aggressive classifier and obtained 95.54% accuracy.
XI CONCLUSION
Manual classification of news articles requires in-depth knowledge and expertise in identifying anomalies in the text. It takes a lot of time to verify a single article manually, which is why we discussed the use of machine learning models and ensemble methods to classify fake news articles.
It is important that we have a mechanism to detect fake news, or at least an awareness that not everything we read on social media may be true, which is why we always have to think critically. In this way, we can help people make more informed decisions, and they will not be led into believing what others are trying to manipulate them into believing.

ACKNOWLEDGEMENT
We are greatly indebted to our guide Prof. Manisha Singh, Head of the Department, and the Principal for their unconditional support and for sharing their profound technical knowledge, without which our work would not have seen the light of day.

REFERENCES
[1] S. Gilda, "Notice of Violation of IEEE Publication Principles: Evaluating machine learning algorithms for fake news detection," 2017 IEEE 15th Student Conference on Research and Development (SCOReD), 2017, pp. 110-115, DOI: 10.1109/SCORED.2017.8305411.
[2] M. Granik and V. Mesyura, "Fake news detection using naive Bayes classifier," 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), 2017, pp. 900-903, DOI: 10.1109/UKRCON.2017.8100379.
[3] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, "Fake News Detection on Social Media: A Data Mining Perspective."
[4] S. B. Parikh and P. K. Atrey, "Media-Rich Fake News Detection: A Survey," 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2018, pp. 436-441, DOI: 10.1109/MIPR.2018.00093.
[5] C. Buntain and J. Golbeck, "Automatically Identifying Fake News in Popular Twitter Threads," 2017 IEEE International Conference on Smart Cloud (SmartCloud), 2017, pp. 208-215, DOI: 10.1109/SmartCloud.2017.40.
[6]
[7] A. Gupta and R. Kaushal, "Improving spam detection in Online Social Networks," 2015 International Conference on Cognitive Computing and Information Processing (CCIP), 2015, pp. 1-6, DOI: 10.1109/CCIP.2015.7100738.
[8] M. L. Della Vedova, E. Tacchini, S. Moret, G. Ballarin, M. DiPierro, and L. de Alfaro, "Automatic Online Fake News Detection Combining Content and Social Signals," 2018 22nd Conference of Open Innovations Association (FRUCT), 2018, pp. 272-279, DOI: 10.23919/FRUCT.2018.8468301.
[9] D. De Beer and M. Matthee, "Approaches to Identify Fake News: A Systematic Literature Review," Integrated Science in Digital Age 2020, vol. 136, pp. 13-22, May 2020, DOI: 10.1007/978-3-030-49264-9_2.
[10] S. I. Manzoor, J. Singla, and Nikita, "Fake News Detection Using Machine Learning Approaches: A Systematic Review," 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), 2019, pp. 230-234, DOI: 10.1109/ICOEI.2019.8862770.
[11] I. Ahmad, M. Yousaf, S. Yousaf, and M. O. Ahmad, "Fake News Detection Using Machine Learning Ensemble Methods," Complexity, Oct. 2020. [Online]. Available: https://www.hindawi.com/journals/complexity/2020/8885861/
[11] M. Gahirwal, "Fake News Detection," International Journal of Advance Research, Ideas and Innovations in Technology, vol. 4, no. 1, pp. 817-819, 2018.
[12] U. Sharma, S. Saran, and S. M. Patil, "Fake News Detection using Machine Learning Algorithms," International Journal of Engineering Research & Technology (IJERT), NTASU - 2020, vol. 09, no. 03, 2021.


RESEARCH PAPER – 2

Received November 10, 2021, accepted November 16, 2021, date of publication November 18, 2021, date of current version November 30, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3129329

A Comprehensive Review on Fake News Detection With Deep Learning

M. F. MRIDHA 1 (Senior Member, IEEE), ASHFIA JANNAT KEYA 1, MD. ABDUL HAMID 2, MUHAMMAD MOSTAFA MONOWAR 2, AND MD. SAIFUR RAHMAN 1
1 Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh
2 Department of Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Corresponding author: M. F. Mridha ([email protected])

ABSTRACT
A prominent issue of the present time is that organizations from different domains are struggling to obtain effective solutions for detecting online fake news. It is quite challenging to distinguish fake information on the internet as it is often written to deceive users. Compared with many machine learning techniques, deep learning-based techniques are capable of detecting fake news more accurately. Previous review papers were based on data mining and machine learning techniques, scarcely exploring deep learning techniques for fake news detection. Moreover, emerging deep learning-based approaches such as Attention, Generative Adversarial Networks, and Bidirectional Encoder Representations from Transformers are absent from previous surveys. This study attempts to investigate advanced and state-of-the-art fake news detection mechanisms in depth. We begin by highlighting the consequences of fake news. Then, we discuss the datasets used in previous research and their NLP techniques. A comprehensive overview of deep learning-based techniques is provided, organizing representative methods into various categories. The prominent evaluation metrics in fake news detection are also discussed. Finally, we suggest further recommendations to improve fake news detection mechanisms in future research directions.

INDEX TERMS Natural language processing, machine learning, deep learning, fake news.
I. INTRODUCTION
The Internet has changed the ways we interact and communicate through low cost, simple access, and fast information dissemination. Therefore, social media and online portals have become more popular than traditional newspapers for news searching and reading for many people. Social media harms society by influencing major events, even though it has become a powerful means of information. Especially after the 2016 U.S. presidential election, the issue of online false news has gained more attention [1], [2]. According to Zhang and Ghorbani [3], voters might be easily controlled by deceptive political statements and claims. Inspection shows that false news or lies propagate more quickly through humans than original information and cause tremendous effects [4].
The terms rumor and fake news are closely interrelated. Fake news or disinformation is intentionally created. On the other hand, rumors are unconfirmed and questionable information spread without the aim to deceive [15]. On social media sites, spreaders' intentions might be difficult to determine. As a result, any false or incorrect information is typically branded as misinformation on the Internet. Distinguishing real and fake information is challenging. However, many approaches have been adopted to address this issue. Various machine learning (ML) methods have been used to detect false information spread online, in the areas of knowledge verification [16], natural language processing (NLP) [16]-[18], and sentiment analysis [19]. Early research concentrated on leveraging textual information derived from the article's content, such as statistical text features [20] and emotional information [21]-[23].
Deep learning (DL) has recently become an emerging technology among the research community and has proven to be more effective in recognizing fake news than traditional ML methods. DL has particular advantages over ML, such as a) automated feature extraction, b) light dependence on data pre-processing, c) the ability to extract high-dimensional features, and d) better accuracy. Further, the current wide availability of data and programming frameworks has boosted the usage and robustness of DL-based approaches. Hence, in the last five years, numerous articles have been published on fake news detection, mostly based on DL strategies [24].
(Footnote: The associate editor coordinating the review of this manuscript and approving it for publication was Sergio Consoli.)
An enthusiastic effort has been made to review the current literature and compare the extensive amount of DL-based fake news detection research.
A number of research works have been published surveying fake news detection [5], [25], [26]. Our investigation reveals that existing studies do not provide a thorough overview of deep learning-based architectures for detecting fake news. The existing survey papers mostly cover ML strategies for detecting fake news, scarcely exploring DL strategies [3], [9], [10]. We provide a complete list of NLP techniques and describe their benefits and drawbacks. In what follows, we perform an in-depth analysis of current DL-based studies. Table 1 provides a brief overview of the existing survey papers and our research contributions.
(TABLE 1. A comparison of existing surveys based on fake news detection.)
The present study aims to address the weaknesses and strengths of previous research by conducting a systematic survey on fake news detection. First, we divide existing fake news detection research into two main categories: (1) Natural Language Processing (NLP) and (2) Deep Learning (DL). We discuss NLP techniques such as data pre-processing, data vectorizing, and feature extraction. Second, we analyze fake news detection architectures based on different DL architectures. Finally, we discuss the evaluation metrics used in fake news detection. Figure 1 depicts an overall taxonomy of fake news detection approaches. We also include Table 2, listing the acronyms used throughout the survey to assist researchers when encountering issues due to acronyms.
The rest of the paper is organized as follows. Section II highlights the consequences of fake news. Section III describes the datasets used. Section IV explains the Natural Language Processing techniques in fake news detection. Section V contains an in-depth analysis of deep learning strategies. Section VI presents the evaluation metrics used in previous studies. Section VII narrates the challenges and future research directions. Finally, Section VIII concludes the paper.

II. FAKE NEWS CONSEQUENCES
(TABLE 2. The table contains the acronyms used in this survey.)
There has always been fake news since the beginning of human civilization. However, the spread of fake news has increased with modern technologies and the transformation of the global media landscape. Fake news may have major consequences for social, political, and economic environments. Fake information and fake news have many faces. As information molds our view of the world, fake news has a huge impact. We make critical decisions based on information, and by obtaining information we develop an impression of a situation or a person. We cannot make good decisions if we find fake, false, distorted, or fabricated information on the Internet. The primary impacts of fake news are as follows:
Impact on Innocent People: Rumors can have a major impact on specific people, who may be harassed on social media. They may also face insults and threats that have real-life consequences. People must not believe invalid information on social media or judge a person based on it.
Impact on Health: The number of people searching for health-related news on the Internet is continuously increasing. Fake news about health has a potential impact on people's lives [36]; therefore, this is one of the major challenges today. Misinformation about health has had a tremendous impact in the last year [37]. Social media platforms have made some policy changes to ban or limit the spread of health misinformation as they face pressure from doctors, lawmakers, and health advocates.
Financial Impact: Fake news is currently a crucial problem in industry and the business world. Dishonest businessmen spread fake news or reviews to raise their profits. Fake information can cause stock prices to fall and can ruin the reputation of a business. Fake news also affects customer expectations and can create an unethical business mentality.
Democratic Impact: The media has discussed the fake news phenomenon significantly because fake news played a vital role in the last American presidential election. This is a major democratic problem. We must stop spreading fake news as it has a real impact.

III. BENCHMARK DATASET
(TABLE 3. The table provides details of publicly available datasets and corresponding URLs.)
In this section, we discuss the datasets used in various studies. For both training and testing, benchmark datasets
were utilized. One of the difficulties in identifying fake news is the shortage of a
labeled benchmark dataset with trustworthy ground truth labels and a massive
dataset. Based on that, researchers can obtain practical features and construct
models [38]. For several usages in DL and ML, such datasets have been collected
over the last few years. The datasets are vastly diverse from one another because of
different study agendas. For instance, a few datasets are made up entirely of
political statements (such as PolitiFact), while others are made up entirely of news
articles (FNC-1) or social media posts (Twitter). Datasets can differ based on their
modality, labels, and size. Therefore we categorize these datasets in table 3 based on
these characteristics. Fake articles are frequently collected from fraudulent websites
designed intentionally to disseminate disinformation. These false news stories are
eventually shared on social media platforms by their creators. Malicious individuals
or bots and inattentive users who do not care to check the source of the story before
sharing it assist in spreading fake news through social media. However, most datasets contain only news content, and current language features and writing style alone are not sufficient for developing an efficient detection model.
Fake news, Twitter15, and Liar are the most popular datasets that are publicly available, but some studies trained their model with their own created dataset [39]. We define these datasets as self-collected. Since sufficient information is not provided about the self-collected datasets, we find it difficult to compare them properly with other studies. Using a benchmark dataset, a comparative study can be established with current state-of-the-art methods for detecting fake news. Kaliyar et al. [40] conducted a comparative study of their suggested model with existing methods using the Kaggle dataset, and they reported an accuracy of 93.50%, which is the highest accuracy utilizing the same dataset for fake news detection.
A pie chart of the benchmark datasets used is given in Figure 2.
(FIGURE 2. A pie chart of the benchmark datasets used in the studies of fake news detection.)
IV. NATURAL LANGUAGE PROCESSING
Natural Language Processing (NLP) is an area in machine learning with the
capability of a computer to understand, analyze, manipulate, and potentially
generate human language. The NLP technique consists of data pre-processing and
word embedding. By utilizing deep learning techniques, NLP has seen some colossal
advancements in recent years [41]. The natural language must be transformed into
a mathematical structure to give machines a sense of natural language. In section
IV-A, IV-B, and IV-C, NLP techniques are discussed.
A. DATA PRE-PROCESSING
Data pre-processing is used to represent complex structures with attributes, binarize attributes, transform discrete attributes, and handle missing and obscure attributes.

During data pre-processing, different visualization procedures are helpful. A cautious pre-processing strategy is required to ingest the data into a neural network for fake news detection because social media data sources are fragmented, unstructured, and noisy. It is well known that data pre-processing saves computational time and space during the learning stage. In addition, by limiting the impact of artifacts during the learning process, text pre-processing avoids ingesting noisy data. After proper text pre-processing, the data becomes a logical representation that retains the most representative descriptive words. Umer et al. [42] experimented on a fake news detection model
in which the accuracy was only 78% when they used the features without data cleaning or pre-processing, which is surprisingly poor. After performing the pre-processing steps and removing unnecessary data, the accuracy increases dramatically to 93.0%. Data quality assessment, dimensionality reduction, and splitting of the dataset are the data pre-processing steps used in various studies [39], [41], [43]. The pre-processing steps are elaborated in Sections IV-A1, IV-A2, and IV-A3.
1) DATA QUALITY ASSESSMENT
Data are frequently taken from numerous sources that are ordinarily reliable and are in completely different formats. When working on a machine learning problem, much time is invested in managing data quality issues. It is unreasonable to anticipate that the data will be perfect. There may be issues due to human error, defects within the data collection process, or restrictions of measuring devices. The quality of a dataset is often responsible for the poor performance of fake news detection models. For this reason, the quality of the data used in any machine learning project has a huge effect on its chances of success. However, only a few studies ensure the quality of their datasets. S and Chitturi [41] collected the George McIntire dataset from GitHub and dropped the rows that did not have labels in the cleaning process, and this surely has an impact on their success in fake news detection. To ensure the quality of the entire dataset, Wang et al. [44] removed duplicate and low-quality images. Alsaeedi and Al-Sarem [45] extended the data cleaning process with URL removal, lowercasing, hashtag character (#) removal, mention character (@) removal, and number removal. They also handled words with recurring characters such as ''Likkke'' and dealt with emoticons by replacing positive emoticons with a ''positive'' word and negative emoticons with a ''negative'' word.
2) TRAIN/VALIDATION/TEST SPLIT
The dataset may be divided into train, test, and validation sets. The sample of data used to adjust the parameters is called the training set. The validation set is a set of examples used to fine-tune the parameters of a model. A set of examples applied only for assessing a fully-specified model's performance is regarded as the test set. Although many studies on fake news detection have divided their dataset into training, validation, and test sets, a few studies have used only training and test sets [46], [47]. The data split ratios 60:20:20, 70:30, and 80:20 are very common in fake news detection. The Pareto principle (for many outcomes, roughly 80% of consequences come from 20% of the causes) is used to describe the 80:20 ratio. It is typically a safe bet to use the ratio that most studies applied. Mandical et al. [48] applied ratios of 90:5:5 and 80:10:10 when the number of articles in the dataset was less than 10,000 and greater than 10,000, respectively; however, they did not specify the reasoning behind it. Jadhav and Thepade [49] compared their model performance based on the data splitting ratio and showed that a 75%-25% split performs better than models with other splits. Model parameter estimates exhibit greater variation with smaller training data, and performance statistics exhibit greater variation with smaller testing data. Studies should split the data so that neither variation is too large, and this has more to do with the total number of instances in each category than with the percentage. The optimal split into test, validation, and train sets is determined by hyperparameters, model architecture, data dimension, and so on. Table 4 provides an overview of the advantages and disadvantages of the splitting ratios used in most studies.
(TABLE 4. The table gives an overview of common dataset partitioning based on training, validation, and testing with advantages and disadvantages. Few studies mentioned their data partitioning, and only those references are given in the table.)
3) TOKENIZATION, STEMMING AND LEMMATIZATION
Tokenization is a method of breaking down a text into words. It can be applied at any character; performing tokenization on the space character is the most common approach.
Chopping off an ending to reach the base word is called stemming. Stemming usually includes the removal of derivational affixes. A derivational affix is an affix through which one word is obtained from another; the derived word is usually of a different word class than the original.
Lemmatization is a text normalization procedure that morphologically analyzes words, generates the root form of inflected words, and is normally intended to remove inflectional endings [64]. A group of letters applied to the end of a word to modify its meaning is known as an inflectional ending; an example is the ending -s, as in bat and bats.
Rusli et al. [52] performed two experiments to detect fake news with and without stemming and stop-word removal. They used stemming and stop-word removal to remove all affixes and stop words. They achieved a 0.82 macro-averaged F1-score when performing stemming and stop-word removal, and a 0.80 macro-averaged F1-score without them. Performing stemming and stop-word removal in the text preprocessing phase was time-consuming, but the difference in the results was small. Although tokenization, stemming, and lemmatization improve the performance of the classifier, many researchers have not used these techniques [4], [65]. Jain and Kasbe [66] presented a simple technique with web scraping for detecting fake news. They showed that by updating the dataset regularly with web scraping, a model's truthfulness can be checked. The authors achieved an accuracy of 91% based on text. The result can be improved greatly with some extra preprocessing, such as stemming and omitting stop words.
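For illustration only (the surveyed papers do not prescribe a particular toolkit), the three operations can be contrasted with NLTK:

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("punkt")
nltk.download("wordnet")

sentence = "The bats were flying over the studies"
tokens = nltk.word_tokenize(sentence)              # tokenization: split the text into word tokens

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print([stemmer.stem(t) for t in tokens])           # stemming chops affixes: "studies" -> "studi"
print([lemmatizer.lemmatize(t) for t in tokens])   # lemmatization returns dictionary forms: "studies" -> "study", "bats" -> "bat"
```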
B. WORD VECTORIZING
Word vectorizing involves mapping a word or text to a list of vectors. TF-IDF and Bag of Words (BoW) vectorization techniques are commonly used in machine learning strategies to identify fake news [4], [53], [63]. In term frequency-inverse document frequency (TF-IDF), the value rises proportionally to the number of times a word appears in the document but is balanced by the frequency of the word in the corpus. Although this vectorization is successful, the semantic sense of the words is lost in the attempt to translate them to numbers [48]. The BoW technique considers every news article to be a document and computes the frequency count of each word within this document, which is then used to produce a numeric representation of the data. In addition to data loss, this approach has further limitations: the relative location of the words is overlooked, and contextual information is lost. This loss can be costly when measured against the benefit of computational convenience and ease of use [46]. Rusli et al. [52] used TF-IDF and Bag of Words feature extraction methods to detect fake news; however, this approach may suffer from loss of information.
By utilizing word embeddings in fake news detection, neural network-based models have achieved success on diverse language-related tasks compared with traditional machine learning-based models such as logistic regression or support vector machines (SVM). A word embedding maps words or text to a list of vectors. These are low-dimensional, distributed feature representations that are appropriate for natural languages. The term ''word embedding'' refers to a combination of language modeling and feature learning: words or expressions from the lexicon are assigned to real-number vectors. Neural network models essentially utilize this method for fake news detection [42], [96]. Word representation is performed using dense vectors in word embedding. These vectors represent the word mapping onto a continuous, high-dimensional vector space. This is considered an improvement over the BoW model, wherein large sparse vectors of vocabulary size were used as word vectors. Those large vectors also provided no information about how two words were interrelated or any other useful information [50]. Recently, fake news detection researchers have used pre-trained word-embedding models such as global vectors for word representation (GloVe) and Word2vec. The primary benefit of using these models is their ability to train with large datasets [40]. Unlike Word2Vec, GloVe supports parallel implementation, making it easier to train the model on huge datasets. Table 5 gives a summary of the NLP techniques and word vector models used in deep learning-based fake news detection papers.
(TABLE 5. The table provides the advantages and disadvantages of Word Vector Models, along with the references.)
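To make the contrast concrete, here is a minimal sketch (illustrative, not drawn from the surveyed papers) of the three vectorization styles discussed above — Bag of Words, TF-IDF, and dense word embeddings trained with gensim's Word2Vec:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from gensim.models import Word2Vec

docs = ["fake news spreads fast", "real news is verified news"]

bow = CountVectorizer().fit_transform(docs)      # Bag of Words: sparse raw term counts
tfidf = TfidfVectorizer().fit_transform(docs)    # TF-IDF: counts reweighted by document frequency

# Word embeddings: every word becomes a dense, low-dimensional vector.
# Here the model is trained on the toy corpus; in practice pre-trained
# GloVe or Word2Vec vectors are usually loaded instead.
w2v = Word2Vec([d.split() for d in docs], vector_size=50, min_count=1)

print(bow.shape, tfidf.shape, w2v.wv["news"].shape)
```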

C. FEATURE EXTRACTION
A huge amount of computational power and memory is required to analyze a large number of variables, and classification algorithms may overfit the training samples and generalize poorly to new samples. Feature extraction is a process of building combinations of variables to overcome these difficulties while still representing the data with adequate precision. Feature extraction and feature selection are frequently used in text mining [69], [97].
Fake news detection strategies concentrate on applying news content and social context features [98]. News content features depict the meta-information relevant to a piece of news [5]. Commonly, in news validation, news content (linguistic and visual information) is used as a feature [99], [100]. Textual features comprise the writing style and emotion [101], [102]. Furthermore, hidden textual representations are generated using tensor factorization [103]-[105] and deep neural networks [106]-[108], achieving high performance in detecting false news from news content. Visual features are retrieved from visual components such as images and video, but only a few studies utilized visual features in fake news detection [109], [110]. In contrast, social context information can also be aggregated for detecting fake news on social media. There are three main perspectives on social content: a) users, b) produced posts, and c) networks (connections amidst the users who distributed relevant posts) [5]. User-based features are typically drawn from the user profile on social media [98], [111]. Users' social responses in terms of stances [42], [64], topics [112], or credibility [113]-[115] are represented via post-based features. Recently, several studies have focused on stance features to detect fake news [64]. They can be effective for human fact-checkers to distinguish false claims [113], [114]. To check the authenticity of a claim/report/headline, it is essential to understand what different news agencies are declaring about that particular claim/report/headline [116]. Network-based features are retrieved by creating specialized networks, such as diffusion networks, interaction networks, and propagation networks [117]-[119]. The propagation network contains rich information about user interactions (likes, comments, responses, or shares) that shows the direction of information flow, timestamp details about interactions, textual information about user interactions, and user profile information about the users who are interacting [120]. We provide Figure 3 depicting the important features that were utilized to detect fake news precisely.
(FIGURE 3. The infographic illustrates the social content/context features such as user, post and network elaborately.)
It is pivotal to choose the correct selection algorithm for reducing features because feature reduction has an incredible effect on the text classification results. Some common feature reduction algorithms include the Gini Coefficient (GI), Term Frequency-Inverse Document Frequency (TF-IDF), Information Gain (IG), Mutual Information (MI), Principal Component Analysis (PCA), and Chi-Square Statistics (CHI). In the process of content classification, the linear classification model works well with the TF-IDF model [121]. PCA and Chi-square were utilized to improve the adaptability of the text classifier combined with deep learning models. A number of studies compared their model accuracy with and without feature extraction and found that with feature extraction, the success rate is higher. Umer et al. [42] compared the application of feature reduction methods (PCA and Chi-square) with two deep learning models. When the proposed model is utilized with the reduced feature set, it increases the F1-score and accuracy by 20% and 4%, respectively, compared to the other techniques. However, many studies did not perform feature extraction, although it has a significant impact on the result [16], [122]. Neural networks are considered very powerful machine learning tools due to their ability to perform complex feature extraction. Instead of relying on manual feature selection and other existing techniques, researchers are currently focusing on neural networks for feature extraction [123]. Yang et al. [124] employed a model, TI-CNN (Text and Image information based Convolutional Neural Network), to extract latent features from both visual and textual information and achieved promising results [124]. Another study [107] used a deep recurrent neural network model to extract a collection of latent features for news producers, posts, and topics.
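As a brief sketch of the kind of feature reduction mentioned above (illustrative settings, not code from the cited studies), chi-square selection can be applied on top of TF-IDF features with scikit-learn before they are fed to a classifier or a deep model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# docs and labels are assumed to be parallel lists of article texts and class labels.
X = TfidfVectorizer(max_features=5000).fit_transform(docs)

# Keep only the 1,000 terms most associated with the label according to chi-square.
selector = SelectKBest(chi2, k=1000)
X_reduced = selector.fit_transform(X, labels)
print(X.shape, "->", X_reduced.shape)
```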

V. DEEP LEARNING APPROACH FOR FAKE NEWS DETECTION
Deep learning models have seen exceptional growth in recent times owing to their promising success in several fields, including communication and networking [125], [126], computer vision [127], [128], intelligent transportation [129], and speech recognition [130], as well as NLP. Deep learning systems have advantages over traditional machine learning methods. Deep learning is a subfield of machine learning that displays high precision and exactness in fake news detection. Generally, ML methods are based on hand-crafted features. Biased features may appear because feature extraction assignments are challenging and slow. ML approaches have failed to achieve prominent results in fake news detection because they produce high-dimensional representations of linguistic information, resulting in the curse of dimensionality. Existing neural network-based models have outperformed the traditional models owing to their exceptional feature extraction ability [62]. In contrast, DL systems can acquire hidden representations from less complex inputs; the hidden features can be extracted from both the news content and context. A study by Hiramath and Deshpande [78] showed that deep neural networks (DNNs) require less time than other ML-based classification algorithms such as logistic regression, random forest (RF), and SVM; however, DNNs use more memory.
Convolutional neural networks (CNN) and recurrent neural networks (RNN) are two broadly utilized paradigms for deep learning in cutting-edge artificial neural networks. Therefore, we provide Figure 4, which shows the percentage of DL-based fake news detection papers with the classifiers used in recent years.
(FIGURE 4. A nested pie chart illustrating the percentage of published articles and popular models each year.)
After inspecting previous studies, we found a general framework for deep learning-based fake news detection. The first step was to collect a dataset or create one. Most studies have used news articles collected from publicly available datasets. The pre-processing technique was applied after collecting the dataset to feed the data into a neural network [42], [96], [131]. Word2vec and GloVe word embedding methods have mostly been used in previous studies to map words into vectors [41], [78], [80]. We represent an overall process for fake news identification with deep learning in Figure 5, based on various studies [40], [42], [61].
(FIGURE 5. The diagram illustrates the general deep learning-based architecture that was used in most studies.)
148 DL-based studies were examined to provide a detailed description of these architectures: CNN in Section V-A, RNN in Section V-B, Graph Neural Network in Section V-C, Generative Adversarial Network in Section V-D, Attention Mechanism in Section V-E, Bidirectional Encoder Representations from Transformers in Section V-F, and Ensemble Approach in Section V-G.

A. CONVOLUTIONAL NEURAL NETWORK (CNN)
A few deep learning models have been introduced to handle ambiguous detection issues. CNNs and RNNs are the most interesting models [77]. Researchers are trying to boost the performance of fake news detectors with CNNs, exploiting their power to extract features well and their better classification process [132]. CNNs are also gaining popularity in NLP, where they are utilized for mapping the features of n-gram patterns. The CNN is similar to a multilayer perceptron (MLP) as it is a multilayer feed-forward neural network [45]. The CNN consists of an input layer, an output layer, and a sequence of hidden layers. CNNs are mostly used for picture recognition and classification. Neural networks with 100 or more hidden layers have been reported in recent studies. Backward-propagation and forward-propagation algorithms are utilized in neural networks; they are used to train neural networks by updating the weights of each layer. The gradient (derivative) of the cost function is utilized to update the weights. When the sigmoid activation function is applied, the value of the gradient decreases per layer, which lengthens the training time. This problem is called the vanishing-gradient problem. A deeper CNN or a direct connection in the dense layers solves this problem. Compared to a normal CNN, a deeper CNN is also less vulnerable to overfitting [67]. Kaliyar et al. [40] proposed a model, FNDNet (deep CNN), which is designed to learn the discriminatory features for fake news detection using multiple hidden layers. The model is less prone to overfitting but takes a longer time to train. The convolutional layer, the pooling layer, and the regularization layer are the most utilized layers in CNNs for fake news detection. The input data can be manipulated through pooling and convolution operations. Sections V-A1, V-A2, and V-A3 describe the popular layers used in CNNs.
(FIGURE 6. The figure shows the architecture of a CNN. Here, an input picture of a snowflake is given to the CNN picture classifier. The input goes through a series of convolution layers, pooling layers, and fully connected layers, and the network classifies the object based on learned features.)
1) CONVOLUTION LAYER
CNNs work very well for image classification and computer vision because of the convolution operation, and their ability to extract features from inputs for better representation makes them very efficient. These properties also make CNNs powerful in sequence processing [131]. Fernández-Reyes and Shinde [77] proposed a CNN architecture called StackedCNN (2-dimensional convolution layers rather than 1-dimensional convolutions). It is proven that a fusion of pre-trained word embeddings with 2-dimensional convolutional layers helps in finding patterns in text data, but the performance of the StackedCNN is poor compared to state-of-the-art CNNs. Another study, by Li et al. [132], adopted a novel approach with a multilevel CNN (MCNN) and a sensitive word weight calculating method (TFW). MCNN-TFW successfully captured semantic information from the article text content; for this reason, it outperforms the compared methods, including CNN. Their work did not consider latent-based features. Alsaeedi and Al-Sarem [45] added more convolution layers, and this has an impact on the proposed model's performance: according to the results, the model's performance is lowered by about 0.014.
2) POOLING LAYER
A pooling operation that chooses the greatest component from each patch of each feature map covered by the filter is called max pooling. A pooling layer is a new layer attached to the convolutional layer. Its purpose is to continuously diminish the spatial size of the representation in order to decrease the number of parameters and the calculation inside the network. The pooling layer operates autonomously on each feature map. Max pooling or average pooling is the most commonly used function in fake news detection. Alsaeedi and Al-Sarem [45] adjusted the hyperparameter settings in a CNN and found the parameter settings that gave an improvement in the model's performance. The recommended CNN model performs best when the number of units in the dense layer is set to 100, the number of filters is set to 100, and the window size is set to 5. The GlobalMaxPooling1D method achieved the highest scores, showing that it works well for fake news detection when compared to other pooling methods [45].
3) REGULARIZATION LAYER
The most crucial problem of classification is to reduce the training and test errors of the classifier. Another common issue is the overfitting problem, where the gap between training and testing errors is huge. Overfitting makes it difficult to generalize the model as it becomes more applicable (overfit) to the training set. Regularization is a solution to the overfitting problem: it is applied to the model to lessen overfitting and decrease the error of generalization, but not the error of training [45]. The dropout regularization method is the one mostly used for fake news detection [133]. Other methods such as early stopping and weight penalties were not used in previous studies on fake news detection. Dropout avoids overfitting by gradually filtering out neurons. Eventually, all weights are calculated as an average so that the weight is not too high for a single neuron.
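Pulling the three layer types together, a minimal Keras sketch of a text CNN of the kind described in this section might look as follows; the filter count, window size, and dense-layer width echo the settings reported by Alsaeedi and Al-Sarem [45], while the vocabulary size, sequence length, and embedding dimension are illustrative assumptions:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, EMB_DIM = 20000, 300, 100   # illustrative hyperparameters

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),                               # padded sequences of token ids
    layers.Embedding(VOCAB_SIZE, EMB_DIM),                        # word vectors (could be pre-trained GloVe/Word2vec)
    layers.Conv1D(filters=100, kernel_size=5, activation="relu"), # convolution layer: n-gram feature maps
    layers.GlobalMaxPooling1D(),                                  # pooling layer: keep the strongest response per map
    layers.Dropout(0.5),                                          # regularization layer: dropout against overfitting
    layers.Dense(100, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                        # fake vs. real
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```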
reported in recent studies. Backward-propagation and forward-propagation B. RECURRENT NEURAL NETWORK (RNN)
algorithms are utilized in neural networks. These algorithms are used to train
The RNN is a type of neural network. In RNN, nodes are sequentially connected
neural networks by updating the weights of each layer. The gradient (derivative)
to construct a directed graph. The output from the earlier step serves as the
of the cost function is utilized to update the weights. When the sigmoid
input to the current step. RNNs are effective in time and sequence-based
activation function is applied, the value of the gradient decreases per layer. This
predictions. RNN is less compatible with features compared to CNN. RNNs are
lengthens the training time. This problem is called the vanishing-gradient
suitable for studying sequential texts and expressions. However, it cannot
problem. A deeper CNN or a direct connection in dense solves this problem.
process very long sequences when tanh or ReLU is used as an activation
Compared to a normal CNN, a deeper CNN is also
function.
lessvulnerabletooverfitting[67].Kaliyaretal.[40]proposed a model FNDNet (deep
CNN), which is designed to learn the discriminatory features for fake news The backward-propagation algorithm is utilized in the RNN for training. While
detection using multiple hidden layers. The model is less prone to overfitting but training the neural networks, it is required to take tiny steps frequently in the
takes a longer time to train. The convolutional layer, pooling layer, and way of the negative error derivative concerning network weights to establish
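A minimal sketch of a 1D-CNN text classifier wired together from the pieces discussed above (a Conv1D layer, GlobalMaxPooling1D, a dense layer, and dropout regularization). The filter count, window size, and dense-layer size follow the values reported by Alsaeedi and Al-Sarem [45]; the embedding setup and optimizer are illustrative assumptions, and this is not a reproduction of FNDNet or any other cited model.

```python
# Illustrative 1D-CNN for binary fake/real classification.
# Reuses X, labels, and embedding_matrix from the pre-processing sketch above.
from tensorflow.keras import initializers, layers, models

MAX_WORDS, MAX_LEN, EMB_DIM = 20000, 300, 100   # assumed, as before

model = models.Sequential([
    layers.Embedding(MAX_WORDS, EMB_DIM,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False),            # frozen pre-trained GloVe vectors
    layers.Conv1D(filters=100, kernel_size=5, activation="relu"),  # n-gram-like features
    layers.GlobalMaxPooling1D(),   # pooling keeps the strongest response per filter
    layers.Dense(100, activation="relu"),
    layers.Dropout(0.5),           # dropout regularization against overfitting
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X, labels, validation_split=0.2, epochs=5, batch_size=64)
```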
B. RECURRENT NEURAL NETWORK (RNN)
The RNN is a type of neural network in which nodes are sequentially connected to construct a directed graph. The output from the earlier step serves as the input to the current step. RNNs are effective in time- and sequence-based predictions. RNN is less compatible with features compared to CNN. RNNs are suitable for studying sequential texts and expressions; however, they cannot process very long sequences when tanh or ReLU is used as an activation function.

FIGURE 7. The figure shows an architecture of a basic RNN with n sequential layers. x represents the inputs and y represents the output generated by the RNN.

The backward-propagation algorithm is utilized in the RNN for training. While training the network, it is required to take tiny steps frequently in the direction of the negative error derivative with respect to the network weights in order to reach a minimum of the error function. The size of the gradients becomes tiny for each consequent layer; thus, the RNN suffers from a vanishing gradient issue in the bottom layers of the network. We can deal with the vanishing gradient problem with three solutions: (1) using the rectified linear unit (ReLU) activation function, (2) using the RMSProp optimization algorithm, and (3) using a different network architecture such as long short-term memory networks (LSTM) or gated recurrent units (GRU). Therefore, previous studies focused on LSTM and GRU rather than the standard RNN [80], [96], [134]. Bugueño et al. [80] proposed a model based on RNN for propagation tree classification. The authors used RNN for sequence analysis; the number of epochs was set as 200, which is relatively high in comparison to their training examples. To predict fake news articles, authors have proposed distinctive RNN models, specifically LSTM, GRU, tanhRNN, unidirectional LSTM-RNN, and vanilla RNN. RNNs, and in specific LSTM, are especially successful in processing sequential data (human language) and catching significant features out of diverse data sources. Further, in Sections V-B1 and V-B2, we discuss LSTM and GRU.
1) LONG SHORT-TERM MEMORY (LSTM)
LSTM models are front runners in NLP problems. LSTM is an artificial recurrent neural network framework used in deep learning and is an advanced variation of the RNN [41]. RNNs are not capable of learning long-term dependencies because back-propagation through a recurrent network takes a while, particularly because of the decaying backflow of error. However, LSTM can keep ''short-term memories'' for long periods. The LSTM is made up of three gates (an input gate, an output gate, and a forget gate) and a cell; through a combination of the three gates, it calculates the hidden state. The cell can recall values over a large time interval. For this reason, a word at the beginning of the content can impact the output for a word later in the sentence [67]. LSTM is an exceptionally viable solution for addressing the vanishing gradient issue. Bahad et al. [61] proposed an RNN model that suffers from the vanishing gradient issue; to tackle this issue, they implemented an LSTM-RNN. Still, the LSTM could not solve the vanishing gradient issue completely. The LSTM-RNN model had a higher precision compared to the initial state-of-the-art CNN. Asghar et al. [135] proposed a bidirectional LSTM (Bi-LSTM) with CNN for rumor detection. The model preserves the sequence information in both directions, and the Bi-LSTM layer is effective in remembering long-term dependency. Even though the BiLSTM-CNN beat the other models, the suggested approach is computationally expensive.

A study by Ruchansky et al. [123] suggested a model called CSI, which comprises three modules: Capture, Score, and Integrate. The capture module extracts features from the article, and the score module extracts features from the user. Then, by integrating article and user-based features, the CSI model performs the prediction for fake news detection. The CSI model has fewer parameters than other RNN-based models. Another study by Sahoo and Gupta [136] proposed an approach with both user profile and news content features for detecting false news on Facebook. The authors used LSTM to identify fake news, and a set of new features is extracted by Facebook crawling and the Facebook API. It requires more time to train and test the suggested model. Liao et al. [137] proposed a novel model called fake news detection multi-task learning (FDML). The model explores the influence of topic labels for fake news while also using contextual news information to improve detection performance on short false news. The FDML model, in particular, is made up of representation learning and multi-task learning components that train both the false news detection task and the news topic categorization task at the same time. However, the performance of the model decreases without the author's information.
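As a concrete illustration of the LSTM-based detectors discussed above, the sketch below stacks a bidirectional LSTM on top of the same frozen embedding layer used earlier. It is a generic text classifier under assumed layer sizes, not a re-implementation of CSI, FDML, or any other cited model.

```python
# Generic Bi-LSTM fake news classifier (illustrative; sizes are assumptions).
# Reuses MAX_WORDS, EMB_DIM, and embedding_matrix from the earlier sketches.
from tensorflow.keras import initializers, layers, models

lstm_model = models.Sequential([
    layers.Embedding(MAX_WORDS, EMB_DIM,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False),
    # The Bi-LSTM reads the article in both directions, which helps it keep
    # long-term dependencies between distant words.
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
lstm_model.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
```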
2) GATED RECURRENT UNIT (GRU)
In terms of structure and capabilities, GRU is comparatively simpler and more efficient than LSTM because it has only two gates, namely reset and update. The GRU manages the information flow in the same manner as the LSTM unit does, but without the use of a memory unit; it exposes the entire hidden content without any control. When it comes to learning long-term dependencies, the quality of GRU is better than that of LSTM, and hence it is a promising candidate for NLP applications [41]. GRUs are more straightforward as well as more efficient compared to LSTM. GRU is still in its early stages; thus, we are only lately seeing it being used to identify false news. GRU is a newer algorithm with a performance comparable to that of LSTM but greater computational efficiency. Li et al. [134] used a deep bidirectional GRU neural network (a two-layer bidirectional GRU) as a rumor detection model; the model suffers from slow convergence. S and Chitturi [41] showed that it is difficult to determine whether one of the gated RNNs (LSTM, GRU) is more successful, and they are usually chosen on the basis of the available computing resources. Girgis et al. [96] experimented with CNN, LSTM, vanilla RNN, and GRU. The vanilla RNN suffers from a vanishing gradient problem, but GRU solves this issue. Though GRU is said to be the best outcome of their studies, it takes more training time. A bidirectional GRU was utilized by Singhania et al. [87] for word-by-word annotation; with preceding and subsequent words, it captures the word's meaning within the sentence. A study by Shu et al. [100] proposed a sentence-comment co-attention subnetwork model named dEFEND (Explainable fake news detection) utilizing news content and user comments for fake news detection. The authors considered textual information with a bidirectional GRU (Bi-GRU) to achieve better performance. However, the model has a low learning efficiency.
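Because GRU and LSTM expose the same interface in most deep learning frameworks, switching between the two gated RNNs discussed above is usually a one-layer change. A hedged sketch (layer sizes again arbitrary, reusing the earlier embedding setup):

```python
# Swapping a gated recurrent unit in for the LSTM of the previous sketch.
from tensorflow.keras import initializers, layers, models

gru_model = models.Sequential([
    layers.Embedding(MAX_WORDS, EMB_DIM,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False),
    layers.Bidirectional(layers.GRU(64)),   # two gates (reset, update) instead of three
    layers.Dense(1, activation="sigmoid"),
])
gru_model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
```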
C. GRAPH NEURAL NETWORK (GNN)
A Graph Neural Network is a form of neural network that operates on the graph structure directly. Node classification is a common application of GNN: essentially, every node in the network has a label, and the network predicts the labels of the nodes without using the ground truth. The GNN extends recursive neural networks by processing a broader class of graphs, including cyclic, directed, and undirected graphs, and it can handle node-focused applications without requiring any pre-processing procedures [138]. GNN captures global structural features from graphs or trees better than the deep learning models discussed above [139]. However, GNNs are prone to noise in the datasets: adding a small amount of noise to the graph via node perturbation or edge deletion and addition has an adverse effect on the GNN output. The graph convolutional network (GCN) is considered one of the basic graph neural network variants.

A study by Huang et al. [140] claimed to be the first that experimented with a rich structure of user behavior for rumor detection. The user encoder uses graph convolutional networks (GCN) to learn a representation of the user from a graph created by user behavioral information. The authors used two recursive neural networks based on a tree structure: a bottom-up RvNN encoder and a top-down RvNN encoder. The tree structure is shown in Figure 8. The proposed model performed worse for the non-rumor class because user behavior information brings some interference into non-rumor detection.

FIGURE 8. This figure illustrates the propagation tree structure encoder taken from Huang et al. [140].

Another study by Bian et al. [139] proposed a top-down GCN and a bottom-up GCN using a novel method, DropEdge [141], for reducing over-fitting of GCNs. In addition, a root feature enhancement operation is utilized to improve the performance of rumor detection. Although it performed well on three datasets (Weibo, Twitter15, Twitter16), the outliers in the datasets affected the models' performance.

On the other hand, GCNs incur a significant memory footprint in storing the complete adjacency matrix. Furthermore, GCNs are transductive, which implies that inferred nodes must be present at training time, and they do not guarantee generalizable representations [142]. Wu et al. [143] proposed a representation learning algorithm with a gated graph neural network named PGNN (propagation graph neural network). The suggested technique can incorporate structural and textual features into high-level representations by propagating information among neighbor nodes throughout the propagation network. In order to obtain considerable performance improvements, they also added an attention mechanism. The propagation graph is built using the who-replies-to-whom structure, but the follower-followee and forward relationships are omitted. Zhang et al. [144] presented a simplified aggregation graph neural network (SAGNN) based on efficient aggregation layers. Experiments on publicly accessible Twitter datasets show that the proposed network outperforms state-of-the-art graph convolutional networks while considerably lowering computational costs.
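The core GCN operation referred to above is a neighborhood-averaging update of the node features using a normalized adjacency matrix with self-loops. The NumPy sketch below shows one such propagation step over a toy reply graph; it is a didactic illustration, not the architecture of any cited model, and the graph and feature sizes are made up.

```python
# One graph-convolution (GCN) propagation step over a toy reply graph.
import numpy as np

def gcn_layer(A, H, W):
    """A: adjacency matrix (n x n), H: node features (n x d), W: weights (d x d_out)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # aggregate neighbors, then ReLU

# Toy "who-replies-to-whom" graph: node 0 is the source post, nodes 1-3 are replies.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
H = np.random.rand(4, 8)                       # e.g. text embeddings of each post
W = np.random.rand(8, 4)
H_next = gcn_layer(A, H, W)                    # updated node representations
```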
D. GENERATIVE ADVERSARIAL NETWORK (GAN)
Generative Adversarial Networks (GANs) are deep learning-based generative models. The GAN architecture consists of two sub-models: a generator model for creating new instances and a discriminator model for determining whether the produced examples are genuine or fake, i.e., generated by the generator model. Existing adversarial networks are often employed to create images that may be matched to observed samples using a minimax game framework [44]. The generator model produces new images, resembling the original images, from the features learned from the training data, and the discriminator model predicts whether a generated image is fake or real. GANs are extremely successful in generative modeling and are used to train discriminators in a semi-supervised context to assist in eliminating human participation in data labeling. Furthermore, GANs are useful when the data have imbalanced classes or underrepresented samples. However, GANs produce synthetic data only if they are based on continuous numbers, so they are not directly applicable to NLP data, which are based on discrete values such as words, letters, or bytes [145]. To train GANs for text data, novel techniques are required.

A study by Long [145] proposed sequence GAN (SeqGAN), which is a GAN architecture that overcomes the problem of gradient descent in GANs for discrete outputs by employing a reinforcement learning (RL) based approach and Monte Carlo search. The authors provide actual news content to the GAN; then a classifier based on Google's BERT model was trained to identify the real samples from the samples generated by the GAN. The architecture of SeqGAN is provided in Figure 9.

FIGURE 9. A basic SeqGAN architecture. The figure is taken from Hiriyannaiah et al. [145].

The principle of adversarial learning was invented in generative adversarial networks. The adversarial learning concept has produced outstanding results in a wide range of topics, including information retrieval [146], text classification [147], and network embedding [148]. A unique problem for detecting fake news is the recognition of false news on recently emergent events on social media. To solve this problem, Wang et al. [44] suggested an end-to-end architecture called the event adversarial neural network (EANN). This architecture is used to extract event-invariant characteristics and, therefore, aids in the identification of false news on newly incoming events. It is made up of three major components: a multimodal feature extractor, a fake news detector, and an event discriminator. Another study by Le et al. [149] introduced Malcom, which generates malicious comments that have fooled five popular fake news detectors (CSI, dEFEND, etc.) into labeling fake news as real news with 94% and 90% attack success rates. The authors showed that existing methods are not resilient against potential attacks. Though the model performed well, it is not evaluated using defense mechanisms, namely adversarial learning.
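The generator/discriminator minimax game described above can be illustrated with a minimal PyTorch training step on continuous feature vectors (the setting where GANs work directly). This is a toy sketch with made-up dimensions; text GANs such as SeqGAN need the RL-based workaround mentioned above, which is omitted here.

```python
# Minimal GAN training step on continuous vectors (illustrative only).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))   # generator
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))    # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(8, 32)        # stand-in for real (continuous) samples
noise = torch.randn(8, 16)

# 1) Train the discriminator to separate real samples from generated ones.
fake = G(noise).detach()
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) Train the generator to fool the discriminator (the minimax game).
g_loss = bce(D(G(noise)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```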
E. ATTENTION MECHANISM BASED
The attention-related approach is another notable advancement. In deep neural networks, the attention mechanism is an effort to implement the same behavior of selectively focusing on a few important items while ignoring others. Attention is a bridge that connects the encoder and decoder and provides information to the decoder from each encoder hidden state. Using this framework, the model selectively concentrates on the valuable components of the input and can thus discover the associations among them. This allows the model to deal with lengthy input sentences more effectively. Unlike RNNs or CNNs, attention mechanisms maintain word dependencies in a sentence regardless of the distance between the words. The primary downside of the attention mechanism is that it adds additional weight parameters to the model, which might lengthen the training time, especially if the model's input data are long sequences.

A study by Long [150] proposed an attention-based LSTM with speaker profile features, and their experimental findings suggest that employing speaker profiles can help enhance fake news identification. Recently, attention techniques have been used to efficiently extract information related to a mini query (the article headline) from a long text (the news content) [47], [87]. A study by Singhania et al. [87] used an automated detector built on a three-level hierarchical attention network (3HAN). Three levels exist in 3HAN: one for words, one for sentences, and one for the headline. Because of its three levels of attention, 3HAN assigns different weights to different sections of an article. In contrast to other deep learning models, 3HAN yields understandable results. While 3HAN only uses textual information, a study by Jin et al. [47] used image features, including social context and text features, as well as attention on RNN (att-RNN). Another study used RNNs with a soft-attention mechanism to filter out unique linguistic features [151]. However, this method is based on distinct domain and community features without any external evidence; thus, it provides a restricted context for credibility analysis.

To overcome the shortcomings of previous works, Aloshban [152] proposed automatic fake news classification through self-attention (ACT). Their principle is inspired by the fact that claim texts are fairly short and hence cannot be used for classification efficiently. Their suggested framework makes use of mutual interactions between a claim and many supporting responses. The LSTM neural network was applied to the article input. The outcome of the final step of LSTM may not completely reflect the semantics of the article, and concatenating all vector representations of words in the text would lead to a massive vector dimension; therefore, the internal connection between the articles' words can be ignored. As a result, employing the self-attention function on the LSTM model extracts key parts of the article through several feature vectors. Their strategy is heavily reliant on self-attention and an article representation matrix. Graph-aware co-attention networks (GCAN) is an innovative approach for detecting fake news [153]. The authors predict if a source tweet article is false based just on its brief text content and user retweet sequence, as well as user profiles. Given the chronology of its retweeters, GCAN can determine whether a short-text tweet is fraudulent. However, this model is not suitable for long text, as it is difficult to find the relationship between a long tweet and retweet propagation.
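The selective-focus behavior described above boils down to scaled dot-product attention: the output at each position is a weighted mix of the value vectors, with weights given by softmax(QK^T / sqrt(d)). A self-contained NumPy sketch with toy dimensions, unconnected to any specific cited model:

```python
# Scaled dot-product (self-)attention over a toy sequence of word vectors.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each output row is a weighted mix of the rows of V; the weights show
    which input words the model is 'paying attention' to."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # similarity between query and key words
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

seq = np.random.rand(6, 32)              # 6 word vectors of dimension 32
context, attn = attention(seq, seq, seq) # self-attention over the sentence
print(attn.shape)                        # (6, 6) attention map
```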
F. BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS (BERT)
BERT is a deep learning model that has shown cutting-edge results across a wide variety of natural language processing applications. BERT incorporates pre-trained language representations developed by Google. It is a sophisticated pre-trained word-embedding model built on a transformer-encoder architecture [89]. The BERT method is distinctive in its capacity to identify and capture contextual meaning in a sentence or text [90]. The main restriction of conventional language models is that they are unidirectional, which restricts the architectures that could be utilized during pre-training. The BERT model eliminates the unidirectional limitation by using a masked language model (MLM). BERT employs the next sentence prediction (NSP) task in addition to the masked language model to jointly pre-train text-pair representations. BERT consists of two stages: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data using a variety of pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and then all of the parameters are fine-tuned using labeled data from the downstream tasks. The architecture of the BERT model is shown in Figure 10.

FIGURE 10. The BERT architecture taken from Devlin et al. [89].

The data utilized in the BERT model are generic data gathered from Wikipedia and the Book Corpus. While these data contain a wide range of information, specific information on individual domains is still lacking. To overcome this problem, a study by Jwa et al. [75] incorporated news data in the pre-training phase to boost fake news identification skills. When compared to the state-of-the-art model stackLSTM, the proposed model named exBAKE (BERT with extra unlabeled news corpora) outperformed it by a 0.137 F1-score. Ding et al. [154] discovered that including mental features such as a speaker's credit history at the language level might considerably improve BERT model performance. The history feature helps further the construction of the relationship between the event and the person in reality. However, these studies did not consider any pre-processing methods.

Zhang et al. [91] presented a BERT-based domain-adaptation neural network for multimodal false news detection (BDANN). BDANN is made up of three major components: a multimodal feature extractor, a domain classifier, and a false news detector. The pre-trained BERT model was used to extract text features, whereas the pre-trained VGG-19 model was used to extract image features in the multimodal feature extractor. The extracted features are then concatenated and sent to the detector to differentiate between fake and real news. Moreover, the existence of noisy images in the Weibo dataset has affected the BDANN results. Kaliyar et al. [92] proposed a BERT-based deep convolutional approach (fakeBERT) for fake news detection. FakeBERT is a combination of different parallel blocks of a one-dimensional deep convolutional neural network (1d-CNN), with different kernel sizes and filters, and the BERT model. Different filters can extract useful information from the training dataset, and the combination of BERT with 1d-CNN can deal with both large-scale structured and unstructured text. Therefore, the combination is beneficial in dealing with ambiguity.
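A hedged sketch of the fine-tuning stage described above, using the Hugging Face transformers library with the generic bert-base-uncased checkpoint. The label convention, learning rate, and toy texts are illustrative assumptions; this is not the setup of exBAKE, BDANN, fakeBERT, or any other cited model.

```python
# Minimal BERT fine-tuning step for binary fake/real classification (PyTorch).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)          # 0 = real, 1 = fake (assumed)

texts = ["Scientists confirm water found on Mars.",
         "Celebrity endorses miracle cure, doctors hate it!"]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)          # loss computed internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference: the argmax over the two logits gives the predicted class.
pred = outputs.logits.argmax(dim=-1)
```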
G. ENSEMBLE APPROACH
Ensemble approaches are strategies that generate several models and combine them to achieve better results. Ensemble models typically yield more precise solutions than a single model does. An ensemble reduces the distribution or dispersion of predictions and improves model efficiency. Ensembling can be applied to supervised and unsupervised learning activities [86]. Many researchers have used an ensemble approach to boost their performance [42], [133]. Agarwal and Dixit [63] combined two datasets, namely Liar and Kaggle, to evaluate the performance of LSTM and achieved an accuracy of 97%. They also used various models like CNN, LSTM, SVM, naive Bayes (NB), and k-nearest neighbour (KNN) for building an ensemble model. The authors showed an average accuracy score of the algorithms they used but did not show the accuracy of their ensemble model, which is a limitation of their work.

Often the CNN-LSTM ensemble approach has been used in previous DL-based studies. Kaliyar [67] used an ensemble of CNN and LSTM, and the accuracy was slightly lower than that of the state-of-the-art CNN model; however, the precision and recall were effectively improved. Asghar et al. [135] obtained an increase in the efficiency of their model by using Bi-LSTM. The Bi-LSTM retains knowledge from both former and upcoming contexts before rendering its input to the CNN model. Even though CNN and RNN typically require huge datasets to function successfully, Ajao et al. [133] trained LSTM-CNN with a smaller dataset. The abovementioned works considered just text-based features for fake news classification, whereas the addition of new features may generate a more significant result. While most studies used CNN with LSTM, a study by Amine et al. [131] merged two convolutional neural networks to integrate metadata with text. They illustrate that integrating metadata with text will result in substantial improvements in fine-grained fake news detection. Furthermore, when tested on real-world datasets, this approach shows improvements compared to the text-only deep learning model. Moving further, Kumar et al. [86] employed the use of an attention layer; it assists the CNN + LSTM model in learning to pay attention to particular regions of input sequences rather than the full series of input sequences. Utilizing the attention mechanism with CNN+LSTM was reported to be efficient by a small margin. Result analysis of DL-based studies is presented in Table 7.

TABLE 6. The table contains the strength and limitation of popular existing studies with reference and used classifier.

TABLE 7. The table contains the result in accuracy of DL-based studies along with used method and NLP techniques.
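A minimal soft-voting ensemble sketch in scikit-learn, averaging the class probabilities of several simple classifiers over TF-IDF features. The member models and feature choice are illustrative stand-ins; the surveyed papers typically ensemble deep models such as CNN and LSTM instead.

```python
# Soft-voting ensemble over TF-IDF features (illustrative only).
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

ensemble = make_pipeline(
    TfidfVectorizer(max_features=20000),
    VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("nb", MultinomialNB()),
                    ("svm", SVC(probability=True))],
        voting="soft"),              # average the predicted probabilities
)
# ensemble.fit(train_texts, train_labels)
# preds = ensemble.predict(test_texts)
```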
VI. EVALUATION METRICS
A key step in a predictive modeling pipeline is to evaluate the output of a machine-learning model. Although a model may have a higher classification result once constructed, it must be determined whether it can address the specific problem in different circumstances. Classification accuracy alone is usually insufficient to make this judgment; other assessment metrics are necessary for proper evaluation. Since a promising method is required to pass the assessment metric's evaluation, it is easy to create a model, but it is more challenging to create a promising strategy. Diverse evaluation metrics are used to evaluate the model's efficiency, and the evaluation metric is an essential device for arranging and organizing an evaluation. The confusion matrix shows an overview of model performance on the testing dataset with respect to the known true values: it provides a review of the model's success in terms of true positives, true negatives, false positives, and false negatives. To test their models, researchers considered distinctive sorts of metrics such as accuracy (A), precision (P), and recall (R) [40], [54], [58]. The selection of metrics relies entirely on the model form and its implementation strategy. We provide some evaluation metrics that were widely used in previous studies:

A. ACCURACY
The accuracy score, also known as the classification accuracy rating, is determined as the percentage of accurate predictions in proportion to the total predictions made by the model. The accuracy (A) can be computed using Equation (1):

A = (TruePositive + TrueNegative) / TotalNumberOfPredictions    (1)

B. PRECISION
Precision (P) is defined as the number of actual positive findings divided by the total number of positive results, including incorrectly recognized ones. The precision can be computed using Equation (2):

P = TruePositive / (TruePositive + FalsePositive)    (2)

C. RECALL
Recall (R) is the number of true positive results divided by the total number of samples that should have been identified as positive. The recall can be computed using Equation (3):

R = TruePositive / (TruePositive + FalseNegative)    (3)

D. F1-SCORE
The F1-score (F1) summarizes the model's accuracy for each class. If the dataset is not balanced, the F1-score metric is typically used; it is often used as an assessment metric in fake news detection [41], [157], [158]. The F1-score can be computed using Equation (4):

F1 = 2 × (precision × recall) / (precision + recall)    (4)

E. ROC CURVE AND AUC
The Receiver Operating Characteristics (ROC) curve shows the success of a classification model across several classification thresholds. The True Positive Rate (Recall) and False Positive Rate (FPR) are used in this curve. AUC is an abbreviation for ''Area Under the ROC curve''; in other words, AUC measures the whole two-dimensional area under the entire ROC curve. The FPR can be defined as in Equation (5):

FPR = FalsePositive / (FalsePositive + TrueNegative)    (5)
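All of the metrics above can be read off a binary confusion matrix. A small, self-contained helper in plain Python (with toy labels) that mirrors Equations (1) through (4):

```python
# Accuracy, precision, recall and F1 computed from raw binary predictions.
def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)                     # Equation (1)
    precision = tp / (tp + fp) if tp + fp else 0.0         # Equation (2)
    recall = tp / (tp + fn) if tp + fn else 0.0            # Equation (3)
    f1 = (2 * precision * recall / (precision + recall)    # Equation (4)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy example: 1 = fake, 0 = real.
print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))
```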
features in fake news detection, but several other features such as user
VII. CHALLENGES AND RESEARCH DIRECTION behavior [154], user profile, and social network behavior need to be
Despite the fact that numerous studies have been conducted on the identification explored. Political or religious bias in profile features and lexical,
of fake news, there is always space for future advancement and investigation. In syntactic, and statistical-based features can increase the detection rate. A
the sense of recognizing fake news, we highlight challenges and several unique fusion of deeply hidden text features with other statistical features may
exploration areas for future studies. Although DL-based methods provide higher result in a better outcome.
accuracy compared to the other methods, there is scope to make it more • Propagation-based studies are scarce in this domain [117]. Network-
acceptable. based patterns of news propagation are a piece of information that has
• Thefeatureandclassifierselectiongreatlyinfluencesthe efficiency of not been comprehensively utilized for fake news detection [159]. Thus, we
the model. Previous studies did not place a high priority on the selection suggest considering news propagation for fake news identification. Meta-
of features and classifiers. Researchers should focus on determining data and additional information can increase the robustness and reduce
which classifier is most suitable for particular features. The long textual the noise of a single textual claim, but they must be handled with caution.
features require the use of sequence models (RNNs), but limited research
• Studies focused only on text data for fake news detection, whereas [2] T. Rasool, W. H. Butt, A. Shaukat, and M. U. Akram, ‘‘Multi-label
fake news is generated in sophisticated ways, with text or images that fake news detection using multi-layered supervised learning,’’ in Proc.
have been purposefully altered [95]. Only a few studies have used image 11th Int. Conf. Comput. Autom. Eng., 2019, pp. 73–77.
features[109],[110].Thus,werecommendtheuseofvisual data (videos and [3] X. Zhang and A. A. Ghorbani, ‘‘An overview of online fake news:
images). An examination with video and image features will be an Characterization, detection, and discussion,’’ Inf. Process. Manage., vol.
investigation region to build a stronger and more robust system. 57, no. 2, Mar. 2020, Art. no. 102025. [Online]. Available:
• Studies that use a fusion of features are scarce in this domain [160]. http://www.sciencedirect.com/science/article/pii/S0306457318306794
Combining information from multiple sources may be extremely [4] Abdullah-All-Tanvir, E. M. Mahir, S. Akhter, and M. R. Huq,
beneficial in detecting whether Internet articles are fake [95]. We suggest ‘‘Detecting fake news using machine learning and deep learning
utilizing multi-model-based approaches with later pretrained word algorithms,’’ in Proc. 7th Int. Conf. Smart Comput. Commun. (ICSCC),
embeddings. Many other hidden features may have a great impact on Jun. 2019, pp. 1–5.
fake news detection. Hence we encourage researchers to investigate
hidden features. • Fake news detection models that learn from newly [5] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, ‘‘Fake news detection
emerging web articles in real-time could enhance detection results. on social media: A data mining perspective,’’ ACM SIGKDD Explorations
Another promising future work is the use of a transfer-learning approach Newslett., vol. 19, no. 1, pp. 22–36, 2017.
for training a neural network with online data streams. [6] R. Oshikawa, J. Qian, and W. Y. Wang, ‘‘A survey on natural
• More data for a more significant number of fake news should be language processing for fake news detection,’’ 2018, arXiv:1811.00770.
released since the lack of data is the major problem in fake news [7] S. B. Parikh and P. K. Atrey, ‘‘Media-rich fake news detection: A
classification. We assume that more training data will improve model survey,’’ in Proc. IEEE Conf. Multimedia Inf. Process. Retr. (MIPR), Apr.
performance. 2018, pp. 436–441.
Datasets focused on news content are publicly available. On the other hand, [8] A. Habib, M. Z. Asghar, A. Khan, A. Habib, and A. Khan, ‘‘False
datasets based on different textual features are limited. Thus research utilizing information detection in online content and its role in decision making: A
additional textual features is scarce. systematic literature review,’’ Social Netw. Anal. Mining, vol. 9, no. 1, pp.
• Instead of a simple classifier, using an ensemble method produces 1–20, Dec. 2019.
better results [49]. By constructing an ensemble model with DL and ML [9] M. K. Elhadad, K. F. Li, and F. Gebali, ‘‘Fake news detection on
algorithms, in which an LSTM can identify the original article while social media: A systematic survey,’’ in Proc. IEEE Pacific Rim Conf.
passing auxiliary features through a second model can yield better results Commun., Comput. Signal Process. (PACRIM), Aug. 2019, pp. 1–8.
[41]. A simpler GRU model performs better than an LSTM [80].
[10] A. Bondielli and F. Marcelloni, ‘‘A survey on fake news and rumour
Therefore, we recommend combining GRU and CNNs to urge the leading
detection techniques,’’ Inf. Sci., vol. 497, pp. 38–55, Sep. 2019. [Online].
result.
Available: http://www.sciencedirect.
• Many researchers have achieved high accuracy by using CNN, com/science/article/pii/S0020025519304372
LSTM, and ensemble models [42], [64]. SeqGAN and Deep Belief [11] P. Meel and D. K. Vishwakarma, ‘‘Fake news, rumor, information
Network (DBN) were not explored in this domain. We encourage pollution in social media and web: A contemporary survey of state-of-the-
researchers to experiment with these models. arts, challenges and opportunities,’’ Expert Syst. Appl., vol. 153, Sep. 2020,
• Transformers have replaced RNN models such as LSTM as the Art. no. 112986.
model of choice for NLP tasks. BERT has been used in the identification [12] K. Sharma, F. Qian, H. Jiang, N. Ruchansky, M. Zhang, and Y. Liu,
of fake news, but Generative Pre-trained Transformer (GPT) has not ‘‘Combating fake news: A survey on identification and mitigation
been used in this domain. We suggest using GPT by fine-tuning fake news techniques,’’ ACM Trans. Intell. Syst. Technol., vol. 10, no. 3, pp. 1–42,
detection tasks. May 2019.
• Existing algorithms make critical decisions without providing [13] X. Zhou and R. Zafarani, ‘‘A survey of fake news: Fundamental
precise information about the reasoning that results in specific decisions, theories,
predictions, recommendations, or actions [161]. Explainable Artificial detectionmethods,andopportunities,’’ACMComput.Surv.,vol.53,no.5, pp.
Intelligence (XAI) is a study field that tries to make the outcomes of AI 1–40, 2020.
systems more understandable to humans [162]. XAI can be a valuable
approach to start making progress in this area. [14] B. Collins, D. T. Hoang, N. T. Nguyen, and D. Hwang, ‘‘Trends in
combating fake news on social media—A survey,’’ J. Inf. Telecommun.,
VIII. CONCLUSION vol. 5, no. 2, pp. 247–266, 2021.
Fake news is escalating as social media is growing. [15] A. Zubiaga, A. Aker, K. Bontcheva, M. Liakata, and R. Procter,
Researchers are also trying their best to find solutions to keep society safe from ‘‘Detection and resolution of rumours in social media: A survey,’’ ACM
fake news. This survey covers the overall analysis of fake news classification by Comput. Surveys, vol. 51, no. 2, pp. 1–36, Jun. 2018.
discussing major studies. A thorough understanding of recent approaches in fake
[16] M. D. Ibrishimova and K. F. Li, ‘‘A machine learning approach to
news detection is essential because advanced frameworks are the front-runners
fake news detection using knowledge verification and natural language
in this domain. Thus, we analyzed fake news identification methods based on
processing,’’ in Proc. Int. Conf. Intell. Netw. Collaborative Syst. Cham,
NLP and advanced DL strategies. We presented a taxonomy of fake news
Switzerland: Springer, 2019, pp. 223–234.
detection approaches. We explored different NLP techniques and DL
architectures and provided their strength and shortcomings. We have explored [17] H. Ahmed, I. Traore, and S. Saad, ‘‘Detecting opinion spams and
diverse assessment measurements. We have given a short description of the fake news using text classification,’’ Secur. Privacy, vol. 1, no. 1, p. e9,
experimental findings of previous studies. In this field, we briefly outlined Jan. 2018.
possible directions for future research. Fake news identification will remain an [18] H. Ahmed, I. Traore, and S. Saad, ‘‘Detection of online fake news
active research field for some time with the emergence of novel deep learning using N-gram analysis and machine learning techniques,’’ in Proc. Int.
network architectures. There are fewer chances of inaccurate results using deep Conf. Intell., Secure, Dependable Syst. Distrib. Cloud Environ.
learning-based models. We strongly believe that this review will assist Switzerland: Springer, 2017, pp. 127–138.
researchers in fake news detection to gain a better, concise perspective of
existing problems, solutions, and future directions. [19] B. Bhutani, N. Rastogi, P. Sehgal, and A. Purwar, ‘‘Fake news
detection using sentiment analysis,’’ in Proc. 12th Int. Conf. Contemp.
ACKNOWLEDGMENT Comput. (IC), Aug. 2019, pp. 1–5.
The authors would like to thank the Advanced Machine Learning (AML) Lab [20] C. Castillo, M. Mendoza, and B. Poblete, ‘‘Information credibility
for resource sharing and precious opinions. on Twitter,’’ in Proc. 20th Int. Conf. World Wide Web, Mar. 2011, pp. 675–
REFERENCES 684, doi: 10.1145/1963405.1963500.
[1] H. Allcott and M. Gentzkow, ‘‘Social media and fake news in the
2016 election,’’ J. Econ. Perspect., vol. 31, no. 2, pp. 36–211, 2017.
[21] O. Ajao, D. Bhowmik, and S. Zargari, ‘‘Sentiment aware fake news [40] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, ‘‘FNDNet— A
detection on online social networks,’’ in Proc. IEEE Int. Conf. Acoust., deep convolutional neural network for fake news detection,’’ Cognit. Syst.
Speech Signal Process. (ICASSP), May 2019, pp. 2507–2511. Res., vol. 61, pp. 32–44, Jun. 2020.[Online]. Available:
[22] B. Ghanem, P. Rosso, and F. Rangel, ‘‘An emotional analysis of false http://www.sciencedirect.com/science/article/pii/S1389041720300085
information in social media and news articles,’’ ACM Trans. Internet [41] S. Deepak and B. Chitturi, ‘‘Deep neural approach to Fake-News
Technol., vol. 20, no. 2, pp. 1–18, May 2020. identification,’’ Proc. Comput. Sci., vol. 167, pp. 2236–2243, Jan. 2020.
[23] A. Giachanou, P. Rosso, and F. Crestani, ‘‘Leveraging emotional [Online]. Available: http://www.sciencedirect.
signals for credibility detection,’’ in Proc. 42nd Int. ACM SIGIR Conf. com/science/article/pii/S1877050920307420
Res. Develop. Inf. Retr., Jul. 2019, pp. 877–880. [42] M. Umer, Z. Imtiaz, S. Ullah, A. Mehmood, G. S. Choi, and B.-W.
[24] D. Khattar, J. S. Goud, M. Gupta, and V. Varma, ‘‘MVAE: On, ‘‘Fake news stance detection using deep learning architecture
Multimodal variational autoencoder for fake news detection,’’ in Proc. (CNNLSTM),’’ IEEE Access, vol. 8, pp. 156695–156706, 2020.
World Wide Web Conf., May 2019, pp. 2915–2921. [43] N. Aslam, I. U. Khan, F. S. Alotaibi, L. A. Aldaej, and A. K.
[25] N. J. Conroy, V. L. Rubin, and Y. Chen, ‘‘Automatic deception Aldubaikil, ‘‘Fake detect: A deep learning ensemble model for fake news
detection: Methods for finding fake news,’’ in Proc. 78th ASIST Annu. detection,’’ Complexity, vol. 2021, pp. 1–8, Apr. 2021.
Meeting, Inf. Sci. Impact, Res. Community, vol. 52, no. 1, pp. 1–4, 2015. [44] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, and J. Gao,
[26] A. R. Pathak, A. Mahajan, K. Singh, A. Patil, and A. Nair, ‘‘Analysis ‘‘EANN: Event adversarial neural networks for multi-modal fake news
of techniques for rumor detection in social media,’’ Proc. Comput. Sci., detection,’’ in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data
vol. 167, pp. 2286–2296, Jan. 2020. Mining, Jul. 2018, pp. 849–857.
[27] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, and M. [45] A. Alsaeedi and M. Al-Sarem, ‘‘Detecting rumors on social media
Cha, ‘‘Detecting rumors from microblogs with recurrent neural based on a CNN deep learning technique,’’ Arabian J. Sci. Eng., vol. 45,
networks,’’ in Proc. 25th Int. Joint Conf. Artif. Intell. (IJCAI). Res. no. 12, pp. 1–32, 2020.
Collection School Comput. Inf. Syst., 2016, pp. 3818–3824. [46] A. Thota, P. Tilak, S. Ahluwalia, and N. Lohia, ‘‘Fake news
[28] J. Ma, W. Gao, and K.-F. Wong, ‘‘Detect rumors in microblog posts detection: A deep learning approach,’’ SMU Data Sci. Rev., vol. 1, no. 3, p.
using propagation structure via kernel learning,’’ in Proc. 55th Annu. 10, 2018.
Meeting Assoc. Comput. Linguistics (ACL). Vancouver, BC, Canada: Res. [47] Z. Jin, J. Cao, H. Guo, Y. Zhang, and J. Luo, ‘‘Multimodal fusion
Collection School Comput. Inf. Syst., Jul./Aug. 2017, pp. 708–717. with recurrent neural networks for rumor detection on microblogs,’’ in
[29] W. Y. Wang, ‘‘‘Liar, liar pants on fire’: A new benchmark dataset for Proc. 25th ACM Int. Conf. Multimedia, Oct. 2017, pp. 795–816.
fake newsdetection,’’inProc.55thAnnu.MeetingAssoc.Comput.Linguistics, [48] R. R. Mandical, N. Mamatha, N. Shivakumar, R. Monica, and A. N.
Vancouver, BC, Canada, Jul. 2017, pp. 422–426. [Online]. Available: Krishna, ‘‘Identification of fake news using machine learning,’’ in Proc.
https://www.aclweb.org/anthology/P17-2067 IEEE Int. Conf. Electron., Comput. Commun. Technol. (CONECCT), Jul.
[30] A. Zubiaga, M. Liakata, and R. Procter, ‘‘Learning reporting 2020, pp. 1–6.
dynamics during breaking news for rumour detection in social media,’’ [49] S. S. Jadhav and S. D. Thepade, ‘‘Fake news identification and
2016, arXiv:1610.07363. classification using DSSM and improved recurrent neural network
[31] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, classifier,’’ Appl. Artif. Intell., vol. 33, no. 12, pp. 1058–1068, Oct. 2019,
‘‘FakeNewsNet: A data repository with news content, social context, and doi: 10.1080/08839514.2019.1661579.
spatiotemporal information for studying fake news on social media,’’ Big [50] A. S. K. Shu, D. M. K. Shu, L. G. M. Mittal, L. G. M. Mittal, and M.
Data, vol. 8, no. 3, pp. 171–188, Jun. 2020. M. J. K. Sethi, ‘‘Fake news detection using a blend of neural networks:
[32] M. Amjad, G. Sidorov, A. Zhila, H. Gómez-Adorno, I. Voronkov, An application of deep learning,’’ Social Netw. Comput. Sci., vol. 1, no. 3,
and A. Gelbukh, ‘‘‘Bend the truth’: Benchmark dataset for fake news pp. 1–9, Jan. 1970. [Online]. Available: https://link.springer.
detection in Urdu language and its evaluation,’’ J. Intell. Fuzzy Syst., vol. com/article/10.1007/s42979-020-00165-4
39, no. 2, pp. 2457–2469, 2020. [51] A. P. S. Bali, M. Fernandes, S. Choubey, and M. Goel,
[33] E. Tacchini, G. Ballarin, M. L. Della Vedova, S. Moret, and L. de ‘‘Comparative performance of machine learning algorithms for fake
Alfaro, ‘‘Some like it hoax: Automated fake news detection in social news detection,’’ in Proc. Int. Conf. Adv. Comput. Data Sci. Switzerland:
networks,’’ 2017, arXiv:1704.07506. Springer, 2019, pp. 420–430.
[34] C. Boididou, S. Papadopoulos, and M. Zampoglou, ‘‘Detection and [52] A. Rusli, J. C. Young, and N. M. S. Iswari, ‘‘Identifying fake news in
visualization of misleading content,’’ Int. J. Multimedia Inf. Retr., vol. 7, Indonesian via supervised binary text classification,’’ in Proc. IEEE Int.
no. 1, pp. 71–86, 2018. Conf. Ind. 4.0, Artif. Intell., Commun. Technol. (IAICT), Jul. 2020, pp. 86–
90.
[35] J. Golbeck, M. Mauriello, B. Auxier, K. H. Bhanushali, C. Bonk, M.
A. Bouzaghrane,C.Buntain,R.Chanduka,P.Cheakalos,J.B.Everett, and W. [53] V. Tiwari, R. G. Lennon, and T. Dowling, ‘‘Not everything you read
Falak, ‘‘Fake news vs satire: A dataset and analysis,’’ in Proc. 10th ACM is true! Fake news detection using machine learning algorithms,’’ in Proc.
Conf. Web Sci., 2018, pp. 17–21. 31st Irish Signals Syst. Conf. (ISSC), Jun. 2020, pp. 1–4.
[36] P. M. Waszak, W. Kasprzycka-Waszak, and A. Kubanek, ‘‘The [54] A. Verma, V. Mittal, and S. Dawn, ‘‘FIND: Fake information and
spread of medical fake news in social media—The pilot quantitative news detections using deep learning,’’ in Proc. 12th Int. Conf. Contemp.
study,’’ Health Policy Technol., vol. 7, no. 2, pp. 115–118, Jun. 2018. Comput. (IC), Aug. 2019, pp. 1–7.
[37] (2020). The Year of Fake News Covid Related Scams and [55] M. Z. Hossain, M. A. Rahman, M. S. Islam, and S. Kar,
Ransomware. Accessed: Mar. 12, 2021. [Online]. Available: https://www. ‘‘BanFakeNews: A dataset for detecting fake news in Bangla,’’ in Proc.
12th Lang. Resour. Eval. Conf. Marseille, France: European Language
prnewswire.com/news-releases/2020-the-year-of-fake-news-covidrelated-scams-
Resources Association, May 2020, pp. 2862–2871. [Online]. Available:
and-ransomware-301180568
https://www.aclweb.org/anthology/2020.lrec-1.349
[38] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu,
‘‘FakeNewsNet: A data repository with news content, social context and
[56] P. Savyan and S. M. S. Bhanu, ‘‘UbCadet: Detection of
compromised accounts in Twitter based on user behavioural profiling,’’
spatialtemporal information for studying fake news on social media,’’
Multimedia Tools Appl., vol. 79, pp. 1–37, Jul. 2020.
2018, arXiv:1809.01286.
[39] Y.-C. Ahn and C.-S. Jeong, ‘‘Natural language contents evaluation [57] J. Kapusta and J. Obonya, ‘‘Improvement of misleading and fake
news classification for flective languages by morphological group
system for detecting fake news using deep learning,’’ inProc. 16th Int.
analysis,’’ in Informatics, vol. 7, no. 1. Switzerland: Multidisciplinary
Joint Conf. Comput. Sci. Softw. Eng. (JCSSE), Jul. 2019, pp. 289–292.
Digital Publishing Institute, 2020, p. 4.
[58] S. Hakak, M. Alazab, S. Khan, T. R. Gadekallu, P. K. R. IberoAmer. Conf. Artif. Intell., Nov. 2018, pp. 206–216. [Online].
Maddikunta, and W. Z. Khan, ‘‘An ensemble machine learning approach Available: https://link.springer.com/chapter/10.1007/978-3-030-03928-
through effective feature extraction to classify fake news,’’ Future Gener. 8_17
Comput. Syst., vol. 117, pp. 47–58, Apr. 2021. [Online]. Available: [78] C. K. Hiramath and G. C. Deshpande, ‘‘Fake news detection using
https://www.sciencedirect.com/science/article/pii/S0167739X20330466 deep learning techniques,’’ in Proc. 1st Int. Conf. Adv. Inf. Technol.
[59] M. G. Hussain, M. Rashidul Hasan, M. Rahman, J. Protim, and S. (ICAIT), Jul. 2019, pp. 411–415.
A. Hasan, ‘‘Detection of Bangla fake news using MNB and SVM [79] A. P. B. Veyseh, M. T. Thai, T. H. Nguyen, and D. Dou, ‘‘Rumor
classifier,’’ in Proc. Int. Conf. Comput., Electron. Commun. Eng. detection in social networks via deep contextual modeling,’’ in Proc.
(iCCECE), Aug. 2020, pp. 81–85. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, Aug. 2019, pp. 113–
[60] G. Gravanis, A. Vakali, K. Diamantaras, and P. Karadais, ‘‘Behind 120.
the cues: A benchmarking study for fake news detection,’’ Expert Syst. [80] M. Bugueño, G. Sepulveda, and M. Mendoza, ‘‘An empirical
Appl., vol. 128, pp. 201–213, Aug. 2019. analysis of rumor detection on microblogs with recurrent neural
[61] P. Bahad, P. Saxena, and R. Kamal, ‘‘Fake news detection using bi- networks,’’ in Proc. Int. Conf. Hum.-Comput. Interact., Jul. 2019, pp. 293–
directional LSTM-recurrent neural network,’’ Proc. Comput. Sci., vol. 310. [Online].
165, pp. 74–82, Jan. 2019. [Online]. Available: Available: https://link.springer.com/chapter/10.1007/978-3-030-219024_21
http://www.sciencedirect.com/science/article/pii/S1877050920300806
[81] E. Providel and M. Mendoza, ‘‘Using deep learning to detect rumors
[62] E. Qawasmeh, M. Tawalbeh, and M. Abdullah, ‘‘Automatic in Twitter,’’ in Proc. Int. Conf. Hum.-Comput. Interact. Switzerland:
identification of fake news using deep learning,’’ in Proc. 6th Int. Conf. Springer, 2020, pp. 321–334.
Social Netw. Anal., Manage. Secur. (SNAMS), Oct. 2019, pp. 383–388.
[82] Q. Le and T. Mikolov, ‘‘Distributed representations of sentences and
[63] A. Agarwal and A. Dixit, ‘‘Fake news detection: An ensemble documents,’’ in Proc. Int. Conf. Mach. Learn., 2014, pp. 1188–1196.
learning approach,’’ in Proc. 4th Int. Conf. Intell. Comput. Control Syst.
(ICICCS), May 2020, pp. 1178–1183. [83] S. Sangamnerkar, R. Srinivasan, M. R. Christhuraj, and R.
Sukumaran, ‘‘An ensemble technique to detect fabricated news article
[64] S. M. Padnekar, G. S. Kumar, and P. Deepak, ‘‘BiLSTM- using machine learning and natural language processing techniques,’’ in
autoencoder architecture for stance prediction,’’ in Proc. Int. Conf. Data Proc. Int. Conf.
Sci. Eng. (ICDSE), Dec. 2020, pp. 1–5.
Emerg. Technol. (INCET), Jun. 2020, pp. 1–7.
[65] M. Granik and V. Mesyura, ‘‘Fake news detection using naive Bayes
classifier,’’inProc.IEEE1stUkraineConf.Electr.Comput.Eng.(UKRCON), [84] S. Helmstetter and H. Paulheim, ‘‘Weakly supervised learning for
May 2017, pp. 900–903. fake news detection on Twitter,’’ in Proc. IEEE/ACM Int. Conf. Adv.
Social Netw. Anal. Mining (ASONAM), Aug. 2018, pp. 274–277.
[66] A.JainandA.Kasbe,‘‘Fake newsdetection,’’inProc.IEEEInt.Students’
Conf. Electr., Electron. Comput. Sci. (SCEECS), 2018, pp. 1–5. [85] J. Pennington, R. Socher, and C. Manning, ‘‘GloVe: Global vectors
for word representation,’’ in Proc. Conf. Empirical Methods Natural
[67] R. K. Kaliyar, ‘‘Fake news detection using a deep neural network,’’ Lang. Process. (EMNLP), 2014, pp. 1532–1543.
in Proc. 4th Int. Conf. Comput. Commun. Autom. (ICCCA), Dec. 2018, pp.
1–7. [86] S.Kumar,R.Asthana,S.Upadhyay,N.Upreti,andM.Akbar,‘‘Fakenews
detection using deep learning models: A novel approach,’’ Trans. Emerg.
[68] G. Bhatt, A. Sharma, S. Sharma, A. Nagpal, B. Raman, and A. Telecommun. Technol., vol. 31, no. 2, p. e3767, Feb. 2020. [Online].
Mittal, ‘‘Combining neural, statistical and external features for fake Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ett.3767
news stance identification,’’ in Proc. Companion The Web Conf. Web
Conf. (WWW), 2018, pp. 1353–1357, doi: 10.1145/3184558.3191577. [87] S. Singhania, N. Fernandez, and S. Rao, ‘‘3HAN: A deep neural
network for fake news detection,’’ in Proc. Int. Conf. Neural Inf. Process.
[69] F. A. Ozbay and B. Alatas, ‘‘Fake news detection within online social Switzerland: Springer, 2017, pp. 572–581.
media using supervised artificial intelligence algorithms,’’ Phys. A, Stat.
Mech. Appl., vol. 540, Feb. 2020, Art. no. 123174. [88] J. A. Nasir, O. S. Khan, and I. Varlamis, ‘‘Fake news detection: A
hybrid CNN-RNN based deep learning approach,’’ Int. J. Inf. Manage.
[70] B. Al-Ahmad, A. M. Al-Zoubi, R. A. Khurma, and I. Aljarah, ‘‘An Data Insights, vol. 1, no. 1, Apr. 2021, Art. no. 100007.
evolutionary fake news detection method for COVID-19 pandemic
information,’’ Symmetry, vol. 13, no. 6, p. 1091, Jun. 2021. [89] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ‘‘BERT: Pre-
training of deep bidirectional transformers for language understanding,’’
[71] S. Shabani and M. Sokhn, ‘‘Hybrid machine-crowd approach for 2018, arXiv:1810.04805.
fake news detection,’’ in Proc. IEEE 4th Int. Conf. Collaboration Internet
Comput. (CIC), Oct. 2018, pp. 299–306. [90] S. Kula, M. Choraś, and R. Kozik, ‘‘Application of the bert-based
architecture in fake news detection,’’ in Proc. Comput. Intell. Secur. Inf.
[72] C. M. M. Kotteti, X. Dong, N. Li, and L. Qian, ‘‘Fake news detection Syst. Conf. Switzerland: Springer, 2019, pp. 239–249.
enhancement with data imputation,’’ in Proc. IEEE 16th Int. Conf.
Dependable, Autonomic Secure Comput., 16th Int. Conf. Pervasive Intell. [91] T. Zhang, D. Wang, H. Chen, Z. Zeng, W. Guo, C. Miao, and L. Cui,
Comput., 4th Int. Conf. Big Data Intell. Comput. Cyber Sci.Technol.Congr. ‘‘BDANN: BERT-based domain adaptation neural network for
(DASC/PiCom/DataCom/CyberSciTech),Aug.2018, pp. 187–192. multimodal fake news detection,’’ in Proc. Int. Joint Conf. Neural Netw.
(IJCNN), Jul. 2020, pp. 1–8.
[73] X. Zhou, A. Jain, V. V. Phoha, and R. Zafarani, ‘‘Fake news early
detection: A theory-driven model,’’ Digit. Threats, Res. Pract., vol. 1, no. [92] R. K. Kaliyar, A. Goswami, and P. Narang, ‘‘FakeBERT: Fake news
2, pp. 1–25, Jul. 2020. detection in social media with a BERT-based deep learning approach,’’
Multimedia Tools Appl., vol. 80, no. 8, pp. 11765–11788, Mar. 2021.
[74] P. H. A. Faustini and T. F. Covões, ‘‘Fake news detection in multiple
platforms and languages,’’ Expert Syst. Appl., vol. 158, Nov. 2020, Art. no. [93] W. Shishah, ‘‘Fake news detection using BERT model with joint
113503. learning,’’ Arabian J. Sci. Eng., vol. 46, pp. 1–13, Jun. 2021.

[75] H. Jwa, D. Oh, K. Park, J. Kang, and H. Lim, ‘‘ExBAKE: [94] H. Yuan, J. Zheng, Q. Ye, Y. Qian, and Y. Zhang, ‘‘Improving fake
Automatic fake news detection model based on bidirectional encoder news detection with domain-adversarial and graph-attention neural
representations from transformers (BERT),’’ Appl. Sci., vol. 9, no. 19, p. network,’’ Decis. Support Syst., vol. 151, Dec. 2021, Art. no. 113633.
4062, Sep. 2019. [95] A. Giachanou, G. Zhang, and P. Rosso, ‘‘Multimodal multi-image
[76] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, fake news detection,’’ in Proc. IEEE 7th Int. Conf. Data Sci. Adv. Anal.
‘‘Distributed representations of words and phrases and their (DSAA), Oct. 2020, pp. 647–654.
compositionality,’’ in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 3111– [96] S. Girgis, E. Amer, and M. Gadallah, ‘‘Deep learning algorithms for
3119. detecting fake news in online text,’’ in Proc. 13th Int. Conf. Comput. Eng.
M. F. MRIDHA (Senior Member, IEEE) received the Ph.D. degree in AI/ML from Jahangirnagar University, in 2017. He joined as a Lecturer at the Department of Computer Science and Engineering, Stamford University Bangladesh, in June 2007. He was promoted as a Senior Lecturer at the Department of Computer Science and Engineering, in October 2010, and promoted as an Assistant Professor at the Department of Computer Science and Engineering, in October 2011. Then, he joined as an Assistant Professor at UAP, in May 2012. He worked as a CSE Department Faculty Member at the University of Asia Pacific and a Graduate Coordinator, from 2012 to 2019. He is currently working as an Associate Professor with the Department of Computer Science and Engineering, Bangladesh University of Business and Technology. His research experience, within both academia and industry, results in over 80 journal and conference publications. For more than ten years, he has been with the masters and undergraduate students as a supervisor of their thesis work. His research interests include artificial intelligence (AI), machine learning, deep learning, natural language processing (NLP), and big data analysis. He has served as a program committee member for several international conferences/workshops. He served as an associate editor for several journals.

ASHFIA JANNAT KEYA was born in Dhaka, Bangladesh. She received the B.Sc. degree in computer science and engineering from the Bangladesh University of Business and Technology (BUBT), in 2021. She is currently working as a Research Assistant with the Department of CSE, BUBT. She also works as a Researcher with the Advanced Machine Learning Lab. Her research interests include deep learning, natural language processing (NLP), and computer vision. She has experience working in C++, Python, Keras, TensorFlow, Sklearn, NumPy, Pandas, and Matplotlib.
MD. ABDUL HAMID was born in Sonatola,
Pabna, Bangladesh. He received the Bachelor of
Engineering degree in computer and information
engineering from the International Islamic
University Malaysia (IIUM), in 2001, and the
combined master’s and Ph.D. degree from the
Computer Engineering Department, Kyung Hee
University, South Korea, in August 2009, majoring in
information communication. His education life spans
over different countries in the world. From 1989 to 1995, he completed his high school and college education at the Rajshahi Cadet College,
Bangladesh. He has been in the teaching profession throughout his life, which
also spans over different parts of the globe. From 2002 to 2004, he was a
Lecturer with the Computer Science and Engineering Department, Asian
University of Bangladesh, Dhaka, Bangladesh. From 2009 to 2012, he was an
Assistant Professor with the Department of Information and Communications
Engineering, Hankuk University of Foreign Studies (HUFS), South Korea. From
2012 to 2013, he was an Assistant Professor with the Department of Computer
Science and Engineering, Green University of Bangladesh. From 2013 to 2016,
he was an Assistant Professor with the Department of Computer Engineering,
Taibah University, Madinah, Saudi Arabia. From 2016 to 2017, he was an
Associate Professor with the Department of Computer Science, Faculty of
Science and Information Technology, American International University-
Bangladesh, Dhaka. From 2017 to 2019, he was an Associate Professor and a
Professor with the Department of Computer Science and Engineering,
University of Asia Pacific, Dhaka. Since 2019, he has been a Professor with the
Department of Information Technology, King Abdulaziz University, Jeddah,
Saudi Arabia. His research interests include network/cyber-security, natural
language processing, machine learning, wireless communications, and
networking protocols.
MUHAMMAD MOSTAFA MONOWAR received the
B.Sc. degree in computer science and information
technology from the Islamic University of Technology
(IUT), Bangladesh, in 2003, and the Ph.D. degree in
computer engineering from Kyung Hee University,
South Korea, in 2011. He worked as a Faculty
Member at the Department of Computer Science and
Engineering, University of Chittagong, Bangladesh.
He is currently working as an Associate Professor at
the Department
of Information Technology, King Abdulaziz
University, Saudi Arabia. His research interests include wireless networks,
mostly ad-hoc, sensor, and
mesh networks, including routing protocols, MAC mechanisms, IP and transport
layer issues, cross-layer design, and QoS provisioning, security and privacy
issues, and natural language processing. He has served as a program committee
member for several international conferences/workshops. He served as an editor
for a couple of books published by CRC Press and Taylor & Francis Group. He
also served as a guest editor for several journals.
MD. SAIFUR RAHMAN is currently working as an
Assistant Professor at the Department of Computer
Science and Engineering, Bangladesh University of
Business and Technology. He has expertise in
software development and has developed numerous
management systems. He has been a successful
Director of the International Collegiate
Programming Contest (ICPC), Dhaka Regional
Contest, in 2014. Apart from the collaboration and
development domain, his skills
cover theoretical background in computer
engineering sectors. His research interests include system design and artificial
intelligence-based systems.
He received coach awards in ICPC Dhaka Regional Contests.
RESEARCH PAPER -3
FAKE NEWS DETECTION USING MACHINE
LEARNING
Associate Prof. DR. Mahendra Sharma*1, Assistant Prof. Mrs. Laveena Sehgal*2, Blaghul Rizwan*3, Md
Zamin Zafar*4
*1 Associate Professor, Department of Information
Technology, IIMT College Of Engineering, Greater Noida, Uttar Pradesh India.
*2 Assistant Professor, Department of Information
Technology, IIMT College Of Engineering, Greater Noida, Uttar Pradesh India.
*3 Student, Department of Information
Technology, IIMT College Of Engineering, Greater Noida, Uttar Pradesh India.
*4 Student, Department of Information
Technology, IIMT College Of Engineering, Greater Noida, Uttar Pradesh India.

ABSTRACT
Fake news on social media and other media is spreading widely and is a matter of serious concern because of its ability to cause social and national damage with destructive impacts. A lot of research is already focused on detecting it. This paper analyzes the research related to fake news detection and explores traditional machine learning models in order to choose the best one and build a product model, based on a supervised machine learning algorithm, that can classify news articles as true or false using tools such as Python's scikit-learn and NLP for textual analysis. This process results in feature extraction and vectorization; we propose using the Python scikit-learn library to perform tokenization and feature extraction of text data, because this library contains useful tools like CountVectorizer and TfidfVectorizer. Then, we will apply feature selection methods to experiment and choose the best-fitting features to obtain the highest precision, according to the confusion matrix results.

INTRODUCTION
Fake news contains misleading information that could be checked. It may sustain lies about a certain situation in a country or exaggerate the cost of certain services for a country, which may give rise to unrest in some countries, as happened in the Arab Spring. There are organizations, like the House of Commons and the CrossCheck project, trying to deal with this issue by confirming authors and holding them accountable. However, their scope is limited because they depend on human manual detection; in a world where millions of articles are removed or published every minute, this cannot be done manually in an accountable or feasible way. A solution could be the development of a system that provides a credible automated index score, or rating, for the credibility of different publishers and news content.
This paper proposes a methodology to create a model that detects whether an article is authentic or fake based on its words, phrases, sources, and titles, by applying supervised machine learning algorithms to an annotated (labeled) dataset that has been manually classified and verified. Then, feature selection methods are applied to experiment and choose the best-fitting features to obtain the highest precision, according to the confusion matrix results. We propose to create the model using different classification algorithms. The product model is tested on unseen data, the results are plotted, and, accordingly, the product is a model that detects and classifies fake articles and can be used and integrated with any system for future use.

RELATED WORK
1. Social Media and Fake News
Social media includes websites and programs that are devoted to forums, social networking, microblogging, social bookmarking, and wikis. On the other hand, some researchers consider fake news to be the result of accidental issues, such as an educational shock or underwriting actions, as in the Nepal earthquake case. In 2020, there was widespread fake news concerning health that put global health at risk. The WHO released a warning in early February 2020 that the COVID-19 outbreak had caused a massive 'infodemic', a spurt of real and fake news, which includes lots of misinformation.

2. Natural language processing


The main reason for utilizing natural language processing is to consider one or more specializations of a system or algorithm. The natural language processing (NLP) view of an algorithmic system enables the combination of speech understanding and speech generation. In addition, it can be used to detect events across various languages. A system has been suggested for extracting events from English, Italian, and Dutch speech by utilizing various language-specific pipelines; components such as emotion analysis and detection, named entity recognition (NER), part-of-speech taggers, chunking, and semantic role labeling have made NLP a good subject of research.
Sentiment analysis extracts emotions on a particular subject. It consists of extracting a specific term for a subject, extracting the sentiment, and analyzing the connection between them. Sentiment analysis uses two language resources for the analysis: a glossary of meanings and a sentiment database of constructive and destructive words, and it attempts to give classifications on a scale from -5 to 5. Part-of-speech taggers for European languages are being explored to produce part-of-speech tagging tools for other languages such as Sanskrit, Hindi, and Arabic, which can efficiently mark and categorize words as nouns, adjectives, verbs, and so on. Most part-of-speech techniques perform effectively on European languages but not on Asian or Arabic languages. Part-of-speech tagging for Sanskrit specifically uses the treebank method, while Arabic tagging uses a support vector machine (SVM) based method to automatically identify tokens and parts of speech and to automatically expose basic sentences in Arabic text.

3. Data mining
Data mining techniques are categorized into two main methods: supervised and unsupervised. The supervised method utilizes training data in order to predict hidden activities. Unsupervised data mining attempts to recognize hidden data patterns without training data, such as pairs of input labels and categories, being provided. Typical examples of unsupervised data mining are clustering and association rule mining.

4. Machine learning (ML) Classification


Machine learning (ML) is a class of algorithms that helps software systems achieve more accurate results without having to be reprogrammed directly. Data scientists specify the variables or characteristics that the model needs to analyze and use to develop predictions. When the training is completed, the algorithm applies what it has learned to new data. Six algorithms are adopted in this paper for classifying fake news.
5. Decision tree
The decision tree is an important tool that works based on a flow-chart-like structure and is mainly used for classification problems. Each internal node of the decision tree specifies a condition or a "test" on an attribute, and the branching is done on the basis of the test conditions and results. Finally, the leaf node bears a class label that is obtained after computing all attributes. The path from the root to a leaf represents a classification rule. Remarkably, decision trees can work with both categorical and dependent variables. They are good at identifying the most important variables, and they also depict the relationships between variables quite aptly. They are helpful in creating new variables and features, which is useful for data exploration, and they predict the target variable quite efficiently.
Decision Tree Pseudo-code
GenerateDecisionTree(Samples S, Features F)
1. If stop_condition(S, F) = true then
a. leaf = create_Node()
b. leaf.label = classify(S)
c. Return leaf
2. root = create_Node()
3. root.test_condition = find_best_split(S, F)
4. V = {v | v is a possible outcome of root.test_condition}
5. For each value v in V
6. Sv = {s | root.test_condition(s) = v and s belongs to S}
7. child = GenerateDecisionTree(Sv, F)
8. Grow child as a descendant of root and label the edge (root -> child) as v

Tree-based learning algorithms are widely used in predictive models built with supervised learning methods to achieve high accuracy. They are good at mapping non-linear relationships. They solve classification and regression problems quite well and are also referred to as CART (Classification and Regression Trees). A small worked example is sketched below.
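As a quick illustration (not taken from this paper), a decision tree of this kind can be trained with scikit-learn roughly as follows; the tiny feature matrix, labels, and feature names are hypothetical placeholders for features extracted from news articles.

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features per article (e.g., counts of sensational words, citations, numbers)
X = [[3, 0, 1], [0, 2, 4], [4, 1, 0], [1, 3, 5]]
y = [1, 0, 1, 0]  # 1 = fake, 0 = real

# Each internal node tests one attribute; each leaf carries a class label
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# The printed root-to-leaf paths are the learned classification rules
print(export_text(tree, feature_names=["sensational", "citations", "numbers"]))
print(tree.predict([[2, 1, 1]]))  # predicted class for a new article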
6. Random Forest
Random forests are built on the concept of building many decision trees, after which each decision tree produces a separate result. The results predicted by the large number of decision trees are aggregated by the random forest. To ensure variation among the decision trees, the random forest randomly selects a subset of features for each tree. Random Forest works best when applied to uncorrelated decision trees. If applied to similar trees, the overall result will be more or less similar to that of a single decision tree. Uncorrelated decision trees can be obtained by bootstrapping and feature randomness.
Random Forest Pseudo-code

To build n classifiers:
For i = 1 to n do
Sample the training data T at random with replacement to produce Ti
Build a root node Ni containing Ti
Call BuildTree(Ni)
End for

BuildTree(N):
If N contains instances of only one class, then return
Else
Select z% of the possible splitting features in N at random
Select the feature F with the highest information gain to split on
Create f child nodes of N, N1, ..., Nf, where F has f possible values (F1, ..., Ff)
For i = 1 to f do
Set the contents of Ni to Ti, where Ti is all instances in N that match Fi
Call BuildTree(Ni)
End for
End if
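For comparison, a minimal scikit-learn sketch of this procedure is shown below, using hypothetical synthetic data; bootstrap sampling and random feature subsets correspond to the sampling and z% feature-selection steps above.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a numeric news-feature matrix
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# bootstrap=True resamples the training data for each tree;
# max_features="sqrt" gives each split a random subset of features
forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                max_features="sqrt", random_state=0)
forest.fit(X, y)

print(forest.predict(X[:5]))        # aggregated (majority-vote) prediction of the trees
print(forest.feature_importances_)  # relative importance of each feature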
7. Support Vector Machine (SVM)
The SVM algorithm is based on laying out each data item as a point in an n-dimensional space (where n is the number of available features), with the value of each feature being the value of a particular coordinate. Given a set of N features, the SVM algorithm uses the n-dimensional space to plot the data items, with the coordinates representing the value of each feature. The hyperplane obtained to separate the two classes is then used for classifying the data.

SVM Pseudo-Code

F[0...N-1]: a feature set with N features, sorted by information gain in decreasing order
accuracy(i): accuracy of the SVM prediction model built on the feature set F[0...i]
low = 0
high = N - 1
value = accuracy(N - 1)
IG_RFE_SVM(F[0...N-1], value, low, high) {
If (high <= low)
Return F[0...N-1] and value
mid = (low + high) / 2
value_2 = accuracy(mid)
If (value_2 >= value)
Return IG_RFE_SVM(F[0...mid], value_2, low, mid)
Else // value_2 < value
Return IG_RFE_SVM(F[0...high], value, mid, high)
}
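A rough scikit-learn analogue of this information-gain-guided feature selection followed by an SVM is sketched below. It is only an approximation of the pseudocode, not the authors' implementation: mutual information stands in for the information-gain score, SelectKBest replaces the recursive halving, and the data are synthetic.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=40, n_informative=8, random_state=0)

# Keep the k features with the highest mutual information, then fit a linear SVM,
# whose separating hyperplane is used to classify the data
model = make_pipeline(SelectKBest(mutual_info_classif, k=10), LinearSVC(dual=False))
print(cross_val_score(model, X, y, cv=5).mean())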

8. Naive Bayes
This algorithm is based on Bayes' theorem under the assumption of independence between predictors and is used in many machine learning problems. Simply put, Naïve Bayes assumes that one feature in a class is unrelated to any other feature. For example, a fruit may be classified as an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on other features, Naïve Bayes assumes that each of them contributes independently to the evidence that the fruit is an apple.
Naïve Bayes Equation

P(c | x) = P(x | c) P(c) / P(x)
P(c | X) = P(x1 | c) * P(x2 | c) * ... * P(xn | c) * P(c)

Where:
P(c | X) is the posterior probability.
P(x | c) is the likelihood.
P(c) is the class prior probability.
P(x) is the predictor prior probability.

Naïve Bayes Pseudo-Code

Input:
Training dataset T;
F = (f1, f2, f3, ..., fn) // values of the predictor variables in the testing dataset.
Output:
A class of the testing dataset.
Steps:
1. Read the training dataset T;
2. Calculate the mean and standard deviation of the predictor variables in each class;
3. Repeat
4. Calculate the likelihood using the Gaussian density equation in each class;
5. Until the likelihood of all predictor variables (f1, f2, f3, ..., fn) has been estimated;
6. Calculate the combined likelihood for each class;
7. Return the class with the highest likelihood.
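A minimal sketch of these steps with scikit-learn's GaussianNB follows; the numeric features are hypothetical. The estimator stores each class's per-feature mean and spread and combines the Gaussian likelihoods with the class priors to produce the posterior.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical numeric features derived from articles (e.g., word and sentence statistics)
X = np.array([[2.0, 7.1], [1.5, 6.8], [8.2, 1.1], [7.9, 0.9]])
y = np.array([0, 0, 1, 1])  # 0 = real, 1 = fake

nb = GaussianNB()
nb.fit(X, y)                            # step 2: per-class mean and spread of each predictor
print(nb.theta_)                        # per-class feature means
print(nb.predict_proba([[2.1, 7.0]]))   # posterior P(c | x) from likelihoods and priors
print(nb.predict([[2.1, 7.0]]))         # class with the highest posterior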
Random Forest (RF) and Naïve Bayes (NB) have many differences, the main one being their model size. NB models are not good at representing complex behavior, which results in a small model size, and they suit a constant type of data. In contrast, the model size of a random forest is very large and it might result in overfitting. NB is good for dynamic data and can be reshaped easily when new data are inserted, while an RF may require rebuilding the forest every time a change is introduced.
9. KNN (k-Nearest Neighbours)
KNN classifies new samples based on the majority vote of their K nearest neighbors. A sample is assigned to the class that is most common among its K nearest neighbors, as measured by a distance function.
KNN Pseudo-Code
Classify(X, Y, x) // X: training data, Y: class labels of X, x: unidentified sample
For i = 1 to m do
Calculate the distance d(Xi, x)
End for
Compute the set I containing the indices of the k smallest distances d(Xi, x)
Return the majority label among {Yi where i belongs to I}

KNN falls in the category of supervised learning, and its main applications are intrusion detection and pattern recognition. It is non-parametric, so no specific distribution is assigned to the data and no assumption is made about it. For example, GMM assumes a Gaussian distribution of the given data.
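The pseudocode above corresponds closely to scikit-learn's KNeighborsClassifier; a small sketch with hypothetical data:

from sklearn.neighbors import KNeighborsClassifier

# X: training data, Y: class labels, x: unidentified sample (as in the pseudocode)
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [7.0, 8.0], [6.8, 8.2], [7.3, 7.9]]
Y = [0, 0, 0, 1, 1, 1]
x = [[6.5, 7.7]]

# Majority vote among the k nearest neighbours under the Euclidean distance
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X, Y)
print(knn.predict(x))  # majority label of the 3 closest training points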
10. Combining Classifiers
Achieving the best possible classification performance is the primary goal when designing pattern-detection systems. For that reason, different classification approaches can be developed for the detection models; even if one model achieves the highest performance, the sets of patterns correctly categorized by the different classifiers do not necessarily overlap. Different classification approaches can therefore provide complementary information about the patterns, and with this additional information the performance of the individual models can be improved, as in the sketch below.
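One common way to combine classifiers in this spirit is majority (hard) voting, which is also used later in the methodology; a small scikit-learn sketch with synthetic data follows.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hard voting: each base classifier casts one vote and the majority class wins
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    voting="hard")
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))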
11. Related Work on Fake News Detection
Various media sources have been examined in studies on whether a submitted article is reliable or fake. One work utilizes models based on speech characteristics, unproductive models that do not fit well with the other current models. Another used Naive Bayes classifiers to detect fake news; this method was implemented as a software framework and tested on various records from Facebook and similar sources, resulting in an accuracy of 74%. That work neglected punctuation errors, resulting in poor accuracy. Other work estimated various ML algorithms and reported their prediction percentages; the accuracy of various predictive models, including bounded decision trees, gradient boosting, and support vector machines, varied, and the models were estimated based on an unreliable probability threshold with 85-91% accuracy. Using Naive Bayes classifiers, another study discusses how to implement fake news detection on different social media sites. It used Facebook, Twitter, and other social media applications as data sources for news; the accuracy is very low because the information on these sites is not 100% credible. Another study discusses misleading content and discovering rumors in real time. It utilizes a novelty-based feature and derived its data from Kaggle; the average accuracy of this model is 74.5%, and clickbait and unreliable sources are not considered, resulting in lower resolution. A further approach was used to distinguish Twitter spam senders; among the models used are Naïve Bayes algorithms, clustering, and decision trees. The average accuracy of detecting spammers is 70% and of detecting fraudsters 71.2%; the models achieved only an intermediate precision in separating spammers from non-spammers. Other work identified fake news in different ways; the accuracy is limited to 76% with a language model, and greater accuracy could be achieved if a generative model were used. Further research aims to utilize machine learning methods to detect fake news; three common methods are used in that research: Naïve Bayes, neural networks, and support vector machines (SVM). Normalization is an essential stage in data cleansing before machine learning is used to categorize the data. The two more advanced methods, the neural network and the support vector machine (SVM), reached an accuracy of 99.90%.
It has also been found that fake news detection is a predictive analysis application. Detecting counterfeit messages involves three stages: processing, feature extraction, and classification. The hybrid classification model in that research was designed to detect fake news; the combined classifier is a combination of KNN and random forests. The performance of the suggested model was analyzed for accuracy and recall, and the final results improved by up to 8% using the hybrid fake message detection model.
Another study examined how fake news was used in the 2012 Dutch elections on Twitter. It examines the performance of eight supervised machine learning classifiers on the Twitter data set. The decision tree algorithm was found to work best for the data set used, with an F-score of 88%. In total, 613,033 tweets were rated, of which 328,897 were considered genuine and 284,136 false. By analyzing the qualitative content of false tweets sent during the election, features and properties of the false content were found and divided into six different categories.
Another work presented a counterfeit detection model using N-gram analysis through the lens of various feature extraction techniques. It examined several feature extraction techniques and six different machine learning methods. The proposed model achieves its highest accuracy when using unigrams and a linear SVM classifier; the highest accuracy is 92%.
3. Methodology
This section presents the methodology used for classification. Using this model, a tool is implemented for detecting fake articles. In this method, supervised machine learning is used for classifying the data set. The first step in this classification problem is the data set collection phase, followed by preprocessing, implementing feature selection, then performing the training and testing of the data set, and finally running the classifiers. Figure 1 describes the proposed system methodology. The methodology is based on conducting various experiments on datasets using the algorithms described in the previous section, namely Random Forest, SVM, Naïve Bayes, majority voting, and other classifiers. The experiments are conducted individually on each algorithm and on combinations among them for the purpose of the best accuracy and precision.

Figure 1. The proposed system methodology: the database is split into data for training and data for testing; both go through preprocessing and feature selection, a model is learned from the training data, and the resulting classification model is applied to the test data.

The main goal is to apply a set of classification algorithms to obtain a classification model to be used as a scanner for fake news based on the details of the news, and to embed the model in a Python application to be used as a detector for fake news data. Also, appropriate refactorings have been performed on the Python code to produce optimized code. The classification algorithms applied in this model are K-Nearest Neighbors, Linear Regression, XGBoost, Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machine. All of these algorithms are tuned to be as accurate as possible; a reliable result is obtained by combining and averaging their outputs and comparing them.
As shown in Figure 2, the data set is fed to the different algorithms in order to detect fake news; the accuracy of the results obtained is analyzed to conclude the final result.

Figure 2. The classification algorithms: the data set is fed to Linear Regression, Random Forest, XGBoost, Naïve Bayes, K-Nearest Neighbors (KNN), Decision Tree, and Support Vector Machine (SVM); the average of their accuracies (sum of accuracy / N) is taken, and the final label is set to 1 if the combined class value is >= 1, and to 0 otherwise.

In the process of model creation, the approach for detecting political fake news is as follows: the first step is the collection of a political news data set (the Liar data set is adopted for the model); preprocessing is performed through rough noise removal; the next step is to apply the NLTK (Natural Language Toolkit) to perform POS tagging, and features are selected; next, the data set is split and the ML algorithms are applied. Figure 2 shows that after NLTK is applied, the data set is successfully preprocessed in the system, and a message is generated for applying the algorithms on the training portion. The system responds when Naïve Bayes and Random Forest are applied, and the model is created with a response message. Testing is performed on the test data set and the results are verified; the next step is to monitor the precision for acceptance. The model is then applied to unseen data selected by the user. The full data set is created with half of the data being fake and half being real articles, thus making the model's baseline accuracy 50%. A random selection of 80% of the data is taken from the fake and real data sets to be used as our complete training set, leaving the remaining 20% to be used as a testing set when our model is complete. Text data requires preprocessing before a classifier can be applied to it, so we will clean noise, use Stanford NLP (natural language processing) for POS (part-of-speech) processing and tokenization of words, and then encode the resulting data as integers and floating-point values to be accepted as input to the ML algorithms. This process results in feature extraction and vectorization; the research uses the Python scikit-learn library to perform tokenization and feature extraction of text data, because the library contains useful tools like CountVectorizer and TfidfVectorizer. The data are viewed in a graphical presentation with a confusion matrix.
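A minimal sketch of this pipeline is shown below, assuming a labeled table with hypothetical column names 'text' and 'label' (1 = fake, 0 = real); it uses scikit-learn's TfidfVectorizer for tokenization and feature extraction and an 80/20 split, as described above.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Hypothetical labeled articles (in practice, the Liar dataset or similar)
df = pd.DataFrame({
    "text": ["shocking secret cure revealed", "ministry publishes annual report",
             "celebrity arrested aliens involved", "court approves new budget",
             "miracle diet melts fat overnight", "university releases study results"],
    "label": [1, 0, 1, 0, 1, 0]})

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=0)

vectorizer = TfidfVectorizer(stop_words="english")  # CountVectorizer can be swapped in here
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_vec, y_train)
print(accuracy_score(y_test, clf.predict(X_test_vec)))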

Figure 3. Fake detector model: a class diagram relating the Dataset (supply, preprocess, transform, split), Preprocess (apply noise removal, apply POS, apply CountVectorizer, apply TfidfVectorizer), Model (create, train, test, apply model, verify, update), and User (insert doubted news, preprocess, predict, view results) components through association, aggregation, composition, and dependency relationships.

RESULTS

The scope of this project is to cover the political news data of the dataset known as the Liar dataset, a new benchmark dataset for fake news detection whose items are labeled as fake or trusted news. We have performed the analysis on the "Liar" dataset. The results of the analysis of the dataset using the six algorithms have been depicted using the confusion matrix. The algorithms used for the detection include:

• XGBoost
• Random Forest
• Naïve Bayes
• K-Nearest Neighbors (KNN)
• SVM

The confusion matrix is automatically obtained by the Python code using the scikit-learn library when running the algorithm code on the Anaconda platform.
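As an illustration of how such a confusion matrix is obtained, the label vectors below are hypothetical stand-ins for the test labels and the classifier's predictions.

from sklearn.metrics import confusion_matrix, classification_report

y_test = [0, 1, 1, 0, 1, 0, 1, 0]  # true labels of the test set
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]  # labels predicted by the trained classifier

# Rows correspond to true classes, columns to predicted classes
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["real", "fake"]))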

CONCLUSION
The research in this paper focuses on detecting fake news by reviewing it in two stages: characterization and disclosure. In the first stage, the basic concepts and principles of fake news in social media are highlighted. During the discovery stage, the current methods for detecting fake news using different supervised learning algorithms are reviewed.
As for the displayed fake news detection approaches based on text analysis, the paper utilizes models based on speech characteristics and predictive models that do not fit with the other current models.
In the aforementioned research summary and system analysis, we concluded that most of the research papers used Naive Bayes-based algorithms and that the prediction precision was between 70% and 76%; they mostly use qualitative analysis depending on sentiment analysis, titles, and word frequency repetitions. In our approach, we propose to add another aspect to these methodologies, namely POS textual analysis. It is a quantitative approach that depends on adding numeric statistical values as features, and we expect that increasing these features and using random forest will further improve the precision results. The features we propose to add to our dataset are: total words (tokens), total unique words (types), type/token ratio (TTR), number of sentences, average sentence length (ASL), number of characters, average word length (AWL), nouns, prepositions, adjectives, etc.
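A sketch of how these quantitative features could be computed with NLTK is given below; the sample text is hypothetical, and the punkt and averaged_perceptron_tagger resources are assumed to have been downloaded.

import nltk
from collections import Counter
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")  # one-time setup

text = "The ministry denied the viral claim. Officials called the report completely false."

tokens = nltk.word_tokenize(text)
sentences = nltk.sent_tokenize(text)
tags = Counter(tag for _, tag in nltk.pos_tag(tokens))

features = {
    "total_tokens": len(tokens),
    "total_types": len(set(tokens)),
    "type_token_ratio": len(set(tokens)) / len(tokens),
    "num_sentences": len(sentences),
    "avg_sentence_length": len(tokens) / len(sentences),
    "num_characters": len(text),
    "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
    "nouns": tags["NN"] + tags["NNS"] + tags["NNP"] + tags["NNPS"],
    "adjectives": tags["JJ"] + tags["JJR"] + tags["JJS"],
    "prepositions": tags["IN"],
}
print(features)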

RESEARCH PAPER -4
Journal on Interactive Systems, 2023, 14:1, doi: 10.5753/jis.2023.3020
This work is licensed under a Creative Commons Attribution 4.0 International License.

Fake news detection: a systematic literature review of machine learning algorithms and datasets
Humberto Fernandes Villela [ Universidade FUMEC | [email protected] ]
Fábio Corrêa [ Universidade FUMEC | [email protected] ]
Jurema Suely de Araújo Nery Ribeiro [ Universidade FUMEC | [email protected] ]
Air Rabelo [ Universidade FUMEC | [email protected] ]
Dárlinton Barbosa Feres Carvalho [ Universidade Federal de São João del-Rei | [email protected] ]

Abstract
Fake news (i.e., false news created to have a high capacity for dissemination and malicious intentions) is
a problem of great interest to society today since it has achieved unprecedented political, economic, and
social impacts. Taking advantage of modern digital communication and information technologies, they are
widely propagated through social media; their use is intentional and challenging to identify. In order to
mitigate the damage caused by fake news, researchers have been seeking the development of automated
mechanisms to detect them, such as algorithms based on machine learning as well as the datasets employed in
this development. This research aims to analyze the machine learning algorithms and datasets used in training
to identify fake news published in the literature. It is exploratory research with a qualitative approach, which
uses a research protocol to identify studies with the intention of analyzing them. As a result, we have the
algorithms Stacking Method, Bidirectional Recurrent Neural Network (BiRNN), and Convolutional Neural
Network (CNN), with 99.9%, 99.8%, and 99.8% accuracy, respectively. Although this accuracy is expressive,
most of the research employed datasets in controlled environments (e.g., Kaggle) or without information
updated in real-time (from social networks). Still, only a few studies have been applied in social network
environments, where the most significant dissemination of disinformation occurs nowadays. Kaggle was the
platform identified with the most frequently used datasets, being succeeded by Weibo, FNC-1, COVID-19
Fake News, and Twitter. For future research, studies should be carried out in addition to news about politics,
the area that was the primary motivator for the growth of research from 2017, and the use of hybrid methods
for identifying fake news.

Keywords: Algorithms, datasets, accuracy, fake news, artificial intelligence.


Introduction

Currently, the term fake news is on the rise as this type of news can remarkably influence society, promoting significant political, economic, or social impacts (Zhang et al., 2016; Islam et al., 2020). They are false news created to be highly broadcastable, usually with malicious intent (i.e., to deceive, cause ambiguity, or falsehood).

In the political context, specifically, Almeida et al. (2021) point out the impacts of fake news on the US presidential elections in 2016, having President Donald Trump elected. In Brazil, similar impacts were imputed to the election of President Jair Bolsonaro in 2018. Due to the behavior of many Brazilian voters relying on social media as the primary source to access news, this channel is fruitful for the proliferation of fake news (Almeida et al., 2021).

Due to the relevant impact this type of news has caused on society, researchers have been seeking to develop ways to detect them. Using algorithms for the automated identification of fake news presents itself as a promising line of research. The quality of these algorithms is commonly verified by accuracy, which is the measure of correctness in classifying whether a news item is true or false (Chapra & Canale, 2016). That is, accuracy represents the assertiveness characteristic of the algorithm in detecting fake news.

However, the quality of algorithms is directly related to their specific purpose, restricted to a language or news style, that is, according to the datasets used for training (Ahuja & Kumar, 2020). For this reason, Jiang et al. (2021) and Ahuja and Kumar (2020) recommend continuing research using varied datasets, such as in languages other than English, given that this is the most used.

Nevertheless, recurring studies to detect fake news have achieved relevant accomplishments. The research by Burfoot and Baldwin (2009) reached 71% accuracy in detecting fake news, while Ahmed, Traore, and Saad (2017) raised this metric to 87%, the same percentage achieved by Low et al. (2022).

Medeiros and Braga (2020) carried out research aiming, among others, to identify the accuracy achieved by algorithms in detecting fake news. From the 32 studies examined, the proposed algorithms show an accuracy of 73.7% to 98.0%. This result highlights the relevance of this theme and the constant search to promote a more assertive detection of fake news, to minimize its impacts in the aforementioned political, economic, or social contexts, among others.

Accordingly, this research is grounded on the recommendations of Ahmed, Traore, and Saad (2017), Ahmad et al. (2020), Agarwal et al. (2020), Aslam et al. (2021), and Jiang et al. (2021) regarding the use of algorithms to identify fake news. Therefore, the objective of this investigation is to analyze the accuracy obtained and the datasets used in fake news identification algorithms.

This study intends to contribute by highlighting more successful algorithms and datasets in identifying fake news to better support researchers' understanding of the state-of-the-art, given the variety of options mentioned above. Besides, it consolidates recommendations for future studies. Thus, the theoretical contribution stands out for presenting algorithms and datasets with expressive accuracy, providing a means for their identification and practical application for the identification of fake news, and continuing the research by Medeiros and Braga (2020), carried out in August of 2019. In addition, by exploring fake news, a phenomenon with relevant political, economic, and social impacts, we seek to encourage discussion on this topic and shed light on the importance of computing research in tackling this matter. Thus, this article is subdivided into parts to present the work carried out. In addition to this introduction, the following section explains the theoretical and practical aspects of fake news, contextualizing this phenomenon for the purposes of this research. The following section explains the methodological procedures used to achieve the desired objective. Accordingly, the results are discussed in the subsequent section, and so on, promoting this study's conclusion. The references that support this investigation end it.

Fake news detection

The endless need to be informed in a progressively connected and ubiquitous world is already part of most people's routines. The ease of communicating through the internet is ever more simplified and decentralized. People and devices connected, through the Internet of Things
(IoT), communicate through the global network at all times, culminating in an increasingly high volume of shared communications.

The increasing expressive volume of communications shared through the network, often treated as Big Data, composed of textual and multimedia data, has produced valuable information. Through this perspective, Agarwal et al. (2020) advocate that data become an increasingly valuable asset and gain greater relevance when transformed into information and assimilated as knowledge by users.

In this scenario, Ali et al. (2021) express the idea of hyperconnectivity, which consists of the high connection of users and machines, high transmission speed, ease of communication, and access to real-time information worldwide. It is a vibrant path in technological evolution, full of opportunities as well as struggles, such as facilitating fake news.

Fake news is usually a modified version of plausible news to mislead, cause ambiguity or falsehood, and be widely disseminated. It is mainly propagated through social media, with intentional use and challenging identification. In this way, fake news is a specific type of disinformation, like rumor and SPAM. They can generate political, economic, and social impacts (Zhang et al., 2016; Islam et al., 2020).

It is noteworthy that fake news is not new. In 1835, The Sun, a New York newspaper, published a series of fake news about the supposed discovery of life on the moon, being this case nowadays referenced as the Great Moon Hoax (Shabani & Sokhn, 2018). However, in the modern context of hyper-connectivity, social media has propelled the spread of this type of news.

Social media platforms have a plethora of misinformation, which has caught the attention of researchers in developing mechanisms to detect them (Jiang et al., 2021, 2020; Goldani, Momtazi & Safabakhsh, 2021; Birunda & Devi, 2021; Sahoo & Gupta, 2021; Goel et al., 2021; Pardamean & Pardede, 2021). The European Commission has established a group of experts to advise and discuss policy initiatives to combat fake news and the spread of disinformation online (Assad & Erascu, 2018).

Usually, the detection approaches consist of classifying (fake) news in a binary form (i.e., true or false) (Zhang et al., 2019). However, the subtleties of human language add high complexity to the detection algorithms, even for this binary classification.

Nevertheless, Collins et al. (2021) subdivide fake news into clickbait, propaganda, satire and parody, hoaxes, and others (e.g., name theft, journalistic fraud). Medeiros and Braga (2020) split it into conspiracy theories, hoaxes, rumors, biased news, and satires. Identifying these categories is a complex task due to the nuances of human language. For example, since satire and parody employ sarcasm and humor, the algorithmic analysis must consider this feature to call a news story true or false.

In the meantime, it is notable that social media are the biggest target of fake news due to their ease and breadth of disseminating information nowadays. Another relevant feature that is being widely used and favors the spread of fake news is the resemblance to plausible news, confusing people and making it challenging to combat false information dissemination (Islam et al., 2020). Furthermore, it is noteworthy that the absence or even only a delay in a disclaimer by official entities involved in some rumor, which is false but convincing, favors its dissemination (Abouzeid et al., 2019). Ruchansky, Seo, and Liu (2017) claim that many stakeholders profit from the publication of fake news online because the more provocative the news is, the greater the response and the greater its yield. A rumor can be vastly profitable for someone or some organization. Given the sophistication of fake news due to its creator's intentions, Asaad and Erascu (2018) point out that critical thinking is an important ally in combating the spread of disinformation.

Although recognizing fake news by algorithms is tricky, like in satire and parody, the increasing volume of communications shared on the network is expressive; therefore, it is unfeasible to attribute to humans the responsibility of classifying them (Agarwal et al., 2020). Ahmad et al. (2020) articulate that numerous techniques help in the classification of articles as fake based on their textual content. Many of them make use of verification sites such as PolitiFact (https://www.politifact.com/) and Snopes (https://www.snopes.com/). Curators also maintain repositories, with lists of sites considered false or ambiguous. Medeiros and Braga (2020) point out several mechanisms for detecting fake news, divided into automatic and semi-automatic/manual (Figure 1); however, human critical thinking, in these cases, is necessary to classify the news as false or true (Ahmad et al., 2020).
Figure 1. Different approaches proposed for detecting fake news found in the literature. Source: translated from Medeiros & Braga (2020, p. 3).

Ahmed, Traore, and Saad (2017) and Ahmad et al. (2020) recommend the use of artificial intelligence (AI) algorithms to extract linguistic features from textual articles through machine learning. Agarwal et al. (2020) proposed, as another way to identify fake news, labeling or classifying a particular news story or article on a defined scale, thus giving the reader an idea about the credibility of that published text. Another AI technique used in the process of identifying disinformation is deep learning. To Agarwal et al. (2020), the nature of self-learning and resource maps gave deep learning a significant advantage compared to other statistical modeling and learning methods.

Since research has shown that only 54% of humans can detect fraud without special assistance, Aslam et al. (2021) claim that efforts should be made to build an automated system to classify news as real or fake, aiming at greater classification accuracy. Jiang et al. (2021) agree with Aslam et al. (2021), remarking on the essentiality of a machine-driven approach when they state that the use of automated tools to detect fake news has become an essential requirement to tackle the issue.

The accuracy for identifying fake news by an algorithm is defined as the measure of correct answers attributed to a group of news stories as true or false (Chapra & Canale, 2016). This result is intrinsically related to the features (e.g., language, style) of the news stories of interest, which are called the dataset (Ahuja & Kumar, 2020).

Accordingly, given the impacts of fake news in society, it is relevant to pinpoint the algorithmic means developed and the accuracy achieved by them, as well as the datasets used, to provide a better understanding regarding the computational state-of-the-art at tackling the issue. To this end, the methodological procedures are outlined as follows.

Methodology

This research has an exploratory nature and a qualitative approach since this investigation aims to analyze the accuracy obtained by algorithms and the datasets used in fake news identification. For this, a research protocol is employed, based on Dresch, Lacerda, and Antunes Júnior (2015), to specify the research carried out in this study. Table 1 presents the planned research protocol, which sets a systematic review of scientific publications.

Table 1. Research Protocol. Source: The authors.

Research Questions (RQ): RQ1. What is the accuracy of the main algorithms used to identify fake news? RQ2. Which datasets are used? RQ3. What are the top recommendations for future research?
Exclusion Criteria (EC): EC1. Articles in languages other than English or Portuguese; EC2. Abstracts, technical reports, secondary studies, presentations, or systematic reviews; EC3. Incomplete or unavailable articles for download; EC4. It does not address the use of algorithms to identify fake news; EC5. It does not present methods or accuracy in the identification of fake news; EC6. Article duplicates.
Search fields: Title, abstract, and keywords.
Temporal space: Publications between 2010 and 2021.
Languages: English and Portuguese.
Databases: ACM Digital Library, IEEE Xplore, Scopus, Science Direct, Springer, Web of Science, EBSCO.
Descriptors (Query Strings): QS1. ("computational techniques") AND (("fake news") OR (disinformation) OR (misinformation) OR (malinformation)); QS2. (("fake news detection") OR ("disinformation detection") OR ("misinformation detection") OR ("malinformation detection")).

The performed research design included articles from the last ten years (from 2010 to 2021) to identify the current state-of-the-art about algorithms and datasets used to identify fake news and the accuracy achieved. It also looked at the recommendations for future research in the field.

As a source for publications of computational techniques in the context of fake news, the research considered seven databases. This choice is based on the
criterion of scope. The ACM Digital Library and IEEE Xplore are fruitful bases on the subject, with the other bases and the Portuguese language being added to expand the search.

The descriptors were defined based on the authors' prior knowledge of the subject and a trial-and-error process using the databases' search engines. The authors performed and discussed the analysis together until reaching an agreement.

Results and discussion

The application of the proposed research protocol, through the search employing the query strings QS1 and QS2 (Table 1), was carried out on 06/10/2021 and retrieved 507 articles from the following databases: ACM Digital Library (ACM), IEEE Xplore (IEEE), Scopus (SCO), Science Direct (ScD), Springer (Spr), Web of Science (WoS), and EBSCO (EBS). Of this amount, three articles were disregarded by the EC1 exclusion criterion, 82 by EC2, 80 related to EC3, 150 related to EC4, 51 related to EC5, and 80 related to EC6, totaling 446 articles not consistent with the intent of this research. Table 2 details the whole process, providing the specific number of publications retrieved from each source and the application of each exclusion criterion along the analysis. Thus, 61 publications comprise the sample considered for analysis by this research.

Table 2. Application of the Research Protocol. Source: The authors.

      ACM  IEEE  SCO  ScD  Spr  WoS  EBS    Σ
QS1    44     0    2    2    8    6    1   63
QS2    16   188   91   36   36   45   32  444
ΣS     60   188   93   38   44   51   33  507
EC1     0     0    1    1    0    1    0    3
EC2     3    45   11    5    7    7    4   82
EC3    12    42   11    5    1    4    5   80
EC4    36    37   32    6   29    9    1  150
EC5     2    23   12    5    2    5    2   51
EC6     4    13   13   12    2   16   20   80
ΣEC    57   160   80   34   41   42   32  446
Σ       3    28   13    4    3    9    1   61

The 61 scientific articles were fully read to answer the research questions (RQ1, RQ2, and RQ3 in Table 1). Regarding RQ1 (What is the accuracy of the main algorithms used to identify fake news?), Table 3 presents the results obtained through the analyzed articles and the accuracy (Acc) reported by them.

Table 3. Accuracy of the analyzed algorithms used to identify fake news. Source: The authors.

Author | Algorithm | Acc
1) Jiang et al. (2021) | Stacking Method | 99.9%
2) Jiang et al. (2020) | Bidirectional Recurrent Neural Network (BiRNN) | 99.8%
3) Goldani, Momtazi and Safabakhsh (2021) | Convolutional Neural Network (CNN) | 99.8%
4) Birunda and Devi (2021) | Gradient Boosting | 99.5%
5) Sahoo and Gupta (2021) | Long Short-Term Memory (LSTM) | 99.4%
6) Goel et al. (2021) | Robustly Optimized BERT Pretraining Approach (RoBERTa) | 99.3%
7) Pardamean and Pardede (2021) | Bidirectional Encoder Representation of Transformers (BERT) | 99.2%
1) Jiang et al. (2021) – Stacking Method – 99.9%
2) Jiang et al. (2020) – Bidirectional Recurrent Neural Network (BiRNN) – 99.8%
3) Goldani, Momtazi and Safabakhsh (2021) – Convolutional Neural Network (CNN) – 99.8%
4) Birunda and Devi (2021) – Gradient Boosting – 99.5%
5) Sahoo and Gupta (2021) – Long short-term memory (LSTM) – 99.4%
6) Goel et al. (2021) – Robustly Optimized BERT Pretraining Approach (RoBERTa) – 99.3%
7) Pardamean and Pardede (2021) – Bidirectional encoder representation of transformers (BERT) – 99.2%
8) Kaliyar et al. (2020a) – Convolutional Neural Network (CNN) – 99.1%
9) Albahr and Albahar (2020) – Naive Bayes – 99.0%
10) Gereme et al. (2021) – Convolutional Neural Network (CNN) – 99.0%
11) Nasir, Khan and Varlamis (2021) – Hybrid CNN-RNN – 99.0%
12) Kaliyar, Goswami and Narang (2021) – FakeBERT (fake news detection in social media with a BERT-based approach) – 98.9%
13) Dadkhah et al. (2021) – AWD-LSTM – 98.8%
14) Goldani, Safabakhsh and Momtazi (2021) – CNN with margin loss – 98.4%
15) Sridhar and Sanagavarapu (2021) – BiLSTM-CapsNet – 98.0%
16) Umer et al. (2020) – CNN-LSTM – 97.8%
17) Thakur et al. (2020) – Gradient Boosting (GB) – 97.6%
18) Agarwal et al. (2020) – CNN+RNN – 97.2%
19) Ayoub, Yang and Zhou (2021) – DistilBERT – 97.2%
20) Yu et al. (2020) – IARNet – 96.9%
21) Xie et al. (2021) – Stance Extraction and Reasoning Network (SERN) – 96.6%
22) Ozbay and Alatas (2019) – Grey Wolf Optimization (GWO) – 96.5%
23) Fang et al. (2019) – SMHA-CNN – 95.5%
24) Kaliyar et al. (2020b) – DeepNet – 95.2%
25) Faustini and Covões (2020) – Random Forest (RF) – 95.0%
26) Varshney and Vishwakarma (2020) – Random Forest – 95.0%
27) Ahuja and Kumar (2020) – S-HAN – 93.6%
28) Ivancová, Sarnovský and Maslej-Krcšñáková (2021) – LSTM – 93.6%
29) Verma et al. (2021) – WELFake – 92.6%
30) Wang et al. (2021) – SemSeq4FD – 92.6%
31) Kumar, Anurag and Pratik (2021) – EchoFakeD – 92.3%
32) Song et al. (2021) – CARMN – 92.2%
33) Sharma, Garg and Shrivastava (2021) – BiLSTM – 91.5%
34) Albahar (2021) – Bidirectional SVM-RNN-GRUs – 91.2%
35) Bahad, Saxena and Kamal (2020) – BiLSTM-RNN – 91.1%
36) Torgheh et al. (2021) – GRU-LSTM-CNN – 90.8%
37) Lakshmanarao, Swathi and Kiran (2019) – Random Forest – 90.7%
38) Li et al. (2020) – MCNN-TFV – 90.1%
39) Aslam et al. (2021) – Bi-LSTM-GRU-dense – 89.8%
40) Kumar et al. (2020) – CNN + BiLSTM – 88.8%
41) Lin et al. (2020) – BERT – 88.7%
42) Kaliyar et al. (2020b) – DeepFakE – 88.6%
43) Wang et al. (2020) – Knowledge-driven Multimodal Graph Convolutional Networks (KMGCN) – 88.6%
44) Najar et al. (2019) – Bayesian inference algorithm – 87.9%
45) Chen et al. (2018) – AERNN – 87.6%
46) Alanazi and Khan (2020) – SVM – 87.1%
47) Mugdha et al. (2020) – Gaussian Naive Bayes – 87.0%
48) Kaliyar et al. (2019) – Gradient Boosting – 86.0%
49) Ajao et al. (2019) – LSTM HAN – 86.0%
50) Lin et al. (2019) – XGBoost – 85.5%
51) Qawasmeh et al. (2019) – FND-Bidirectional LSTM concatenated – 85.3%
52) Qi et al. (2019) – Multi-domain Visual Neural Network (MVNN) – 84.6%
53) Shabani and Sokhn (2018) – CROWDSOURCING – 84.0%
54) Barua et al. (2019) – LSTM+GRU (Recurrent Neural Networks) – 82.6%
55) Khattar et al. (2016) – MVAE (Multimodal Variational Autoencoder) – 82.4%
56) Gangireddy et al. (2020) – GTUT – 80.0%
57) Kesarwani et al. (2020) – K-Nearest Neighbor – 79.0%
58) Ren et al. (2020) – AA-HGNN (Adversarial Active Learning based Graph Neural Network) – 73.5%
59) Jardaneh et al. (2019) – Random Forest – 76.0%
60) Al-Ahmad et al. (2021) – Genetic algorithms – 75.4%
61) Konkobo et al. (2020) – SSLNews – 72.3%

Ten studies employed the Convolutional Neural Network (CNN) algorithm (Table 3). Three used typical CNN (articles 3, 8, and 10), and the other seven used it as part of a hybrid model, as follows: CNN-RNN (articles 11 and 18), CNN with margin loss (article 14), CNN-LSTM (article 16), SMHA-CNN (article 23), GRU-LSTM-CNN (article 36), and CNN-BiLSTM (article 40). Goldani, Momtazi, and Safabakhsh (2021) achieved the highest accuracy of the CNN approaches, reaching 99.8% with deep learning (article 3). Through the articles analyzed (Table 2), science's growth in identifying fake news is underlined (Figure 2).

Figure 2. Evolution over the years of the number of publications of algorithms for fake news identification. Source: The authors.

Accuracy greater than 90% was only achieved in 2019, by Lakshmanarao, Swathi, and Kiran (2019) (article 37). Supporting Almeida et al. (2021), Goldani, Momtazi, and Safabakhsh (2021) and Kumar, Anurag, and Pratik (2021) cite the 2016 US presidential elections as the biggest motivator for research applied to the identification of disinformation or fake news.
The word cloud of the most recurrent keywords (Figure 3) in the analyzed articles showed that the terms "fake news detection", "fake news", "deep learning", "machine learning", and "feature-extraction", with 24, 15, 16, 13 and 12 occurrences, respectively, were the words with the highest recurrence in the 61 articles analyzed by this research.

Figure 3. Word cloud of most recurrent keywords in articles. Source: The authors.

Regarding RQ2 (Which datasets are used?), the investigation reveals that many datasets were being used to develop fake news identification methods. Table 4 presents them with the respective amount of use in the publications.

Table 4. Datasets used in the analyzed studies of fake news detection. Source: The authors.
Dataset – Amount – Frequency
Kaggle – 39 – 63.9%
Weibo – 6 – 9.8%
FNC-1 – 3 – 4.9%
COVID-19 Fake News – 2 – 3.3%
Twitter – 2 – 3.3%
NewsFN – 1 – 1.6%
Bengali Language – 1 – 1.6%
btvlifestyle – 1 – 1.6%
Slovak language – 1 – 1.6%
Fake vs Satire – 1 – 1.6%
fake news Amharic – 1 – 1.6%
LUN – 1 – 1.6%
Fakeddit – 1 – 1.6%
Facebook – 1 – 1.6%
Total – 61 – 100.0%

The datasets are primarily related to a language (Ahuja


& Kumar, 2020), English being the most used and fostering
most algorithms for such language. Among those used,
Kaggle - a Google platform used by data scientists) - stands
out, which includes several datasets for studies of artificial
intelligence. Kaggle was the biggest dataset provider
(ISOT, Kaggle Fake News, LIAR, Kaggle, PolitiFact,
BuzzFeed, Kaggle Indonesia data) of the analyzed articles
(Table 4). After the Kaggle datasets, the most used was
Weibo (a Chinese microblog similar to Twitter), with six
occurrences; FNC-1, with three occurrences, COVID-19
Fake News and Twitter, both with two occurrences, and the
others with only one occurrence.
Additionally, we sought to understand: a) Is there any
relationship between algorithms results and the datasets
used? b) Is there any pattern in the results provided by the
algorithms that had Kaggle as a database?
All the 61 analyzed papers presented a different
combination of datasets and algorithms, except for surveys
by Goldani, Momtazi and Safabakhsh (2021) and Kaliyar et
al. (2020), who used the Kaggle dataset with the CNN
algorithm and obtained an accuracy of 99.8% and 99.1%,
respectively. Thus, it is not possible to infer whether there
is any relationship between the results of the algorithms and
the database used (a). This yield may be due to the
character of originality and fast evolution in the field since,
in principle, better algorithms or datasets are required for
better performance, and the domain of fake news
identification presents a vast and increasing range of
applications.
In view of the above, it was also not possible to verify
whether there is any pattern in the results provided by the
algorithms that had Kaggle as a database (b), given that it is
not possible to compare results further considering the
different experimental setups. The data that support these Table 5. Algorithm and dataset used for each of the analyzed studies.
perceptions are expressed in Table 5. Source: The authors.
Algorithm – Author – Dataset
1) Stacking Method – Jiang et al. (2021) – Kaggle
2) Bidirectional Recurrent Neural Network (BiRNN) – Jiang et al. (2020) – Kaggle
3) Convolutional Neural Network (CNN) – Goldani, Momtazi and Safabakhsh (2021) – Kaggle
4) Gradient Boosting – Birunda and Devi (2021) – Kaggle
5) Long short-term memory (LSTM) – Sahoo and Gupta (2021) – Facebook
6) Robustly Optimized BERT Pretraining Approach (RoBERTa) – Goel et al. (2021) – Kaggle
7) Bidirectional encoder representation of transformers (BERT) – Pardamean and Pardede (2021) – Kaggle
8) Convolutional Neural Network (CNN) – Kaliyar et al. (2020a) – Kaggle
9) Naive Bayes – Albahr and Albahar (2020) – Kaggle
10) Convolutional Neural Network (CNN) – Gereme et al. (2021) – fake news Amharic
11) Hybrid CNN-RNN – Nasir, Khan and Varlamis (2021) – Kaggle
12) FakeBERT (fake news detection in social media with a BERT-based approach) – Kaliyar, Goswami and Narang (2021) – Kaggle
13) AWD-LSTM – Dadkhah et al. (2021) – Kaggle
14) CNN with margin loss – Goldani, Safabakhsh and Momtazi (2021) – Kaggle
15) BiLSTM-CapsNet – Sridhar and Sanagavarapu (2021) – Kaggle
16) CNN-LSTM – Umer et al. (2020) – FNC-1
17) Gradient Boosting (GB) – Thakur et al. (2020) – Kaggle
18) CNN+RNN – Agarwal et al. (2020) – Kaggle
19) DistilBERT – Ayoub, Yang and Zhou (2021) – COVID-19 Fake News
20) IARNet – Yu et al. (2020) – Weibo
21) Stance Extraction and Reasoning Network (SERN) – Xie et al. (2021) – Kaggle
22) Grey Wolf Optimization (GWO) – Ozbay and Alatas (2019) – Kaggle
23) SMHA-CNN – Fang et al. (2019) – Kaggle
24) DeepNet – Kaliyar et al. (2020b) – Fakeddit
25) Random Forest (RF) – Faustini and Covões (2020) – Kaggle
26) Random Forest – Varshney and Vishwakarma (2020) – Kaggle
27) S-HAN – Ahuja and Kumar (2020) – btvlifestyle
28) LSTM – Ivancová, Sarnovský and Maslej-Krcšñáková (2021) – Slovak language
29) WELFake – Verma et al. (2021) – Kaggle
30) SemSeq4FD – Wang et al. (2021) – LUN
31) EchoFakeD – Kumar, Anurag and Pratik (2021) – Kaggle
32) CARMN – Song et al. (2021) – Weibo
33) BiLSTM – Sharma, Garg and Shrivastava (2021) – Kaggle
34) Bidirectional SVM-RNN-GRUs – Albahar (2021) – Kaggle
35) BiLSTM-RNN – Bahad, Saxena and Kamal (2020) – Kaggle
36) GRU-LSTM-CNN – Torgheh et al. (2021) – Kaggle
37) Random Forest – Lakshmanarao, Swathi and Kiran (2019) – Twitter
38) MCNN-TFV – Li et al. (2020) – FNC-1
39) Bi-LSTM-GRU-dense – Aslam et al. (2021) – NewsFN
40) CNN + BiLSTM – Kumar et al. (2020) – Kaggle
41) BERT – Lin et al. (2020) – FNC-1
42) DeepFakE – Kaliyar et al. (2020b) – Kaggle
43) Knowledge-driven Multimodal Graph Convolutional Networks (KMGCN) – Wang et al. (2020) – Weibo
44) Bayesian inference algorithm – Najar et al. (2019) – Kaggle
45) AERNN – Chen et al. (2018) – Weibo
46) SVM – Alanazi and Khan (2020) – Kaggle
47) Gaussian Naive Bayes – Mugdha et al. (2020) – Bengali Language
48) Gradient Boosting – Kaliyar et al. (2019) – Kaggle
49) LSTM HAN – Ajao et al. (2019) – Kaggle
50) XGBoost – Lin et al. (2019) – Kaggle
51) FND-Bidirectional LSTM concatenated – Qawasmeh et al. (2019) – FNC-1
52) Multi-domain Visual Neural Network (MVNN) – Qi et al. (2019) – Weibo
53) CROWDSOURCING – Shabani and Sokhn (2018) – Fake vs Satire
54) LSTM+GRU (Recurrent Neural Networks) – Barua et al. (2019) – Kaggle
55) MVAE (Multimodal Variational Autoencoder) – Khattar et al. (2016) – Weibo
56) GTUT – Gangireddy et al. (2020) – Kaggle
57) K-Nearest Neighbor – Kesarwani et al. (2020) – Kaggle
58) AA-HGNN (Adversarial Active Learning based Graph Neural Network) – Ren et al. (2020) – Kaggle
59) Random Forest – Jardaneh et al. (2019) – Twitter
60) Genetic algorithms – Al-Ahmad et al. (2021) – COVID-19 Fake News
61) SSLNews – Konkobo et al. (2020) – Kaggle

Regarding RQ3 (What are the top recommendations for future studies?), some perspectives are presented, such as using other languages for the dataset. For the authors, the accuracy of identifying fake news does not only depend on the algorithm but also on the dataset language. Therefore, Jiang et al. (2021) and Ahuja and Kumar (2020) recommend extending studies by applying research to datasets in other languages.
Another research suggestion is classifying fake news using a scoring model, such as a credibility rate. The
scoring model is justified by the difficulty in classifying information as only false or true (Agarwal et al., 2020),
given the inherent complexity of human language and other aspects, such as the area of news (e.g., economics or
politics).
The combination of models generating hybrid models is a recommendation highlighted by Jiang et al. (2020),
Pardamean and Pardede (2021), and Kaliyar, Goswami and Narang (2021). Another recommendation was to use
algorithms based on deep learning in future research and understand how this technique can help identify fake news
(Bahad, Saxena & Kamal, 2020).
Despite much research being directed toward textual information, Song et al. (2021) and Varshney and Vishwakarma
(2020) recommended research on the exploitation of visual information in search of fake news. Goel et al. (2021)
highlight the relevance of further expanding research on fake news in other areas, given that many were restricted to the
identification of fake news in datasets exclusive to political news.
Fang et al. (2019) recommended a better understanding of how the classifier detects fake news, since this would allow modifying or replacing features so that the detection method does not rely on very specific semantics of fake news, which could be exploited to generate misclassifications (e.g., false positives, false negatives).
For Albahar (2021), the challenge is great, and researchers need to devote more attention to understanding the patterns
of news structures and what is considered false in the digital universe. For the researcher, fake digital news continues to
acquire new formats, making it difficult to distinguish fake news embedded in long news.
Finally, some limitations regarding the performed analysis threaten this study’s validity. Although the number of
publications retrieved from the scientific repositories is considerable, the method employed does not intend to be exhaustive.
Accuracy comparison between different algorithms and datasets provides a limited view of the matter. A more accurate
comparison between algorithms would require a controlled environment and advanced statistical analysis (e.g., n-fold cross-
validation, paired t-tests). The classification of the algorithms is also variable according to different authors’ perspectives
and theoretical backgrounds, especially regarding mixed approaches, called hybrid methods.
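To make the limitation concrete, the sketch below shows one way such a controlled comparison could be set up: two classifiers evaluated on the same cross-validation folds and compared with a paired t-test. This is only an illustrative sketch, assuming scikit-learn and SciPy; the synthetic data and the two models are placeholders, not any of the reviewed systems.

```python
# Hypothetical sketch: compare two fake news classifiers on identical CV folds
# with a paired t-test (data and model choices are placeholders).
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)  # stand-in for text features
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)            # same folds for both models

acc_a = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy")
acc_b = cross_val_score(GaussianNB(), X, y, cv=cv, scoring="accuracy")

t_stat, p_value = stats.ttest_rel(acc_a, acc_b)  # paired t-test over the 10 fold accuracies
print(f"mean A={acc_a.mean():.3f}, mean B={acc_b.mean():.3f}, p={p_value:.3f}")
```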
Conclusion
This research intended to investigate the computational techniques and datasets used in fake news identification, analyzing
the accuracy reported in scientific literature. For this, three questions were investigated. Regarding the accuracy of the
main algorithms used to identify fake news (RQ1), the top three approaches are as follows: the Stacking Method, with
99.9% accuracy, Bidirectional Recurrent Neural Network (BiRNN), with 99.8%, and the Convolutional Neural Network
(CNN), also with 99.8%.
The most popular technique was CNN, being used in ten studies. The scientific evolution in the past years for fake
news identification is remarkable. An accuracy superior to 90% was reached only in 2019, with the 21 highest accuracies,
above 96.6%, dating between 2020 and 2021.
Regarding the datasets used for the identification of fake news (RQ2), Kaggle has a more significant predominance,
probably due to its popularity and to the several datasets it hosts for studies of artificial intelligence. After
Kaggle, Weibo (i.e., a Chinese microblog similar to Twitter), FNC-1, COVID-19 Fake News, and Twitter were found and
are presented in order considering the highest number of occurrences in the analyzed studies.
The top recommendations for future research in fake news identification (RQ3) are pointed out as follows:
• the use of other languages in the datasets;
• the classification through a scoring model;
• development based on hybrid models;
• the use of algorithms based on deep learning;
• the exploration of visual information;
• expansion of research in other areas beyond politics;
• the replacement of keywords with synonyms;
• understanding the patterns of news structures.
It is emphasized that the accuracy of 90% is considered a relevant result in this complex process of identifying fake
news. Most of the research used datasets in controlled environments (e.g., Kaggle) or without information updated in real-
time (from social networks). Few studies were applied directly in social network environments (where there is greater
dissemination of disinformation).
The results show a compelling development in computational techniques to identify fake news. However, considering
the ongoing trend, more research is still demanded to tackle the increasing complexity of fake digital news on social
media. Thus, we suggest for future research the need to extend research beyond political news, an area that was the
primary motivator for the growth of research from 2017, and the use of hybrid methods for fake news classification.
References
Almeida, L.D., Fuzaro, V. Nieto, F., & Santana, A.L.M. (2021). Identificação de “Fake News” no contexto político
brasileiro: uma abordagem computacional. In: Proceedings Workshop sobre as Implicações da Computação na Sociedade
(WICS), Porto Alegre, Brasil, pp. 78-89. https:// doi.org/10.5753/wics.2021.15966
Abouzeid, A., Granmo, O.C., Webersik, C., & Goodwin, M. (2019). Causality-based Social Media Analysis for Normal
Users Credibility Assessment in a Political Crisis. Proceedings of 25th Conference of Open Innovations Association
(FRUCT), Helsinki, Finland, pp. 1-14. https://doi.org/10.23919/FRUCT48121.2019.8981500
Agarwal, A., Mittal, M., Pathak, A., & Goyal, L.M. (2020). Fake News Detection Using a Blend of Neural Networks: An
Application of Deep Learning. SN Computer Science, 1,3:1-
9.
Ahmad, I., Yousaf, M., Yousaf, S., & Ahmad, M.O. (2020). Fake News Detection Using Machine Learning Ensemble
Methods. Hindawi, 1-11. https://doi.org/10.1155/2020/ 8885861
Ahmed, H., Traore, I., & Saad, S. (2017). Detection of Online Fake News Using NGram Analysis and Machine Learning
Techniques. Proceedings of International conference on intelligent, secure, and dependable systems in distributed and
cloud environments, Springer, Cham, pp. 127-138
Ahuja, N., & Kumar, S. (2020). S-HAN: Hierarchical Attention Networks with Stacked Gated Recurrent Unit for Fake
News Detection. Proceedings 8th International Conference on Reliability, Infocom Technologies and Optimization,
Noida, India, pp. 873-877.
Ajao, O., Bhowmik, D., & Zargari, S. (2019). Sentiment aware fake news detection on online social networks.
Proceedings ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
Brighton, UK, pp. 2507-2511. https://doi.org/10.1109/ ICASSP.2019.8683170
Al-Ahmad, B., Al-Zoubi, A.M., Abu Khurma, R., & Aljarah, I. (2021). An evolutionary fake news detection method for
covid-19 pandemic information. Symmetry, 13,6:1-16.
Alanazi, S.S., & Khan, M.B. (2020). Arabic fake news detection in social media using readers’ comments: Text mining
techniques in action. International Journal of Computer Science and Network Security, 20,9:29-35.
Albahar, M. (2021). A hybrid model for fake news detection: Leveraging news content and user comments in fake news.
IET Information Security, 15,2:169-177.
Albahr, A., & Albahar, M. (2020). An empirical comparison of fake news detection using different machine learning
algorithms. International Journal of Advanced Computer Science and Applications, 11,9:146-152.
Ali, H., Khan, M. S., AlGhadhban, A., Alazmi, M., Alzamil, A., Al-Utaibi, K., & Qadir, J. (2021). All Your Fake Detector
Are Belong to Us: Evaluating Adversarial Robustness of Fake-news Detectors Under Black-Box Settings. IEEE Access,
9:81678-81692.
Asaad, B., & Erascu, M. (2018). A tool for fake news detection. Proceedings 2018 20th International Symposium on
Symbolic and Numeric Algorithms for Scientific Computing
(SYNASC), Timisoara, Romania, pp. 379-386. https://doi.org/10.1109/SYNASC.2018.00064
Aslam, N., Ullah Khan, I., Alotaibi, F.S., Aldaej, L.A., & Aldubaikil, A.K. (2021). Fake detect: A deep learning ensemble
model for fake news detection. Hindawi, 1-8. https://doi.org/10.1155/2021/5557784
Ayoub, J., Yang, X.J., & Zhou, F. (2021). Combat COVID-19 infodemic using explainable natural language processing
models. Information Processing & Management, 58,4:1-11. https://doi.org/10.1016/j.ipm.2021.102569
Bahad, P., Saxena, P., & Kamal, R. (2020). Fake News Detection using Bi-directional LSTM-Recurrent Neural Network.
Procedia Computer Science, 165:74-82.
Barua, R., Maity, R., Minj, D., Barua, T., & Layek, A.K. (2019). F-NAD: An application for fake news article detection
using machine learning techniques. Proceedings IEEE Bombay Section Signature Conference (IBSSC), Mumbai, India,
pp. 1-6). https://doi.org/10.1109/IBSSC47189 .2019.8973059
Birunda, S.S., & Devi, R.K. (2021). A Novel Score-Based
Multi-Source Fake News Detection using Gradient Boosting Algorithm. Proceedings of International Conference on
Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, pp. 406–414. https://doi.org/10.1109/ICAIS50930
.2021.9395896
Burfoot, C., & Baldwin, T. (2009). Automatic satire detection: Are you having a laugh? Proceedings of the ACL-IJCNLP
2009 conference short papers, p. 161–164.
Chapra, S.C., & Canale, R.P. (2016). Métodos Numéricos para Engenharia-7ª Ediçao. McGraw Hill Brasil.
Chen, W., Yang, C., Cheng, G., Zhang, Y., Yeo, C.K., Lau, C.T., & Lee, B.S. (2018). Exploiting Behavioral Differences to
Detect Fake News. Proceedings 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication
Conference (UEMCON), New York, NY, USA, pp. 879-
884. https://doi.org/10.1109/UEMCON.2018.8796519
Collins, B., Hoang, D.T., Nguyen, N.T., & Hwang, D. (2021). Trends in combating fake news on social media–a survey.
Journal of Information and Telecommunication, 5,2:247-
266. https://doi.org/10.1080/24751839.2020.1847379
Dadkhah, S., Shoeleh, F., Yadollahi, M.M., Zhang, X., & Ghorbani, A.A. (2021). A real-time hostile activities analyses
and detection system. Applied Soft Computing, 104:1-
28. https://doi.org/10.1016/j.asoc.2021.107175
Dresch, A., Lacerda, D.P. & Antunes Júnior, J.A.V. (2015). Design science research: método de pesquisa para avanço da
ciência e tecnologia. Bookman Editora.
Fang, Y., Gao, J., Huang, C., Peng, H., & Wu, R. (2019). Self multi-head attention-based convolutional neural networks
for fake news detection. PloS one, 14,9:1-13.
https://doi.org/10.1371/journal.pone.0222713
Faustini, P.H.A., & Covões, T.F. (2020). Fake news detection in multiple platforms and languages. Expert Systems with
Applications, 158:1-9. https://doi.org/10.1016/j.eswa.2020.
113503
Gangireddy, S.C.R., Long, C., & Chakraborty, T. (2020). Unsupervised fake news detection: A graph-based approach.
Proceedings of the 31st ACM conference on hypertext and social media, pp. 75-83. https://doi.org/10.1145/
3372923.3404783
Gereme, F., Zhu, W., Ayall, T., & Alemu, D. (2021). Combating fake news in “low-resource” languages: Amharic fake
news detection accompanied by resource crafting. Information, 12,1:1-9. https://doi.org/10.3390/info12010020
Goel, P., Singhal, S., Aggarwal, S., & Jain, M. (2021). Multi Domain Fake News Analysis using Transfer Learning.
Proceedings 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, pp.
1230–1237. https://doi.org/10.1109/ICCMC51019.2021.94
18411
Goldani, M.H., Momtazi, S., & Safabakhsh, R. (2021). Detecting fake news with capsule neural networks. Applied Soft
Computing, 101:1-8. https://doi.org/10.1016/j.asoc.2020.
106991
Goldani, M.H., Safabakhsh, R., & Momtazi, S. (2021). Convolutional neural network with margin loss for fake news
detection. Information Processing & Management, 58,1:1-12. https://doi.org/10.1016/j.ipm.2020.102418
Islam, M.R., Liu, S., Wang, X., & Xu, G. (2020). Deep learning for misinformation detection on online social networks: a
survey and new perspectives. Social Network Analysis and
Mining, 10,1:1-20. https://doi.org/10.1007/s13278-020-
00696-x
Ivancová, K., Sarnovský, M., & Maslej-Krcšñáková, V. (2021). Fake news detection in Slovak language using deep
learning techniques. Proceedings of 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI),
Herl'any, Slovakia, pp. 255-260. https://doi.org/10.1109 /SAMI50585.2021.9378650
Jardaneh, G., Abdelhaq, H., Buzz, M., & Johnson, D. (2019). Classifying Arabic tweets based on credibility using content
and user features. Proceedings IEEE Jordan International Joint Conference on Electrical Engineering and Information
Technology (JEEIT), Amman, Jordan, pp. 596-601. https://doi.org/10.1109/JEEIT.2019.8717386
Jiang, T., Li, J.P., Haq, A.U., & Saboor, A. (2020). Fake News Detection using Deep Recurrent Neural Networks.
Proceedings of 17th International Computer Conference on Wavelet Active Media Technology and Information
Processing (IC-
CWAMTIP), Chengdu, China, pp. 205-208.
https://doi.org/10.1109/ICCWAMTIP51612.2020.9317325
Jiang, T.A.O., Li, J.P., Haq, A.U., Saboor, A. & Ali, A. (2021). A novel stacking approach for accurate detection of fake
news. IEEE Access, 9:22626-22639. https://doi.org/10.1109/ ACCESS.2021.3056079
Kaliyar, R.K., Goswami, A., & Narang, P. (2019). Multiclass Fake News Detection using Ensemble Machine Learning.
IEEE 9th International Conference on Advanced Computing (IACC). Tiruchirappalli, India, pp. 103-107.
https://doi.org/10.1109/IACC48062.2019.8971579
Kaliyar, R.K., Goswami, A., & Narang, P. (2021). FakeBERT: Fake news detection in social media with a BERT-based
deep learning approach. Multimedia tools and applications,
80,8:11765-11788. https://doi.org/10.1007/s11042-020-
10183-2
Kaliyar, R.K., Goswami, A., Narang, P., & Sinha, S. (2020a). FNDNet – A deep convolutional neural network for fake
news detection. Cognitive Systems Research, 61:32–44. https://doi.org/10.1016/j.cogsys.2019.12.005
Kaliyar, R.K., Kumar, P., Kumar, M., Narkhede, M., Namboodiri, S., & Mishra, S. (2020b). DeepNet: an efficient neural
network for fake news detection using news-user engagements.Proceedings of 5th International Conference on
Computing, Communication and Security (ICCCS), Patna, India, pp. 1-6).
https://doi.org/10.1109/ICCCS49678.2020.92773 53
Kesarwani, A., Chauhan, S.S., & Nair, A.R. (2020). Fake news detection on social media using k-nearest neighbor
classifier. Proceedings International Conference on Advances in Computing and Communication Engineering (ICACCE),
Las Vegas, NV, USA, pp. 1-4. https://doi.org/
10.1109/ICACCE49060.2020.9154997
Khattar, D., Goud, J.S., Gupta, M., & Varma, V. (2019). Mvae: Multimodal variational autoencoder for fake news
detection.
Proceedings The world wide web conference, pp. 2915-
2921). https://doi.org/10.1145/3308558.3313552
Konkobo, P.M., Zhang, R., Huang, S., Minoungou, T.T., Ouedraogo, J.A., & Li, L. (2020). A deep learning model for
early detection of fake news on social media. Proceedings 7th International Conference on Behavioural and Social
Computing (BESC), Bournemouth, United Kingdom, pp. 1-
6). https://doi.org/10.1109/BESC51023.2020.9348311
Kumar, S., Asthana, R., Upadhyay, S., Upreti, N., & Akbar, M. (2020). Fake news detection using deep learning models:
A novel approach. Transactions on Emerging Telecommunications Technologies, 31,2:1-23. https://doi.org/10.1002/
ett.3767
Kumar, R., Anurag, K., & Pratik, G. (2021). EchoFakeD: improving fake news detection in social media with an efficient
deep neural network. Neural Computing and Applications,
33,14:8597–8613. https://doi.org/10.1007/s00521-02005611-1
Lakshmanarao, A., Swathi, Y., & Kiran, T.S.R. (2019). An effecient fake news detection system using machine learning.
International Journal of Innovative Technology and Exploring Engineering, 8,10:3125-3129.
Li, Q., Hu, Q., Lu, Y., Yang, Y., & Cheng, J. (2020). Multilevel word features based on CNN for fake news detection in
cultural communication. Personal and Ubiquitous Computing, 24,2:259–272.
Lin, J., Tremblay-Taylor, G., Mou, G., You, D., & Lee, K. (2019). Detecting fake news articles. Proceedings IEEE
International Conference on Big Data (Big Data), Los Angeles, CA, USA, pp. 3021-3025,
https://doi.org/10.1109/BigData47090.2019.9005980
Lin, S.X., Wu, B.Y., Chou, T.H., Lin, Y.J., & Kao, H.Y. (2020). Bidirectional perspective with topic information for stance
detection. Proceedings 2020 International Conference on Pervasive Artificial Intelligence (ICPAI), Taipei,
Taiwan, pp. 1-8.
https://doi.org/10.1109/ICPAI51961.2020.00009
Low, J. F., Fung, B. C., Iqbal, F., & Huang, S. C. (2022). Distinguishing between fake news and satire with transformers.
Expert Systems with Applications, 187, 115824.
Medeiros, H.I., & Braga, R. B. (2020). Fake News detection in social media: a systematic review. Proceedings 16th
Simpósio Brasileiro de Sistemas de Informação (SBSI), Porto Alegre, Brasil, pp. 1-8.
https://doi.org/10.5753/sbsi.2020.13782 Mugdha, S.B.S., Ferdous, S.M., & Fahmin, A. (2020). Evaluating machine
learning algorithms for bengali fake news detection. Proceedings 23rd International Conference on Computer and
Information Technology (ICCIT), DHAKA,
Bangladesh, pp. 1-6. https://doi.org/10.1109/ICCIT51783.2020.9392662
Najar, F., Zamzami, N., & Bouguila, N. (2019). Fake news detection using bayesian inference. Proceedings 20th
International Conference on Information Reuse and Integration for Data Science (IRI), Los Angeles, CA, USA, pp. 389-
394. https://doi.org/10.1109/IRI.2019.00066
Nasir, J.A., Khan, O.S., & Varlamis, I. (2021). Fake news detection: A hybrid CNN-RNN based deep learning approach.
International Journal of Information Management Data Insights, 1,1:1-13. https://doi.org/10.1016/j.jjimei.2020.1000
07
Ozbay, F.A., & Alatas, B. (2019). A novel approach for detection of fake news on social media using metaheuristic
optimization algorithms. Elektronika ir Elektrotechnika, 25,4:62–67.
Pardamean, A., & Pardede, H.F. (2021). Tuned bidirectional encoder representations from transformers for fake news
detection. Indonesian Journal of Electrical Engineering and Computer Science, 22,3:1667-1671.
Qawasmeh, E., Tawalbeh, M., & Abdullah, M. (2019). Automatic identification of fake news using deep learning.
Proceedings 6th international conference on social networks analysis, Management and Security (SNAMS), Granada,
Spain, pp. 383-388. https://doi.org/10.1109/SNAMS.2019. 8931873
Qi, P., Cao, J., Yang, T., Guo, J., & Li, J. (2019). Exploiting multi-domain visual information for fake news detection.
Proceedings IEEE international conference on data mining
(ICDM), Beijing, China, pp. 518-527.
https://doi.org/10.1109/ICDM.2019.00062
Ren, Y., Wang, B., Zhang, J., & Chang, Y. (2020). Adversarial active learning based heterogeneous graph neural network
for fake news detection. Proceedings IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, pp. 452-
461. https://doi.org/10.1109/ICDM50108.2020.00054
Ruchansky, N., Seo, S., & Liu, Y. (2017). CSI: A hybrid deep model for fake news detection. Proceedings of the 2017
ACM on Conference on Information and Knowledge Management, pp. 797-806.
Sahoo, S.R., & Gupta, B.B. (2021). Multiple features based approach for automatic fake news detection on social
networks using deep learning. Applied Soft Computing, 100:1-16. https://doi.org/10.1016/j.asoc.2020.106983
Shabani, S., & Sokhn, M. (2018). Hybrid machine-crowd approach for fake news detection. Proceedings of 4th
International Conference on Collaboration and Internet Computing
(CIC), Philadelphia, PA, USA, pp. 299-306.
https://doi.org/10.1109/CIC.2018.00048
Sharma, D.K., Garg, S., & Shrivastava, P. (2021). Evaluation of tools and extension for fake news detection. Proceedings
of International Conference on Innovative Practices in Technology and Management (ICIPTM), Noida, India, pp. 227-
232. https://doi.org/10.1109/ICIPTM52218.2021.9388356
Song, C., Ning, N., Zhang, Y., & Wu, B. (2021). A multimodal fake news detection model based on crossmodal attention
residual and multichannel convolutional neural networks. Information Processing and Management, 58,1:1-14.
Sridhar, S., & Sanagavarapu, S. (2021). Fake news detection and analysis using multitask learning with BiLSTM CapsNet
model. 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, pp.
905-911. https://doi.org/10.1109/Confluence51648.2021.93 77080
Thakur, A., Shinde, S., Patil, T., Gaud, B., & Babanne, V. (2020). MYTHYA: Fake News Detector, Real Time News
Extractor and Classifier. Proceedings of 4th International Conference on Trends in Electronics and Informatics (ICOEI),
Tirunelveli, India, pp. 982-987.
https://doi.org/10.1109/ICOEI48184.2020.9142971
Torgheh, F., Keyvanpour, M.R., Masoumi, B., & Shojaedini, S.V. (2021). A Novel Method for Detecting Fake news: Deep
Learning Based on Propagation Path Concept. Proceedings of 26th International Computer Conference, Computer
Society of Iran (CSICC), Tehran, Iran, pp. 1-5.
https://doi.org/10.1109/CSICC52343.2021.9420601
Umer, M., Imtiaz, Z., Ullah, S., Mehmood, A., Choi, G.S., & On, B.W. (2020). Fake news stance detection using deep
learning architecture (CNN-LSTM). IEEE Access, 8: 156695-156706.
Varshney, D., & Vishwakarma, D.K. (2021). Hoax news-inspector: a real-time prediction of fake news using content
resemblance over web search results for authenticating the credibility of news articles. Journal of Ambient Intelligence
and Humanized Computing, 12,9:8961-8974.
https://doi.org/10.1007/s12652-020-02698-1
Verma, P.K., Agrawal, P., Amorim, I., & Prodan, R. (2021). WELFake: Word Embedding Over Linguistic Features for
Fake News Detection. IEEE Transactions on Computational Social Systems, 8,4:881-893.
Wang, Y., Qian, S., Hu, J., Fang, Q., & Xu, C. (2020). Fake news detection via knowledge-driven multimodal graph
convolutional networks. Proceedings of the 2020 International Conference on Multimedia Retrieval, pp. 540-547.
https://doi.org/10.1145/3372278.3390713
Wang, Y., Wang, L., Yang, Y., & Lian, T. (2021). SemSeq4FD: Integrating global semantic relationship and local
sequential order to enhance text representation for fake news detection. Expert Systems with Applications, 166:1-12.
https://doi.org/10.1016/j.eswa.2020.114090
Xie, J., Liu, S., Liu, R., Zhang, Y., & Zhu, Y. (2021). SeRN: Stance extraction and reasoning network for fake news
detection. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto,
ON, Canada, pp. 2520-2524. https://doi.org/
10.1109/ICASSP39728.2021.9414787
Yu, J., Huang, Q., Zhou, X., & Sha, Y. (2020). Iarnet: An information aggregating and reasoning network over
heterogeneous graph for fake news detection. Proceedings of 2020 International Joint Conference on Neural Networks
(IJCNN), Glasgow, UK, pp. 1-9. https://doi.org/10.1109/ IJCNN48605.2020.9207406
Zhang, H., Alim, M.A., Li, X., Thai, M.T., & Nguyen, H.T. (2016). Misinformation in online social networks: Detect them
all with a limited budget. ACM Transactions on Information Systems, 34,3:1-24. https://doi.org/10.1145/2885494
Zhang, Q., Lipani, A., Liang, S., & Yilmaz, E. (2019). Reply-aided detection of misinformation via bayesian deep learning. Proceedings of the 2019 world wide web conference, pp. 2333-2343. https://doi.org/10.1145/3308558.3313718
RESEARCH PAPER -5

Detection of Fake News Using Machine Learning and Natural Language Processing Algorithms
Noshin Nirvana Prachi, Md. Habibullah, Md. Emanul Haque Rafi, Evan Alam, and
Riasat Khan
Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
Email: {noshin.nirvana, md.habibullah, emanul.haque, evan.alam, riasat.khan}@northsouth.edu
Abstract—The amount of information shared on the internet, primarily via web-based networking media,
is regularly increasing. Because of the easy availability and exponential expansion of data through social
media networks, distinguishing between fake and real information is not straightforward. Most
smartphone users tend to read news on social media rather than on the internet. The information
published on news websites often needs to be authenticated. The simple spread of information and news by
instant sharing has included the exponential growth of its misrepresentation. So, fake news has been a
major issue ever since the growth and expansion of the internet for the general mass. This paper employs
several machine learning, deep learning and natural language processing techniques for detecting false
news, such as logistic regression, decision tree, naive bayes, support vector machine, long short-term
memory, and bidirectional encoder representation from transformers. Initially, the machine learning and
deep learning approaches are trained using an open-source fake news detection dataset to determine if the
information is authentic or counterfeit. In this work, the corresponding feature vectors are generated from
various feature engineering methods such as regex, tokenization, stop words, lemmatization and term
frequency-inverse document frequency. All the machine learning and natural language processing models’
performance were evaluated in terms of accuracy, precision, recall, F-1 score, ROC curve, etc. For the
machine learning models, logistic regression, decision tree, naive bayes, and SVM achieved classification
accuracies of 73.75%, 89.66%, 74.19%, and 76.65%, respectively. Finally, the LSTM attained 95%
accuracy, and the NLP-based BERT technique obtained the highest accuracy of 98%.
Index Terms—bidirectional encoder representation from transformers, fake news detection, lemmatization, long short-term memory, naive Bayes, support vector machine, tokenization

I. INTRODUCTION
Information is significant for human dynamics and affects life practices. In earlier days, the daily news
or information was presented through print media, newspapers, and electronic media such as television
and radio. The data from these publishing technologies are more credible as it is either self-screened or
constrained by specialists [1]. (Manuscript received April 4, 2022; revised June 13, 2022; accepted July 4, 2022. doi: 10.12720/jait.13.6.652-661.) These days, individuals are presented with an extreme amount of data through various sources, particularly with the prominence of the internet
and web-based media platforms. The ease of internet access has caused the hazardous growth of a wide range of falsehoods, such as malicious discussion, deception, fabrications, fake news, and opinion spam, which diffuse quickly and widely through society. The misinformation on online social media has become a global problem for public trust and society, as social media has become an essential mode of communication and networking.
Nowadays, online social platforms and blogs contain a significant amount of fake and fabricated
news, negatively affecting society [2]. This news is embellished with dubious facts and misleading
information, causing interpersonal anxiety and detrimental social panic. This unreliable information
destroys people's trust and adversely influences the economy and major political processes, such as
the stock market, elections, etc. The proliferation of fake and fabricated news is generally detected
manually by human verification. This manual fact-checking process is subjective in nature, laborious,
time-consuming, and inefficient. In recent years, automatic systems based on machine learning and
natural language processing algorithms have been utilized to tackle the issue of fake news detection
[3], [4]. With the advancement of technology and artificial intelligence, these automatic systems
efficiently restrain misleading and false news propagation. Thus, these techniques have created deep
interest among researchers in detecting fake news for a better future endeavor.
This paper has designed a fake news detection and classification system using different types of
machine learning techniques. The open-source fake news dataset of the proposed fake news
detection system contain the information of various articles' authors, captions, and main descriptions.
Initially, the dataset is preprocessed using conventional techniques, e.g., regex, tokenization, stop
words, lemmatization, and then NLP techniques (count vectorizer, TF-IDF vectorizer) are applied. The
major contributions of this work are as follows:
• In this paper, an automatic fake news detection system has been developed using various machine
learning and natural language processing algorithms. This work uses logistic regression, decision
tree, naive bayes, and SVM machine learning techniques.
• Additionally, Long Short-Term Memory (LSTM), deep learning model and natural language
processing algorithm, Bidirectional Encoder Representations from Transformers (BERT) are also
implemented.
• Next, the efficiency of all the machine learning and natural language processing models are
compared in terms of classification accuracy, precision, recall, F-1 score, and ROC curve.
• Finally, the performance of the proposed fake news detection system is compared with previous
relevant works in terms of classification accuracy. The novelty of this work is the utilization of a BERT-based NLP model for detecting fake news.
The rest of the paper is organized as follows. Section II reviews the related works. In Section III, the proposed system is discussed with appropriate equations. The results of the research are presented in Section IV. Lastly, Section V concludes the paper with some directions for the future improvement of this work.
II. RELATED WORKS
Some of the recent works implemented to detect fake and fabricated news have been discussed in the
following section. Machine learning and deep learning-based neural network models execute the
identification and classification of real and fake news. For instance, in [5], the authors worked on fake
news detection with the help of a mixed deep learning technique, CNN-LSTM. This paper has used the
Fake News Challenge (FNC) dataset, which was created in 2017. They checked whether each claim matches the corresponding news article body. The authors have developed
four data models. First, they use data without preprocessing, second with preprocessing. The authors
obtained different results when they preprocessed data and when they did not. The third and fourth
models are built on dimensionality reduction techniques by using PCA and Chi-square approaches.
Finally, they trained on forty-nine thousand and nine hundred seventy-two samples and tested on
twenty-five thousand and four hundred thirteen headlines and articles by CNN-LSTM. On their model
with no knowledge of cleanup or preprocessing, the achieved accuracy was 78%. When preprocessing
was done, the accuracy increased up to 93%. Next, the application of Chi-square raised the accuracy to
95%. Lastly, they conclude that using PCA with CNN and LSTM design resulted in the highest
accuracy of 96%, significantly reducing the prediction time. In [6], T. Jiang et al. used baseline fake
news identification techniques to locate the baseline methods' flaws and provide a viable alternative.
First, the authors performed five completely different conventional machine learning models and three
deep learning models to compare their efficiency. The authors used two datasets (ISOT and KDnugget)
of various sizes to test the corresponding models' performance in this work. Finally, they applied a modified McNemar's test to determine whether there are statistically significant differences between the two models' performance and then selected the best model for detecting fake news.
The authors obtained accuracies of approximately 99.94% and 96.05% on the ISOT dataset and
KDnugget dataset, respectively.
In a recent work [7], the authors designed a system for detecting fake news using various machine
learning techniques. First, each tweet/post has been categorized as a binary categorization result by the
authors. They collected data manually from their own research sets by using Twitter API and the
DMOZ directory. The authors ran a test of their proposed system on the Twitter dataset. The results
show that fifteen percent of fabricated tweets and forty-five percent of the actual tweets were
adequately classified, and the remainder of the posts were not decided. In this paper, the author
proposed the detection of deception using the labeled benchmark dataset “LIAR”. They have also
improved efficiency in the detection of fake posts/news with evidence. The authors have introduced the
need for hoax detection in their system. They used the ML approach by combining news content and
social content. Finally, the authors claim their proposed system's performance is good compared to
other works described in the literature. In [8], A. Jain et al. design an automated system that detects the
news as false or true. Sometimes, social media like Facebook, YouTube, Twitter, and other online
platforms spread the news, creating anxiety and unrest in society. In this paper, the author applied
several machine learning-based fake news detection systems. The authors utilized a naive Bayes classifier,
SVM algorithm and logistic regression in their proposed detection system. They implement their model
and classify the authentic and fabricated news. Finally, the proposed SVM model achieved an accuracy
of 93.5%. Machine learning ensemble techniques have been used in [9] to detect and classify fake news
automatically. In this regard, the textual features have been applied in different machine learning
approaches. This paper used ISOT and two open-source datasets to build the proposed system. In the
data preprocessing step, documents containing less than 20 words are filtered out. Next, the linguistic tool LIWC is employed to convert the textual features into numerical values. Then, various machine learning algorithms, namely logistic regression, SVM, KNN, random forest and boosting classifiers,
have been used. Finally, the decision tree approach with 10-fold cross-validation obtained the highest
accuracy of 94%.

III. METHODOLOGY
In this section, we have discussed the methodology of our work in great detail. We have explained all
the regular machine learning, neural network and NLP methods that we have used in our dataset.

A. Dataset
In this work, an open-source fake news dataset from Kaggle [10] has been used. The public dataset has
been created by web scrapping of different search engines. Lots of fake news and agenda always take
place around us, so the whole data was curated with the help of automated data science technologies. It
was posted on the data science community as a challenge to use those data to implement efficient fake
news detection architecture. This specific database of fake news has been utilized in this work because
it involves a diverse dataset from a wide variety of news portals and social sites. The dataset comprises
26,000 unique sample documents and has been used successfully in some papers to identify fake news
[11], [12]. The original dataset has four columns, viz. id, title, author, text. The id column represents a
particular numerical label for a news article; the title holds the heading of a news article; the author
column contains the information about the writer of the news item; and finally, under the text column,
the text of the report has been described. The training dataset has the label column, which marks the
news item as potentially unreliable or reliable. It is worth mentioning that, the dataset has 20,822
unique values in the text column.
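As a minimal illustration of how this dataset can be loaded and inspected, the sketch below uses pandas; the file name "train.csv" is an assumption (it follows the usual Kaggle naming), while the column names match the description above.

```python
# Minimal sketch: load the Kaggle fake news data and inspect the columns
# described above (file name is an assumption).
import pandas as pd

df = pd.read_csv("train.csv")            # columns: id, title, author, text, label
print(df.shape)                          # number of articles and columns
print(df["label"].value_counts())        # 1 = potentially unreliable, 0 = reliable
df = df.dropna(subset=["text"])          # drop rows with missing article text
```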

B. Data Preprocessing
We need to transform the text data using preprocessing techniques, NLP, tokenization, and
lemmatization before feeding them through the ML and DL models [13]. Data preprocessing helps to
remove the noises and inconsistency of data, which increases the performance and efficiency of the
model. In this work, we have used traditional techniques, regex, tokenization, stopwords,
lemmatization, NLP technique, and TF-IDF for data preprocessing. The implemented data
preprocessing techniques are explained briefly in the subsequent paragraphs.
1) Regex
We use regex to remove punctuations from the text data. Often in the sentences, there may have extra
punctuations like exclamatory signs. We use regex to remove those additional punctuations to make the
dataset noise-free.
Regex is based on regular grammars.
2) Tokenization
Tokenization, preprocessing tool is used to break the sentences into words [14].
3) Stopwords
We use the English stopwords library in our preprocessing technique because our model data is English.
We need to use the stopwords preprocessing technique to remove noises, make the model faster and
more efficient, and save memory space.
4) Lemmatization
Lemmatization is used to transform the words into root words. We can resolve data ambiguity and
inflection with lemmatization.
5) NLP technique
NLP techniques have been applied to convert the texts into meaningful numbers to feed these numbers
into our proposed machine learning algorithm.
6) Bag of words
The bag-of-words technique converts texts into machine-understandable numbers, which is expressed as:
TF-IDF_{t,d} = TF_{t,d} · IDF_t  (1)

where t is a term and d denotes a document. TF stands for term frequency, which is a measurement of how frequently a term appears in a document:

TF_{t,d} = q_{t,d} / (number of terms in the document)  (2)

where q_{t,d} is the number of times the term t appears in the document d. IDF denotes inverse document frequency, which indicates the importance of a particular term. IDF is calculated as:

IDF_t = log((1 + n) / (1 + df_t)) + 1  (3)

where n is the number of documents and df_t denotes the document frequency of the term t.
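The sketch below strings the preprocessing steps described above (regex cleaning, tokenization, stop word removal, lemmatization) into one function and then builds TF-IDF features. It is a hedged sketch, assuming NLTK and scikit-learn; the choice of 17,000 features mirrors the figure quoted later in Algorithm 1 and is otherwise an assumption. Note that scikit-learn's smoothed IDF, log((1+n)/(1+df)) + 1, matches Eq. (3).

```python
# Sketch of the preprocessing chain and TF-IDF feature extraction.
# Assumes the NLTK resources "punkt", "stopwords" and "wordnet" are downloaded,
# and reuses the DataFrame df from the loading sketch above.
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean(text: str) -> str:
    text = re.sub(r"[^a-zA-Z ]", " ", text.lower())       # regex: keep letters only
    tokens = word_tokenize(text)                          # tokenization
    tokens = [t for t in tokens if t not in stop_words]   # stop word removal
    tokens = [lemmatizer.lemmatize(t) for t in tokens]    # lemmatization (root words)
    return " ".join(tokens)

corpus = [clean(doc) for doc in df["text"]]
vectorizer = TfidfVectorizer(max_features=17000)          # assumed feature budget
X = vectorizer.fit_transform(corpus)
y = df["label"].values
```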
C. Machine Learning Algorithms
To detect and classify real and fake news, we have used different machine learning algorithms: logistic
regression, naive bayes, decision tree, and support vector machine.
1) Logistic regression
Logistic regression is a statistical ML classification model [15]. The basis of the proposed system
consists of the binary classification problem. Logistic regression is manipulated to model the
probability of a certain existing event, such as true/false, reliable/unreliable, win/lose, etc. Hence, the
logistic model is one of the most appropriate models for the fake news detection system. The condition
for predicting logistic model is:
for the logistic model's prediction is:

0 ≤ h_θ(x) ≤ 1  (4)

The logistic regression sigmoid function is expressed as:

h_θ(x) = g(θ^T x)  (5)

where

g(z) = 1 / (1 + e^{-z})  (6)

and the cost function of logistic regression is:

J(θ) = (1/m) Σ_{i=1}^{m} cost(h_θ(x_i), y_i)  (7)
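The short NumPy sketch below illustrates Eqs. (5)-(7): the sigmoid hypothesis and the binary cross-entropy cost that Eq. (7) abbreviates. It is only an illustration under that assumption, not the exact training code used in this work.

```python
# Illustration of Eqs. (5)-(7): sigmoid hypothesis and the logistic cost.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))             # g(z) in Eq. (6)

def predict(theta, X):
    return sigmoid(X @ theta)                   # h_theta(x) in Eq. (5), values in [0, 1]

def cost(theta, X, y):
    h = predict(theta, X)
    # standard binary cross-entropy form of J(theta) in Eq. (7)
    return -np.mean(y * np.log(h + 1e-12) + (1 - y) * np.log(1 - h + 1e-12))

# toy check: an all-zero parameter vector gives cost ln(2) ~ 0.693
X_toy = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y_toy = np.array([1, 0, 1])
print(cost(np.zeros(2), X_toy, y_toy))
```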
2) Naive Bayes
The naïve Bayes method is the basis of Bayesian classifiers. It is a strategy for looking at possible outcomes that allows flipping the state around straightforwardly [16]. A conditional probability is the probability that incident X will happen provided information Y; the typical notation for this is P(X|Y). We can use the naive Bayes rule to compute this probability when we only have the probability of the opposite result and the two components separately:

P(X|Y) = P(X) · P(Y|X) / P(Y)  (8)
This restatement can be extremely useful when we are trying to predict the likelihood of something
based on examples of it happening.
In this research, we are attempting to determine if an article is false or genuine based on its contents.
We may rephrase it in terms of the likelihood of that document being real or fake if it has been
predetermined to be real or fake. This condition is useful since we already have instances of real and
fake articles in our data collection.
Generally, a large assumption is considered for computing the likelihood of the article happening; it is
equal to the product of the probabilities of each word inside its occurrence, making this procedure a
“naive” Bayesian one [17]. This assumption suggests that there is no connection between the two
words. It is also known as the assumption of independence. We can estimate the likelihood of a term
occurring by looking at a set of real and fake article samples and noting how many times it appears in
each class. The necessity for training the pre-classified samples distinguishes this method from the
typical supervised learning.
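As a brief sketch of this idea in practice, the snippet below trains a multinomial naive Bayes classifier on the TF-IDF features built earlier; it relies on the word-independence assumption just described. This is a hedged example using scikit-learn, not necessarily the exact configuration used in this work.

```python
# Sketch: multinomial naive Bayes over the TF-IDF features (X, y from the
# preprocessing sketch above).
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
nb = MultinomialNB()                   # estimates P(term | class) from the training data
nb.fit(X_train, y_train)
print(accuracy_score(y_test, nb.predict(X_test)))
```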
3) Decision tree
The conventional J48 method is one of the most widely used classification algorithms [18]. It is based
on the C4.5 algorithm, which requires all data to be studied quantitatively and categorically. As a result,
continuous data will not be investigated [19]. J48 technique employs two distinct pruning techniques.
Algorithm 1. Algorithm of the proposed decision tree classification model
Input: Predefined classes with 17,000 number of features.
Output: decision tree construction.
Begin
Step 1: Make the tree's root node.
Step 2: Return leaf node 'positive' if all instances are positive; return leaf node 'negative' if all instances are negative.
Step 3: Determine the current state's entropy H(S).
Step 4: Calculate the entropy for each characteristic.
Step 5: Choose the attribute with the highest information gain value IG(S, x).
Step 6: From the list of attributes, remove the attribute with the greatest IG.
Step 7: Continue until all characteristics have been exhausted or the decision tree has all leaf nodes.
End
Algorithm 1 briefly explains the building steps of the decision tree classification technique. The first
approach is subtree replacement, which refers to replacing nodes in a decision tree's leaves to reduce
the number of tests along a given path. In most cases, subtree raising has a minor influence on
decision tree models. Usually, there is no accurate method to forecast an option’s usefulness. However,
turning it off may be advisable if the induction operation takes longer than expected because the
subtree's raising is computationally complex. Next, the current state's entropy and its corresponding
characteristics are determined. Consequently, the attribute with the maximum information gain is
computed and removed. This process is continued until all features have been exhausted or the decision
tree has all leaf nodes.
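To make Steps 3-5 of Algorithm 1 concrete, the sketch below computes the entropy of a label set and the information gain of splitting on one attribute; the toy arrays are placeholders, and this is an illustration of the criterion rather than the full tree-building code.

```python
# Sketch of Steps 3-5 of Algorithm 1: entropy H(S) and information gain IG(S, x).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))                       # H(S)

def information_gain(labels, attribute_values):
    h_parent = entropy(labels)
    h_children = 0.0
    for v in np.unique(attribute_values):
        mask = attribute_values == v
        h_children += mask.mean() * entropy(labels[mask])
    return h_parent - h_children                         # pick the attribute maximizing this

y_demo = np.array([1, 1, 0, 0, 1, 0])                    # toy labels
x_demo = np.array([1, 1, 0, 0, 1, 1])                    # toy binary attribute
print(information_gain(y_demo, x_demo))
```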
4) Support Vector Machine (SVM)
SVM, also known as a support vector network, is a supervised learning method [20]. SVMs are trained using data that has previously been divided into two groups [21]; the model is created once this training is complete. Moreover, the goal of the support vector machine technique is to decide which group any new piece of information belongs to while maximizing the separation between the class labels [22].
The final goal of the SVM is to locate a hyperplane that divides the data into two parts. As the Radial
Basis Function (RBF) kernel is suitable for large systems like a collection of media articles, it was
chosen as the kernel for this proposed system. On two samples 𝑥 and 𝑥′, the radial basis function is
expressed as:

𝐾(𝑥, 𝑥′) = exp( −‖𝑥 − 𝑥′‖² / (2𝜎²) )                                              (9)

where ‖𝑥 − 𝑥′‖² denotes the squared Euclidean distance between the two samples and 𝜎 is a free parameter.
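A minimal sketch of Eq. (9) is given below; the sample vectors and the value of σ are arbitrary, and the mapping to scikit-learn's SVC via gamma = 1/(2σ²) is shown only for illustration.

import numpy as np
from sklearn.svm import SVC

def rbf_kernel(x, x_prime, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2)), i.e. Eq. (9)
    return np.exp(-np.sum((x - x_prime) ** 2) / (2 * sigma ** 2))

x, x_prime = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(rbf_kernel(x, x_prime))

# The same kernel in scikit-learn's SVC, where gamma = 1 / (2 * sigma^2)
clf = SVC(kernel="rbf", gamma=1.0 / (2 * 1.0 ** 2))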

D. Deep Learning and Natural Language Processing Algorithms


In this work, we have used a deep learning technique, LSTM, and an NLP algorithm, BERT, to classify
fake news; both are dynamic models.
1) Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a special type of recurrent neural network that allows
information to persist. Typical RNNs suffer from short-term memory, a limitation addressed by the
cell states of the LSTM. LSTMs use separate hidden units whose nature is to remember inputs for a
long time [23]. A memory cell, also known as a gated leaky neuron or an accumulator, has a connection
to itself at the next time step with a weight of 1, so it copies its own real-valued state and accumulates
the external signal; this self-connection is multiplied by another unit that decides when to wipe or keep
the contents of memory. Finally, sigmoid-based forget gates control the transfer of information to the
following hidden layers. Fig. 1 shows a generic LSTM-based neural network architecture.
Figure 1. Generic LSTM architecture.

For our classification, we used an LSTM model with an input layer that takes the input titles and article
body and an embedding layer that turns every word into a 300-dimensional vector. As the input length
is 256 tokens, this layer produces a 256×300 matrix. The weights obtained from the matrix
multiplication form the output matrix, which yields a vector for every word. These vectors are fed
through an LSTM, whose output is passed to a fully connected dense layer, resulting in a single final
output. Table I shows the model layers and parameters, which were trained on batches of size 256.
TABLE I. LAYERS AND PARAMETERS OF THE PROPOSED LSTM MODEL
Layer             Output Shape       Number of Parameters
Input             (None, 256)        0
Embedding         (None, 256, 300)   60,974,100
Spatial Dropout   (None, 256, 300)   0
Bidirectional     (None, 256)        439,296
Dense             (None, 64)         16,448
Dropout           (None, 64)         0
Total parameters: 61,429,909
Trainable parameters: 61,429,909
Non-trainable parameters: 0
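A Keras sketch consistent with the layer stack of Table I is shown below. The vocabulary size, LSTM width, dropout rates, and final single-unit output layer are inferred from the listed parameter counts (for instance, a bidirectional LSTM with 128 units per direction gives the 439,296 parameters shown) and should be read as assumptions rather than the exact training configuration.

import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 203_247   # inferred: 60,974,100 embedding parameters / 300 dimensions
SEQ_LEN = 256          # tokens per article, matching the Input/Embedding shapes above

inputs = layers.Input(shape=(SEQ_LEN,))
x = layers.Embedding(VOCAB_SIZE, 300)(inputs)        # (None, 256, 300)
x = layers.SpatialDropout1D(0.2)(x)                  # dropout rate assumed
x = layers.Bidirectional(layers.LSTM(128))(x)        # 2 x 128 units -> (None, 256), 439,296 params
x = layers.Dense(64, activation="relu")(x)           # (None, 64), 16,448 params
x = layers.Dropout(0.2)(x)                           # rate assumed
outputs = layers.Dense(1, activation="sigmoid")(x)   # single fake/real output
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])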

2) Bidirectional Encoder Representation from Transformers (BERT)


BERT is pre-trained to produce bidirectional representations from unlabeled text by jointly
conditioning on both left and right context in all layers [24]. As a result, the pre-trained BERT model
can be fine-tuned with only one additional output layer to produce advanced models for various tasks,
including question answering. BERT is built from the encoder part of the Transformer architecture.
During the pre-training phase, the model learns the language and its corresponding contexts; because
it learns context from both directions simultaneously, the contexts of words are captured better.
For tokenizing sentences into words, converting token strings to ids and back, and encoding/decoding,
the BertTokenizer from the pretrained ‘bert-base-uncased’ model was utilized in this study. The
maximum sentence length is 60, and we utilized the encode_plus method to encode each sentence. This
method tokenizes the sentence, prepends the [CLS] (classification) token at the beginning, and appends
the [SEP] token, which tells BERT where one sentence ends and the next begins; it is normally inserted
after the tokens of each sentence. The tokens are then mapped to their ids, each sequence is padded to
the maximum length with [PAD] (padding) tokens, and the corresponding attention masks are created.
The BERT model takes the attention mask as an argument, which specifies which tokens should be
attended to and which can be ignored; in other words, it tells the model whether a token carries valid
data or not. The architecture of the BERT model employed in this proposed detection system is
depicted in Fig. 2.

Figure 2. Architecture of BERT technique.
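The tokenization step described above can be sketched with the Hugging Face BertTokenizer as follows; the exact keyword arguments are a plausible configuration rather than a verbatim copy of the code used in this work.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer.encode_plus(
    "Hi nice meet you !",
    add_special_tokens=True,      # prepend [CLS] and append [SEP]
    max_length=60,                # maximum sequence length used in this work
    padding="max_length",         # fill the rest of the sequence with [PAD] tokens
    truncation=True,
    return_attention_mask=True,   # 1 for real tokens, 0 for padding
)

print(encoded["input_ids"][:7])        # [101, 7632, 3835, 3113, 2017, 999, 102]
print(encoded["attention_mask"][:10])  # which positions BERT should attend to
print(tokenizer.decode(encoded["input_ids"][:7]))  # [CLS] hi nice meet you! [SEP]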


Table II shows the layers and parameters of the proposed BERT model, with the input ids and attention
masks used as the input layers. The outputs of the input layers feed the transformer BERT model,
whose output is subsequently passed to a fully connected dense layer, resulting in a single final output.
TABLE II. LAYERS AND PARAMETERS OF THE PROPOSED BERT-BASED NLP MODEL
Layer             Number of Parameters   Connected to
Input             0                      -
Attention masks   0                      -
TF BERT model     109,482,240            Input [0][0], Attention masks [0][0]
Dense             24,608                 TF BERT model [0][1]
Dropout           0                      Dense [0][0]
Total parameters: 109,506,881
Trainable parameters: 109,506,881
Non-trainable parameters: 0
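The connections listed in Table II suggest a model along the following lines, sketched here with the Hugging Face TFBertModel; the dense-layer width is inferred from the parameter count (768 × 32 + 32 = 24,608), while the dropout rate, pooled-output choice, and optimizer settings are assumptions.

import tensorflow as tf
from transformers import TFBertModel

MAX_LEN = 60

input_ids = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="attention_mask")

bert = TFBertModel.from_pretrained("bert-base-uncased")        # 109,482,240 parameters
bert_output = bert(input_ids, attention_mask=attention_mask)
pooled = bert_output.pooler_output                             # sentence-level representation

x = tf.keras.layers.Dense(32, activation="relu")(pooled)       # 768 * 32 + 32 = 24,608 parameters
x = tf.keras.layers.Dropout(0.2)(x)                            # rate assumed
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)     # single fake/real output

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
              loss="binary_crossentropy", metrics=["accuracy"])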
IV. RESULT AND ANALYSIS
This section discusses the numerical results of the proposed fake news detection system with the
applied regular ML, DL, and NLP approaches. The employed fake news dataset has been divided into
training and testing samples in an 8:2 ratio. After the necessary preprocessing and training, all the
models are assessed in various ways by checking their accuracy, confusion matrix, recall, precision,
F1-score, ROC curve, and other metrics.
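All of these metrics are available in scikit-learn; the sketch below shows how they can be computed for any of the trained classifiers, using small placeholder arrays in place of the actual test labels and predictions.

from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score, roc_auc_score)

# Placeholder arrays: y_test holds the true labels (0 = not fake, 1 = fake),
# y_pred the hard predictions, and y_score the predicted probability of the fake class.
y_test  = [0, 0, 1, 1, 0, 1]
y_pred  = [0, 1, 1, 1, 0, 0]
y_score = [0.10, 0.60, 0.90, 0.80, 0.30, 0.40]

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred), f1_score(y_test, y_pred))
print(roc_auc_score(y_test, y_score))
print(classification_report(y_test, y_pred, target_names=["Not Fake", "Fake"]))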

A. Performance of Logistic Regression Model


In Fig. 3, the confusion matrix for the logistic regression model of the proposed system is shown.
The real news class has 862 correct predictions and 170 wrong predictions out of 1032 real news test
samples, so the accuracy for real news prediction is 83.52%. The fake news class has 487 correct
predictions but a significant number of wrong predictions, 310 out of 797 test samples, so the accuracy
for fake news is 61%. Finally, the overall accuracy is 74%.

Figure 3. Confusion matrix for logistic regression.

Figure 4. ROC curve for logistic regression.

According to Fig. 4, the area under the curve (AUC) score of the ROC curve of the proposed logistic
regression algorithm is 0.79. The rest of the performance metrics for the logistic regression model are
demonstrated in Table III. The proposed logistic regression model's precision, recall, and F1-score are
74%, 72%, and 73%, respectively.
TABLE III. LOGISTIC REGRESSION MODEL’S PERFORMANCE METRICS
                   Precision   Recall   F1-score
0 (Not Fake)       0.74        0.84     0.78
1 (Fake)           0.74        0.61     0.67
Accuracy                                0.74
Weighted Average   0.74        0.74     0.73
B. Performance of Naive Bayes Model
The confusion matrix for the naive bayes model of the proposed system is shown in Fig. 5. The
authentic news class has 830 correct predictions and 202 wrong predictions out of the 1032 test
samples, so the accuracy for real news prediction is 80%. The fake news class has a significant number
of wrong classifications, similar to the logistic regression model; the accuracy for fake news is 66%,
and the overall accuracy is 74%.

Figure 5. Confusion matrix for naive bayes.

The true and false positive rates of the proposed naive bayes approach are depicted in Fig. 6. According
to Fig. 6, the naive bayes model has an ROC AUC score of 0.79. In Table IV, the rest of the
performance metrics for the naive bayes model are demonstrated. The precision, recall, and F1-score of
the proposed naive bayes model are 74%, 73%, and 73%, respectively.

Figure 6. ROC curve for naive bayes.

TABLE IV. VARIOUS EVALUATION METRICS OF THE NAIVE BAYES MODEL
                   Precision   Recall   F1-score
0 (Not Fake)       0.75        0.80     0.78
1 (Fake)           0.72        0.66     0.69
Accuracy                                0.74
Weighted Average   0.74        0.74     0.74
C. Performance of Decision Tree Model
In Fig. 7, the confusion matrix for the decision tree model of the proposed system is demonstrated.
The real news class has 940 correct predictions and 92 wrong predictions out of 1032 real news test
samples, so the accuracy for real news prediction is 91%. The fake news class has 700 correct
predictions and an acceptable number of wrong predictions, 97 out of 797 fake news test samples, so
the accuracy for fake news is 88%. Finally, the decision tree technique achieved an overall accuracy
of 90%.

Figure 7. Confusion matrix for decision tree.

According to Fig. 8, the ROC AUC value of the proposed decision tree algorithm is 0.89. In Table V,
the rest of the performance metrics for the decision tree model are demonstrated. The precision, recall,
and F1-score of the proposed decision tree model are 90%, 89%, and 89%, respectively.

Figure 8. ROC curve for decision tree.

TABLE V. DECISION TREE MODEL ACCURACY METRICS
                   Precision   Recall   F1-score
0 (Not Fake)       0.91        0.91     0.91
1 (Fake)           0.88        0.88     0.88
Accuracy                                0.90
Weighted Average   0.90        0.90     0.90
D. Performance of Support Vector Machine Model
In Fig. 9, the confusion matrix for the SVM model with the RBF kernel of the proposed system has
been shown. The accuracies of the real and fake news are 82% and 70%, respectively. Finally, the
overall accuracy of the SVM classifier model is 77%.

Figure 9. Confusion matrix for SVM.

Figure 10. ROC curve for SVM.

According to Fig. 10, the ROC AUC coefficient of the proposed SVM algorithm is 0.83. Table VI
depicts the rest of the performance metrics for the SVM model.
TABLE VI. SVM MODEL ACCURACY METRICS
                   Precision   Recall   F1-score
0 (Not Fake)       0.78        0.82     0.80
1 (Fake)           0.75        0.70     0.72
Accuracy                                0.77
Weighted Average   0.77        0.77     0.77
E. Performance of LSTM Model
Fig. 11 illustrates the confusion matrix for the deep learning-based LSTM model of the proposed
system. The real news class has 1920 correct predictions and 157 wrong predictions, so the accuracy
for real news prediction is 92%, and for the fake news class the prediction is significantly improved
compared to the other ML techniques. Finally, the overall accuracy of the LSTM technique is 95%.
The total number of test samples for each class differs from the ML approaches because the
preprocessing used for the NLP methods removes fewer samples.

Figure 11. Confusion matrix for LSTM.

According to Table VII, other performance metrics for the LSTM model demonstrated better results.
The precision, recall, and F1-score of the proposed LSTM model are 94%, 95%, and 94%, respectively.
TABLE VII. PERFORMANCE METRICS OF THE LSTM APPROACH
                   Precision   Recall   F1-score
0 (Not Fake)       0.98        0.92     0.95
1 (Fake)           0.91        0.97     0.94
Accuracy                                0.95
Weighted Average   0.95        0.95     0.95

Figure 12. Accuracy and loss vs. epochs graph of LSTM.


Fig. 12 shows the accuracy and loss vs. epochs graphs of the LSTM model. Initially, the model’s
validation accuracy was 95%, and it did not vary significantly as the epochs progressed.
F. Performance of BERT Model
In this section, the results of the proposed fake news detection system implemented with the BERT
technique are discussed. Table VIII shows the encoder and decoder results on an example sentence;
its purpose is to show how each input sentence is encoded and decoded. Here the input for the encoding
is “Hi nice meet you!”. After encoding, every word and symbol is represented by a numerical value,
e.g., “hi” is assigned the value 7632. Decoding returns exactly the text that was given to the encoder
as input, plus two new tokens: [CLS] at the beginning of the sentence, which represents classification,
and [SEP] at the end, which tells BERT where the following sentence starts.
TABLE VIII. ENCODER AND DECODER EXAMPLE RESULTS
Input:    encode = bert_tokenizer.encode("Hi nice meet you !")
          decode = bert_tokenizer.decode(encode)
Command:  print("Encode:", encode)
          print("Decode:", decode)
Output:   Encode: [101, 7632, 3835, 3113, 2017, 999, 102]
          Decode: [CLS] hi nice meet you! [SEP]

Figure 13. Accuracy and loss vs. epochs graphs of BERT framework.

Fig. 13 shows the accuracy and loss graphs of BERT with respect to epochs. For the BERT model, the
validation accuracy starts at 97% in the initial stages of training and does not change remarkably; after
three epochs it increases by only 1%, reaching 98%.

G. Model Comparison of Our Paper


Table IX compares all the detection models that we have trained in this work. Among the machine
learning-based techniques, fake news detection performs well for the decision tree classifier, but the
naive bayes and logistic regression approaches perform unsatisfactorily. The highest accuracy among
the machine learning models is 90%, obtained by the decision tree approach. The deep learning LSTM
approach achieved the second-highest accuracy of 95%. Finally, the best detection performance is
offered by the NLP-based BERT technique, with 98% accuracy.
TABLE IX. ACCURACY COMPARISON OF DIFFERENT APPLIED TECHNIQUES
Models                Precision   Recall   F1-Score   Accuracy
Logistic Regression   74%         72%      73%        74%
Naive Bayes           74%         73%      73%        74%
Decision Tree         90%         89%      89%        90%
SVM                   76%         76%      76%        77%
LSTM                  94%         95%      94%        95%
BERT                  -           -        -          98%
H. Model Comparison with Other Works
Finally, the proposed fake news detection system with the BERT technique has been compared with
other related works. According to Table X, the implemented BERT approach outperformed all the other
works in terms of accuracy.
TABLE X. PROPOSED MODEL’S ACCURACY COMPARISON WITH RELATED WORKS
Reference   Applied Method        Accuracy
[3]         Random forest         95%
[4]         Decision tree         96.8%
[5]         CNN+LSTM with PCA     96%
[8]         SVM                   93.5%
[9]         Decision tree         94%
[25]        Deep neural network   94%
Our study   BERT                  98%

V. CONCLUSION
Determining the accuracy and credibility of the information and news available on the internet is
critical nowadays. It has recently been observed that various online platforms play a significant role
in disseminating misleading information and spreading fake news to serve several dreadful purposes
and to benefit certain people. Because of the sheer volume of data being spread and shared on the
internet, there is a growing demand for automated false news identification systems that are accurate
and efficient. This paper proposes an automatic fake news detection system that utilizes various regular
machine learning, deep learning, and natural language processing techniques. Various preprocessing
and feature extraction methods, such as regex cleaning, tokenization, stopword removal, lemmatization,
NLP, and TF-IDF, were used to prepare the data in the suggested system. Next, several models, namely
logistic regression, decision tree, naive bayes, support vector machine, long short-term memory, and
bidirectional encoder representations from transformers, were employed to classify the fabricated
news. For the machine learning models logistic regression, decision tree, naive bayes, and SVM, we
obtained accuracies of 73.75%, 89.66%, 74.19%, and 76.65%, respectively. Finally, substantially better
performance was achieved by the neural network LSTM and the NLP-based BERT techniques. In the
future, the proposed system can be extended to detect more specific categories of false news, e.g.,
religious, political, or COVID-19 related. The word2vec approach can be applied to deal with and
classify image and video-related visual datasets. News data from diverse languages can be utilized to
identify false news from different nations and countries. A future extension of this work can be to
employ attention-based deep learning approaches.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
M. E. H. Rafi proposed the research idea; N. N. Prachi and MH conducted the research; E. Alam and R.
Khan analyzed the data; N. N. Prachi, M. Habibullah and M. E. H. Rafi wrote the paper; R. Khan
helped to draft the final manuscript; all authors had approved the final version.

REFERENCES
[1] J. Strömbäck, Y. Tsfati, H. Boomgaarden, et al., “News media trust and its impact on media use: Toward a framework
for future research,” Annals of the International Communication Association, vol. 44, pp. 139-156, 2020.
[2] E. Mitchelstein and P. J. Boczkowski, “Online news consumption research: An assessment of past work and an agenda
for the future,” New Media & Society, vol. 12, pp. 1085-1102, 2010.
[3] P. H. A. Faustini and T. F. Covões, “Fake news detection in multiple platforms and languages,” Expert Systems with
Applications, vol. 158, pp. 1-9, 2020.
[4] F. A. Ozbay and B. Alatas, “Fake news detection within online social media using supervised artificial intelligence
algorithms,” Physica A: Statistical Mechanics and its Applications, vol. 540, pp. 1-19, 2020.
[5] M. Umer, “Fake news stance detection using deep learning architecture (CNN-LSTM),” IEEE Access, vol. 8, pp.
156695-156706, 2020.
[6] T. Jiang, J. P. Li, A. U. Haq, et al., “A novel stacking approach for accurate detection of fake news,” IEEE Access, vol.
9, pp. 22626-22639, 2021.
[7] S. I. Manzoor, J. Singla, and Nikita, “Fake news detection using machine learning approaches: A systematic review,” in
Proc. International Conference on Trends in Electronics and Informatics, 2019, pp. 230-234.
[8] A. Jain, A. Shakya, H. Khatter, et al., “A smart system for fake news detection using machine learning,” in Proc.
International Conference on Issues and Challenges in Intelligent Computing Techniques, 2019, pp. 1-4.
[9] I. Ahmad, M. Yousaf, S. Yousaf, et al., “Fake news detection using machine learning ensemble methods,” Complexity,
pp. 1-11, 2020.
[10] UTK machine learning club. (July 2017). Fake news, version 1. [Online]. Available: https://www.kaggle.com/c/fake-
news/data
[11] H. Ali, M. S. Khan, A. AlGhadhban, et al., “All your fake detector are belong to us: Evaluating adversarial robustness
of fake-news detectors under black-box settings,” IEEE Access, vol. 9, pp. 81678-81692, 2021.
[12] I. K, Sastrawan, I. P. A. Bayupati, and D. M. S. Arsa, “Detection of fake news using deep learning CNN-RNN based
methods,” ICT Express, pp. 1-13, 2021.
[13] Y. A. Solangi, Z. A. Solangi, S. Aarain, et al., “Review on Natural Language Processing (NLP) and its toolkits for
opinion mining and sentiment analysis,” in Proc. International Conference on Engineering Technologies and Applied
Sciences, 2018, pp. 1-4.
[14] G. Kim and S. H. Lee, “Comparison of Korean preprocessing performance according to Tokenizer in NMT transformer
model,” Journal of Advances in Information Technology, vol. 11, pp. 228-232, 2020.
[15] T. Daghistani and R. Alshammari, “Comparison of statistical logistic regression and random forest machine learning
techniques in predicting diabetes,” Journal of Advances in Information Technology, vol. 11, pp. 78-83, 2020.
[16] W. He, Y. He, B. Li, et al., “A naive-Bayes-based fault diagnosis approach for analog circuit by using image-oriented
feature extraction and selection technique,” IEEE Access, vol. 8, pp. 5065-5079, 2020.
[17] Q. Xue, Y. Zhu, and J. Wang, “Joint distribution estimation and naïve bayes classification under local differential
privacy,” IEEE Transactions on Emerging Topics in Computing, vol. 9, pp. 2053-2063, 2021.
[18] H. A. Maddah, “Decision trees based performance analysis for influence of sensitizers characteristics in dye-sensitized
solar cells,” Journal of Advances in Information Technology, vol. 13, pp. 271-276, 2022.
[19] I. D. Mienye, Y. Sun, and Z. Wang, “Prediction performance of improved decision tree-based algorithms: A review,”
Procedia Manufacturing, vol. 35, pp. 698-703, 2019.
[20] J. A. C. Moreano and N. B. L. S. Palomino, “Global facial recognition using gabor wavelet, support vector machines
and 3D face models,” Journal of Advances in Information Technology, vol. 11, pp. 143-148, 2020.
[21] A. B. Gumelar, A. Yogatama, D. P. Adi, et al., “Forward feature selection for toxic speech classification using support
vector machine and random forest,” International Journal of Artificial Intelligence, vol. 11, pp. 717-726, 2022.
[22] J. Cervantes, F. García-Lamont, L. Rodríguez, et al., “A comprehensive survey on support vector machine classification:
Applications, challenges and trends,” Neurocomputing, vol. 408, pp. 189-215, 2020.
[23] I. Benchaji, S. Douzi, and B. E. Ouahidi, “Credit card fraud detection model based on LSTM recurrent neural
networks,” Journal of Advances in Information Technology, vol. 12, pp. 113-118, 2021.
[24] N. Yadav and A. K. Singh, “Bi-directional encoder representation of transformer model for sequential music
recommender system,” in Proc. Forum for Information Retrieval Evaluation, 2020, pp. 49-53.
[25] S. Ni, J. Li, and H. Y. Kao, “MVAN: Multi-view attention networks for fake news detection on social media,” IEEE
Access, vol. 9, pp. 106907-106917, 2021.

Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License
(CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly
cited, the use is noncommercial and no modifications or adaptations are made.

Noshin Nirvana Prachi obtained her bachelor's degree in computer science and engineering in July 2021 from North South
University, Bangladesh. Noshin was born in Dhaka, Bangladesh. One of her research works, on a deep learning-based speaker
recognition system, was published at the Interdisciplinary Research in Technology and Management (IRTM) conference. She
works on data science, machine learning, computer vision and software engineering.

Md. Habibullah completed his B.Sc. degree in computer science and engineering in 2021 from North South University,
Bangladesh's electrical and computer engineering department. Recently he has published a manuscript on a deep learning-
based speaker recognition system at an IEEE conference. Currently, he is doing research on data science, machine learning,
cryptography and cyber security.

Md. Emanul Haque Rafi received his bachelor of science degree in computer science and engineering from the electrical
and computer engineering department of North South University, Bangladesh. Emanul was born in Dhaka, capital city of
Bangladesh. His primary research interests include data science and management, machine learning, deep learning, and
natural language processing.

Evan Alam has a bachelor’s degree in computer science and engineering from electrical and computer engineering
department of North South University, Bangladesh. He was an active member of the Computer & Engineering Club of North
South University during his undergraduate study. Currently, his primary research interests are computer vision, data science,
machine learning, and computer network security.

Riasat Khan received a B.Sc. degree in Electrical and Electronic Engineering from the Islamic University of Technology,
Bangladesh, in 2010. He completed his M.Sc. and Ph.D. degrees in Electrical Engineering from New Mexico State
University, Las Cruces, USA, in 2018. Currently, Dr. Khan is working as an Assistant Professor in the Department of
Electrical and Computer Engineering at North South University, Dhaka, Bangladesh. His research interests include
biomedical engineering, cardiac electrophysiology and computational bioelectromagnetics.
CODE
Fake news detection using ML
Sushwanth Reddy 17STUCHH010063

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn import feature_extraction, linear_model, model_selection, preprocessing
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

Read datasets

# Load the two CSV files of fake and real articles
fake = pd.read_csv("data/Fake.csv")
true = pd.read_csv("data/True.csv")

fake.shape
(23481, 4)
true.shape
(21417, 4)

Data cleaning and preparation

# Add flag to track fake and real
fake['target'] = 'fake'
true['target'] = 'true'

# Concatenate dataframes
data = pd.concat([fake, true]).reset_index(drop=True)
data.shape
(44898, 5)

# Shuffle the data
from sklearn.utils import shuffle
data = shuffle(data)
data = data.reset_index(drop=True)

# Check the data
data.head()


                                               title                                               text       subject               date  target
0  EU Commission says all sides should stick to I...  BRUSSELS (Reuters) - The European Commission s...     worldnews    October 6, 2017    true
1  PRESIDENT TRUMP Looking at Executive Action on...  Remember during the effort to get Obamacare pa...      politics        Aug 1, 2017    fake
2  EU official says no sign Trump plans to ease R...  WASHINGTON (Reuters) - A senior European Union...  politicsNews      April 4, 2017    true
3  Subdued by Harvey, Congress reconvenes facing ...  WASHINGTON (Reuters) - Hurricane Harvey devast...  politicsNews  September 4, 2017    true
4  EVIL HILLARY SUPPORTERS Yell “F*ck Trump”…Burn...  These people are sick and evil. They will stop...      politics        Nov 6, 2016    fake
# Removing the date (we won't use it for the analysis)
data.drop(["date"], axis=1, inplace=True)
data.head()

                                               title                                               text       subject  target
0  EU Commission says all sides should stick to I...  BRUSSELS (Reuters) - The European Commission s...     worldnews    true
1  PRESIDENT TRUMP Looking at Executive Action on...  Remember during the effort to get Obamacare pa...      politics    fake
2  EU official says no sign Trump plans to ease R...  WASHINGTON (Reuters) - A senior European Union...  politicsNews    true
3  Subdued by Harvey, Congress reconvenes facing ...  WASHINGTON (Reuters) - Hurricane Harvey devast...  politicsNews    true
4  EVIL HILLARY SUPPORTERS Yell “F*ck Trump”…Burn...  These people are sick and evil. They will stop...      politics    fake

# Removing the title (we will only use the text)
data.drop(["title"], axis=1, inplace=True)
data.head()

                                                text       subject  target
0  BRUSSELS (Reuters) - The European Commission s...     worldnews    true
1  Remember during the effort to get Obamacare pa...      politics    fake
2  WASHINGTON (Reuters) - A senior European Union...  politicsNews    true
3  WASHINGTON (Reuters) - Hurricane Harvey devast...  politicsNews    true
4  These people are sick and evil. They will stop...      politics    fake
# Convert to lowercase
data['text'] = data['text'].apply(lambda x: x.lower())
data.head()

                                                text       subject  target
0  brussels (reuters) - the european commission s...     worldnews    true
1  remember during the effort to get obamacare pa...      politics    fake
2  washington (reuters) - a senior european union...  politicsNews    true
3  washington (reuters) - hurricane harvey devast...  politicsNews    true
4  these people are sick and evil. they will stop...      politics    fake
# Remove punctuation
import string

def punctuation_removal(text):
    all_list = [char for char in text if char not in string.punctuation]
    clean_str = ''.join(all_list)
    return clean_str

data['text'] = data['text'].apply(punctuation_removal)

# Check
data.head()

                                                text       subject  target
0  brussels reuters the european commission said...      worldnews    true
1  remember during the effort to get obamacare pa...      politics    fake
2  washington reuters a senior european union of...   politicsNews    true
3  washington reuters hurricane harvey devastate...   politicsNews    true
4  these people are sick and evil they will stop ...      politics    fake

# Removing stopwords
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = stopwords.words('english')

data['text'] = data['text'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)]))

[nltk_data] Downloading package stopwords to C:\Users\sharanya reddy\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!

data.head()

                                                text       subject  target
0  brussels reuters european commission said frid...     worldnews    true
1  remember effort get obamacare passed nancy pel...      politics    fake
2  washington reuters senior european union offic...  politicsNews    true
3  washington reuters hurricane harvey devastated...  politicsNews    true
4  people sick evil stop nothing get way laws mea...      politics    fake

Basic data exploration

# How many articles per subject?
print(data.groupby(['subject'])['text'].count())
data.groupby(['subject'])['text'].count().plot(kind="bar")
plt.show()

subject
Government News     1570
Middle-east          778
News                9050
US_News              783
left-news           4459
politics            6841
politicsNews       11272
worldnews          10145
Name: text, dtype: int64

# How many fake and real articles?
print(data.groupby(['target'])['text'].count())
data.groupby(['target'])['text'].count().plot(kind="bar")
plt.show()

target
fake    23481
true    21417
Name: text, dtype: int64

# Word cloud for fake news
from wordcloud import WordCloud

fake_data = data[data["target"] == "fake"]
all_words = ' '.join([text for text in fake_data.text])

wordcloud = WordCloud(width=800, height=500,
                      max_font_size=110,
                      collocations=False).generate(all_words)

plt.figure(figsize=(10,7))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
# Word cloud for real news
from wordcloud import WordCloud

real_data = data[data["target"] == "true"]
all_words = ' '.join([text for text in real_data.text])   # build the cloud from the real-news subset, not fake_data

wordcloud = WordCloud(width=800, height=500,
                      max_font_size=110,
                      collocations=False).generate(all_words)

plt.figure(figsize=(10,7))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
# Most frequent words counter (code adapted from https://www.kaggle.com/rodolfoluna/fake-news-detector)
from nltk import tokenize

token_space = tokenize.WhitespaceTokenizer()

def counter(text, column_text, quantity):
    # Count word frequencies in the given column and plot the most common ones
    all_words = ' '.join([text for text in text[column_text]])
    token_phrase = token_space.tokenize(all_words)
    frequency = nltk.FreqDist(token_phrase)
    df_frequency = pd.DataFrame({"Word": list(frequency.keys()),
                                 "Frequency": list(frequency.values())})
    df_frequency = df_frequency.nlargest(columns="Frequency", n=quantity)
    plt.figure(figsize=(12,8))
    ax = sns.barplot(data=df_frequency, x="Word", y="Frequency", color='blue')
    ax.set(ylabel="Count")
    plt.xticks(rotation='vertical')
    plt.show()

# Most frequent words in fake news
counter(data[data["target"] == "fake"], "text", 20)

# Most frequent words in real news
counter(data[data["target"] == "true"], "text", 20)
Modeling
# Function to plot the confusion matrix (code from
# https://scikitlearn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html)
from sklearn import metrics
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

Preparing the data

# Split the data
X_train, X_test, y_train, y_test = train_test_split(data['text'], data.target,
                                                    test_size=0.2, random_state=42)

Naive Bayes

dct = dict()

from sklearn.naive_bayes import MultinomialNB

NB_classifier = MultinomialNB()
pipe = Pipeline([('vect', CountVectorizer()),
                 ('tfidf', TfidfTransformer()),
                 ('model', NB_classifier)])

model = pipe.fit(X_train, y_train)
prediction = model.predict(X_test)
print("accuracy: {}%".format(round(accuracy_score(y_test, prediction)*100,2)))

dct['Naive Bayes'] = round(accuracy_score(y_test, prediction)*100,2)

accuracy: 95.27%

cm = metrics.confusion_matrix(y_test, prediction)
plot_confusion_matrix(cm, classes=['Fake', 'Real'])

Confusion matrix, without normalization


Logistic regression

# Vectorizing and applying TF-IDF
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([('vect', CountVectorizer()),
                 ('tfidf', TfidfTransformer()),
                 ('model', LogisticRegression())])

# Fitting the model
model = pipe.fit(X_train, y_train)

# Accuracy
prediction = model.predict(X_test)
print("accuracy: {}%".format(round(accuracy_score(y_test, prediction)*100,2)))
dct['Logistic Regression'] = round(accuracy_score(y_test, prediction)*100,2)

accuracy: 98.84%

cm = metrics.confusion_matrix(y_test, prediction)
plot_confusion_matrix(cm, classes=['Fake', 'Real'])

Confusion matrix, without normalization


Decision Tree

from sklearn.tree import DecisionTreeClassifier

# Vectorizing and applying TF-IDF
pipe = Pipeline([('vect', CountVectorizer()),
                 ('tfidf', TfidfTransformer()),
                 ('model', DecisionTreeClassifier(criterion='entropy',
                                                  max_depth=20,
                                                  splitter='best',
                                                  random_state=42))])

# Fitting the model
model = pipe.fit(X_train, y_train)

# Accuracy
prediction = model.predict(X_test)
print("accuracy: {}%".format(round(accuracy_score(y_test, prediction)*100,2)))
dct['Decision Tree'] = round(accuracy_score(y_test, prediction)*100,2)

accuracy: 99.58%

cm = metrics.confusion_matrix(y_test, prediction)
plot_confusion_matrix(cm, classes=['Fake', 'Real'])

Confusion matrix, without normalization

Random Forest

from sklearn.ensemble import RandomForestClassifier

pipe = Pipeline([('vect', CountVectorizer()),
                 ('tfidf', TfidfTransformer()),
                 ('model', RandomForestClassifier(n_estimators=50, criterion="entropy"))])

model = pipe.fit(X_train, y_train)
prediction = model.predict(X_test)
print("accuracy: {}%".format(round(accuracy_score(y_test, prediction)*100,2)))
dct['Random Forest'] = round(accuracy_score(y_test, prediction)*100,2)

accuracy: 99.3%

cm = metrics.confusion_matrix(y_test, prediction)
plot_confusion_matrix(cm, classes=['Fake', 'Real'])

Confusion matrix, without normalization


SVM

from sklearn import svm

# Create a svm Classifier
clf = svm.SVC(kernel='linear')  # Linear Kernel

pipe = Pipeline([('vect', CountVectorizer()),
                 ('tfidf', TfidfTransformer()),
                 ('model', clf)])

model = pipe.fit(X_train, y_train)
prediction = model.predict(X_test)
print("accuracy: {}%".format(round(accuracy_score(y_test, prediction)*100,2)))
dct['SVM'] = round(accuracy_score(y_test, prediction)*100,2)

accuracy: 99.44%

cm = metrics.confusion_matrix(y_test, prediction)
plot_confusion_matrix(cm, classes=['Fake', 'Real'])

Confusion matrix, without normalization


Comparing Different Models

import matplotlib.pyplot as plt
plt.figure(figsize=(8,7))
plt.bar(list(dct.keys()), list(dct.values()))
plt.ylim(90,100)
plt.yticks((91, 92, 93, 94, 95, 96, 97, 98, 99, 100))


the end.....
