
Received 5 June 2023, accepted 24 June 2023, date of publication 12 July 2023, date of current version 19 July 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3294613

Constructing a User-Centered Fake News Detection Model by Using Classification Algorithms in Machine Learning Techniques

MINJUNG PARK AND SANGMI CHAI
Ewha School of Business, Ewha Womans University, Seodaemun-gu, Seoul 03760, South Korea
Corresponding author: Sangmi Chai ([email protected])
This work was supported in part by the Ministry of Education of the Republic of Korea and in part by the National Research Foundation of Korea under Grant NRF-2020S1A5A2A01046634.

ABSTRACT As fake news spreads rapidly on social media, detection technologies that automatically identify fake news are being actively developed. However, most of them focus only on the linguistic and compositional characteristics of fake news (e.g., indication of sources or authors, length of a message, frequency of negative words). In contrast, this study proposes a machine-learning-based fake news detection model that reflects the characteristics of users, news content, and social networks, grounded in social capital theory. To comprehensively capture the characteristics related to the spread of fake news, this study applied the XGBoost model to estimate the feature importance of each variable and derive the priority factors that most strongly affect fake news detection. Based on the derived variables, we established SVM, RF, LR, CART, and NNET models, which are representative machine learning classification models, and compared their fake news detection performance. To generalize the established models (i.e., to avoid overfitting or underfitting), this study performed a cross-validation step and compared the predictive accuracy of the established models. As a result, the RF model showed the highest prediction rate at about 94%, while NNET had the lowest performance rate at about 92.1%. The results of this study are expected to contribute to improving fake news detection systems in preparation for the increasingly sophisticated generation and spread of fake news.

INDEX TERMS Classification algorithms, fake news, fake news detection, feature selection, prediction
algorithms, predictive models, XGBoost.

I. INTRODUCTION
The content-based recommender systems developed to provide customized content to users improved user satisfaction; however, they have recently become a decisive vehicle for spreading fake news [1]. These systems are designed to continuously show content in the feed similar to what the user has previously seen or engaged with through ''Likes'' or comments, regardless of whether the news is true [2]. In other words, once a user encounters fake news, the system has no choice but to keep recommending similar content to that user. They even intentionally adjust the appearance of unwanted content in users' social media feeds [3].

Most prior studies have focused on detecting or identifying fake news based on its linguistic or compositional characteristics. They have identified fake news based on whether the article has a clear author and source or whether the article is long enough [4], [5], [6], [7]. This approach assumed differences in linguistic or compositional features between fake and factual news. It can hardly reflect the characteristics of users who accept or spread fake news or the features of the social media networks through which fake news spreads.

With the advent of ChatGPT (Generative Pre-trained Transformer), which can produce text that does not look awkward, unlike text written by earlier low-level AI, it is no longer possible to guarantee the accuracy of detecting fake news in the previous way. Recently, it has become so easy to create fake news that looks like real news that users mistake

The associate editor coordinating the review of this manuscript and approving it for publication was Geng-Ming Jiang.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

news written by ChatGPT in a few seconds for news written by a professional journalist [48]. Therefore, it is necessary to identify fake news differently than before. In this study, we propose a detection model that comprehensively considers not only the surface features of the content but also the characteristics of the users who generate and share fake news and of the networks through which fake news spreads.

Moreover, fake news can be generated more sophisticatedly because technology that can automatically make it look identical to real news in a short time by using AI (Artificial Intelligence) is spreading today. AI-powered bots on Twitter (i.e., AI Twitterbots) can populate thousands of user accounts that support or oppose whatever content the bot controllers target, and their posts look the same as real news even when they are fake [8]. Therefore, it has become increasingly difficult to detect precisely manipulated fake news based only on its superficial features. It is necessary to approach the user characteristics and networks of social media from a more diverse viewpoint to overcome the existing fake news detection methods that focus on linguistic characteristics.

This study aims to improve the prediction performance of fake news detection by overcoming the limitation that previous studies did not consider the characteristics of information recipients. Therefore, we establish a fake news detection model by considering content features, the users on social media, and the network where fake news is generated and propagated. Among the different explanatory variables for detecting fake news, the priority of each explanatory variable is first derived through feature selection with XGBoost (Extreme Gradient Boosting). By constructing an optimal fake news detection model from the explanatory variables selected by XGBoost, we aim to increase the predictive performance. The model is constructed by applying five machine learning techniques: Logistic Regression (LR), Neural Network (NNET), Random Forest (RF), Support Vector Machine (SVM), and Classification and Regression Trees (CART). The model with the highest prediction performance for detecting fake news is finally derived by comparing their performance rates.
II. RELATED WORKS
A. FAKE NEWS DETECTION
Recent studies demonstrate the diverse range of approaches that researchers are taking to develop fake news detection models using machine learning, and the potential of these models to improve the accuracy of news verification. However, like any technology, fake news detection systems are hardly perfect and can sometimes make errors in identifying fake news. If fake news is not detected appropriately by the system, it can be shared widely on social media in a short time, leading to a significant impact on public opinion and behavior. For example, hundreds of people died in Iran after drinking methanol to cure COVID-19 because of fake news that had initially been accepted as fact [47]. Furthermore, errors in fake news detection systems can lead to false accusations or misidentifications of individuals or groups. If a system mistakenly identifies a legitimate news story as fake news, it could lead to accusations of bias or censorship against the news outlet that published it [5].

To summarize the current state of the art in fake news detection systems, most previous research assumes that the linguistic and compositional features of content are the main criteria for distinguishing fake news from real news [9]. Fake news detection systems typically rely on linguistic and structural features of news articles, but they often fail to capture the context of the news, such as the history of the news source or the socio-political environment in which the news is circulated. Methods that could not capture the semantic meaning and context of the words picked up from fake news have shown low accuracy [10].

Content-oriented fake news detection, the most common approach, applies natural language processing (NLP) to identify fake news by concentrating on the characteristics of the text. NLP techniques process news content based on language pattern detection and word occurrences common to satire, irony, sentiment, and topicality [11]. Examining how a news item deemphasizes the source or highlights the article headline through its design is also a way of identifying fake news by paying attention to its textual characteristics [12]. This content-oriented approach assumes that fake and real news have different linguistic and compositional structures. One study proposes a hybrid fake news detection algorithm that combines a linguistic approach with network cues and provides operational guidelines for a feasible fake news detecting system [13]. In addition, based on grammatical characteristics obtained through syntax parsing with a Probabilistic Context Free Grammar (PCFG) and the differences between keywords used in fake and real news, semantic characteristics, rhetorical structure, and discourse analysis results were selected as explanatory variables to determine whether news is fake [14]. In fake news detection targeting Facebook posts and various articles, Term Frequency - Inverse Document Frequency (TF-IDF), which is frequently used to represent text characteristics in text analysis, was used as a criterion for classifying fake news [5].

An automated fake news detection model has also been constructed by extracting linguistic features from the text of online newspaper articles [15]. Using the LIWC (Linguistic Inquiry and Word Count), the ratio of punctuation marks (e.g., periods, commas, question marks, exclamation points) was calculated. In addition, the number of positive and negative words mentioned in each document was computed based on the LIWC lexicon to extract the proportions of observations that fall into psycholinguistic categories and, finally, readability metrics were constructed for the fake news detection model. Research applying a naive Bayesian classifier to news attributes [16], combining news text with clickbait [17], and building hybrid models that combine the spread patterns of fake news with semantic analysis to develop automated fake news detection is being actively conducted [18], [19], [20], [21].
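As an illustration of the TF-IDF weighting mentioned above, here is a minimal pure-Python sketch (the two example texts are invented for illustration and are not from any dataset used in the studies cited):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for a small corpus.
    tf = term count / doc length; idf = log(N / df)."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

docs = ["shocking secret cure exposed",
        "council approves library budget"]
w = tfidf(docs)
# Terms that appear in only one of the two documents get idf = log(2) > 0.
print(round(w[0]["shocking"], 3))  # → 0.173
```

Terms common to many documents receive low weights, so TF-IDF highlights the vocabulary that distinguishes one article from the rest of the corpus.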
There are several datasets available in the literature, such as the LIAR dataset or FakeNewsNet, that can be used to train and evaluate machine learning models for the task of detecting fake news. The LIAR dataset, for example, is a widely used dataset for fake news detection, consisting of statements labeled as true, mostly true, half true, barely true, or false [51]. The FakeNewsNet dataset, on the other hand, contains news articles labeled as fake, bias, or conspiracy [52]. Both datasets have been labeled by human annotators: the LIAR dataset by human fact-checkers and the FakeNewsNet dataset by crowdsourced workers. The annotators determined fake news based on its surface-level linguistic patterns. It is difficult to apply such patterns these days, when fake news looks the same as real news and uses professional expressions as if written by a professional journalist. Therefore, as the amount of sophisticated fake news has increased, the need has been raised to reflect not only linguistic characteristics but also other approaches, including users' network relationships and the context of fake news.

As fake news detection systems have recently become more popular, models for spreading fake news have emerged that bypass the algorithmic identification methods of well-known detection systems to avoid being identified. Fake news detection systems can be targeted by individuals who deliberately create fake news designed to bypass them [22]. These adversarial attacks make it difficult to develop accurate and reliable fake news detection systems. While the technology used to spread fake news continues to evolve, current fake news detection systems still focus on the linguistic and compositional features of fake news, as in the past. Therefore, we aim to improve existing systems, which have difficulty distinguishing the increasingly sophisticated fake news that looks like factual news. Recent fake news detection systems have some limitations, which have made it easier for fake news to spread. The limitations that have recently been identified include the following. First, existing fake news detection systems mainly rely on keywords. Systems that depend on keywords to identify fake news can be easily manipulated by those spreading false information [49]. In other words, fake news can be designed to reach a wider audience by exploiting weaknesses in these systems. Second, recent fake news detection systems have had difficulty detecting new forms of fake news. As the methods used to spread fake news evolve, the existing detection systems may not be able to keep up. For example, deepfake videos or manipulated images may be difficult to detect using current systems [50]. Third, many fake news detection systems rely on analyzing individual pieces of content in isolation, without considering the broader context in which the content was shared [22]. This can make it difficult to determine whether a piece of information is intentionally false or simply a mistake or misunderstanding. In other words, it is hard to determine whether content is factual by depending only on existing fake news detection systems, which lack this context.

Finally, we include the context of fake news and the latest fake news generation styles by reflecting word sentiment, word similarity, and users' network relationships to overcome the limitations of the existing fake news systems presented above.
B. SOCIAL CAPITAL THEORY
Social influence can be explained as a structure of social relationships built by exchanging interactions between users on a social network. The main background enabling this mutual exchange of social influence is each individual's social capital in the social network [23]. It has been found that the type of social network formed by users differs according to the social capital they possess. At the same time, the will to create a social relationship and the degree of persistence of the relationship differ [24]. Social capital arises from different attributes within three dimensions: structural, relational, and cognitive [24]. The structural dimension is the key to whether or not a network connection is established between actors in the network and includes the overall strength of the connection [25]. The relational dimension of social capital refers to personal relationships between actors formed through interactions between individuals [26]. Finally, the cognitive dimension refers to shared representations, interpretations, semantic content, and systems among agents [27].

Based on the three dimensions of social capital presented above, determinants were selected to detect fake news spread on Twitter. The aim is to comprehensively consider the characteristics of the Twitter network and of the users affecting the individual acceptance and spread of fake news.

1) NETWORK FEATURES OF STRUCTURAL DIMENSION
The network features of Twitter can be defined as factors through which users can affect each other on Twitter. In this study, three factors were adopted to estimate the network features of social network structures: the number of followers, the number of followings, and the degree of centrality in the network. First, the numbers of followers and followings are representative indirect proxy variables that show a user's willingness to interact with others on Twitter [28], [29]. It can be inferred that a user with many followers has high expectations for establishing relationships with others based on the structural network features of Twitter. A user with many followings can be seen as having a higher willingness to engage in networking activities with others than a user with few followings. As the degree of centrality in the network can be measured differently depending on the way users communicate and interact with others on Twitter, all of these measures (i.e., in-degree, out-degree, and betweenness centrality) were regarded as features of the network. In-degree centrality indicates how many users follow a given user in Twitter's limited network [30]. In other words, it is an index of influence in which the direction of exchange within the Twitter network runs from others to oneself. In contrast, out-degree centrality refers


to how much information or attention a user provides to other users on the Twitter network. Therefore, it is estimated as an indicator of outward connection centrality within the Twitter network, and it corresponds to the number of arrows directed away from each user to the others [30]. The in-degree centrality and out-degree centrality of node $i$, denoted respectively by $C_{I,i}$ and $C_{O,i}$, can be defined as:

$$C_{I,i} = \frac{\sum_{j=1, j \neq i}^{N} l_{ji}}{N-1} \tag{1}$$

$$C_{O,i} = \frac{\sum_{j=1, j \neq i}^{N} l_{ij}}{N-1} \tag{2}$$

where $l_{ij}$ indicates a directed link from node $i$ to node $j$ and $N$ is the number of nodes in the network.

Betweenness centrality refers to how information is mediated between different users in Twitter's network. It is often used to find nodes that serve as a bridge from one part of a network to another. It is an indicator showing how much a user can transmit information among users [32]. The betweenness centrality of node $i$, denoted as $BC_i$, can be defined as:

$$BC_i = \frac{2 \sum_{j}^{N} \sum_{k}^{N} g_{jk}(i)/g_{jk}}{N^2 - 3N + 2}, \quad j \neq k \neq i \tag{3}$$

where $g_{jk}$ is the number of shortest paths between nodes $j$ and $k$, and $g_{jk}(i)$ is the number of those paths that pass through node $i$.

Therefore, in this study, the degree centrality index on the Twitter network is divided into the three network influence indicators above and measured. A fake news detection model is then constructed based on the centrality influence level of each.
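The in- and out-degree centralities of Eqs. (1)-(2) can be sketched on a toy follower graph (the nodes and edges below are invented for illustration; an edge (u, v) means u directs a link toward v):

```python
def degree_centrality(nodes, edges):
    """In- and out-degree centrality per Eqs. (1)-(2):
    degree counts normalized by the maximum possible degree, N - 1."""
    n = len(nodes)
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        outdeg[u] += 1   # link leaving u
        indeg[v] += 1    # link arriving at v
    c_in = {v: indeg[v] / (n - 1) for v in nodes}
    c_out = {v: outdeg[v] / (n - 1) for v in nodes}
    return c_in, c_out

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "a")]
c_in, c_out = degree_centrality(nodes, edges)
print(c_in["b"])   # 3 incoming links / (4 - 1) = 1.0
```

Node "b" is followed by every other node, so its in-degree centrality reaches the maximum of 1.0, matching the "influence directed from others to oneself" reading above.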
2) USER FEATURES OF RELATIONAL DIMENSION
The relational dimension on Twitter examines how much and what kind of relational actions users have taken on Twitter. It is estimated based on the total number of tweets posted by each user and the total numbers of 'Likes' and 'Retweets' received from others. In addition to tweet posts, 'Retweets' and 'Likes' are essential for inferring user characteristics in the network, as they are the most representative communication methods on Twitter [23].

A user's network activity on social media indicates how each user influences other users, based on the numbers of that user's mentions and retweets [30]. Studies have empirically evaluated the effects of influencers based on the probability that a tweet is retweeted on Twitter [33], [34]. It has been confirmed that when a tweet from a user with fewer than 1,000 followers is retweeted, that tweet is delivered to thousands of additional users. A retweet can thus be explained as having an additive impact on many Twitter users, regardless of the poster's number of followers [35]. Combining the previous studies presented above, this study measures the characteristics of Twitter users based on the total numbers of tweets, retweets, and likes created by users to evaluate the relational dimension of social capital on Twitter.
3) CONTENT FEATURES OF COGNITIVE DIMENSION
Twitter, the subject of this research, is a representative social medium where information is shared mainly through short text messages of 140 characters or less. Since the content of a tweet is compressed into the limited 140 characters, it is difficult for the information recipient to sufficiently acquire the information through the tweet message alone. Accordingly, it is not easy for a user to determine the authenticity of information on Twitter. In this study, the text characteristics of tweet messages were judged to be influential factors for identifying fake news, and they were examined in terms of 'word similarity' and 'word sentiment.'

Word similarity evaluates how similar two words, phrases, or documents are. Words used with meanings similar to specific words in a sentence are converted into numerical values to check the similarity between the words composing the entire sentence. The similarity is calculated through word embedding, which is a method of quantifying the individual words constituting a sentence. A word is expressed as a vector, and the distance between a specific term and a similar word is calculated to derive the similarity between the words. Therefore, word similarity analysis estimates the semantics of word meaning based on context according to the word embedding method used to calculate the distance between words. When a tweet message frequently uses vocabulary with similar purposes, the value of word similarity increases. Accordingly, general informational messages that convey only facts, without arbitrarily shaping the tone of the message, commonly record relatively low word similarity values, mainly using non-emotional words or neutral expressions without duplication or repetition. However, this study predicted that fake news would compose tweet messages by repeatedly and intentionally using the same or similar words to incite users who encounter the information to accept and spread it. Therefore, word similarity was considered a vital antecedent factor for detecting fake news in this study and was included as an explanatory variable to distinguish fake news from general information.
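The embedding-distance idea described above can be sketched with plain cosine similarity. The three-dimensional vectors below are invented for illustration; the paper does not specify which embedding model it uses:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings: two emotionally loaded near-synonyms vs. a neutral word.
emb = {
    "shocking":   [0.9, 0.1, 0.0],
    "outrageous": [0.8, 0.2, 0.0],
    "budget":     [0.0, 0.1, 0.9],
}

print(cosine_similarity(emb["shocking"], emb["outrageous"]) >
      cosine_similarity(emb["shocking"], emb["budget"]))  # True
```

A tweet that keeps repeating near-synonyms like the first pair would accumulate high pairwise similarities, which is exactly the signal the study treats as suspicious.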
SentiWordNet (SWN) is a lexicon that adds sentiment scores to each word. It gives positivity, negativity, and objectivity scores, which measure how much sentiment a word carries. SWN automatically sets the positive, negative, and objective sentiment values for each synonym set of WordNet. Various methods for sentiment analysis have been developed; however, SWN differs in that it can calculate an adequate emotional intensity level because the sentiment value is applied differently for each part of speech of the word used.

Fake news is expected to show high emotional intensity because it uses more provocative vocabulary and negative phrases that appeal to emotions. An analysis of about 320,000 news articles published in the New York Times from 2012 to 2014 using SWN revealed that, out of 754 days, positive and negative news was recorded on 322 days and 432 days, respectively. Analysis of positive and negative news based on SWN showed a stock price prediction performance improved by about 4% or more compared with technical analysis alone [36]. Parts of speech in online product reviews were classified into adjectives, adverbs, and


verbs, and it was found that SWN improved performance compared with previous sentiment analysis methods [37]. Furthermore, SWN demonstrated improved performance in identifying the tone of each document, even for relatively crude texts such as reviews [38]. Combining these previous studies, this study judged that deriving the linguistic characteristics of fake news based on SWN would have a significant effect on the identification of fake news.
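A minimal sketch of SWN-style scoring follows. The per-word (positive, negative, objective) triples below are invented for illustration and are not actual SentiWordNet values; in practice the scores are looked up per synset and part of speech in the SWN lexicon:

```python
# Toy SWN-style lexicon: word -> (positive, negative, objective) scores.
# Real SentiWordNet assigns scores per (synset, part of speech); these
# numbers are illustrative only.
LEXICON = {
    "miracle":  (0.75, 0.00, 0.25),
    "deadly":   (0.00, 0.75, 0.25),
    "shocking": (0.00, 0.62, 0.38),
    "budget":   (0.00, 0.00, 1.00),
}

def emotional_intensity(text):
    """Mean (positive + negative) mass of the known words in a text:
    high values suggest emotionally charged wording."""
    words = [w for w in text.lower().split() if w in LEXICON]
    if not words:
        return 0.0
    return sum(LEXICON[w][0] + LEXICON[w][1] for w in words) / len(words)

print(emotional_intensity("shocking deadly miracle"))  # (0.62+0.75+0.75)/3
print(emotional_intensity("budget"))                   # 0.0
```

The provocative wording scores high while the neutral word scores zero, mirroring the study's expectation that fake news carries higher emotional intensity.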
III. RESEARCH METHOD
This study conducts the following research analysis procedure to identify fake news. First, XGBoost is applied to derive the priority of the variables that have a significant effect on fake news detection. Second, we establish models to distinguish fake news with five representative classification algorithms (i.e., LR, NNET, RF, SVM, CART) among machine learning models, based on the factors derived from XGBoost. Third, we adopt k-fold cross-validation to improve the performance and generalizability of each established model, and we also perform ablation studies to increase the robustness of the models.
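The k-fold cross-validation step of this procedure can be sketched in pure Python. The fold count of 5 below is illustrative (the paper does not state k here); in practice each fold would train and validate one of the five classifiers:

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k roughly equal, non-overlapping folds;
    each fold serves once as the validation set."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    return [(
        [idx for j, f in enumerate(folds) if j != i for idx in f],  # train
        folds[i],                                                   # validate
    ) for i in range(k)]

# 402 tweets (as in Section IV), 5 folds: every sample validates exactly once.
splits = k_fold_indices(402, 5)
print(len(splits))                         # 5
print(sum(len(val) for _, val in splits))  # 402
```

Averaging a model's accuracy over all k validation folds gives the generalized performance estimate used to compare the five classifiers.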
Machine learning techniques have recently been widely used for prediction in various research fields. This study uses supervised classification algorithms to identify fake news from the given data. Recently, ensemble learning, which uses multiple models together, has been employed to improve performance beyond that of a single model [39]. XGBoost is a representative ensemble model, which usually shows superior classification performance compared with single classification algorithms [40]. The existing Gradient Boosting Machine has the limitation that analysis is significantly slow because the learning weights are increased sequentially. In contrast, XGBoost is faster than the existing gradient boosting technique because it can learn in parallel on the CPU. In addition, it is robust against overfitting because it provides regularization [40], [41].

When the XGBoost model is formed, feature importance can be identified by reflecting together the gain each variable contributes to accuracy and the frequency with which the variable appears across the entire ensemble of trees [41]. As the split criterion used and the accuracy contribution of each split are derived through pruning, the direction of each variable's effect can be grasped. It is thus possible to identify which variables contributed, and how much, to the ''important decisions'' in the decision-making process among the many variables input to construct the model. Therefore, the importance of each input variable is calculated, and the variables can be sorted by rank.

In general, constructing a model with many input variables has the advantage that various conditions can be considered. On the other hand, the model's noise increases, and the model fit inevitably deteriorates as a result. This becomes an obstacle to constructing the fake news detection model by lowering the accuracy of the prediction model. Feature selection through XGBoost is expected to enable the construction of a highly accurate model while preventing overfitting. Accordingly, in this study, the main factors affecting the identification of fake-news-related tweet messages are first derived, and an optimal model is built based on them.
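The gain-based importance described above can be illustrated in miniature: for a single split, the "gain" of a feature is the reduction in impurity it achieves. This pure-Python sketch uses Gini impurity and invented toy data; the paper itself relies on the full XGBoost implementation over many trees, not this simplification:

```python
def gini(labels):
    """Gini impurity of a set of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def split_gain(feature_values, labels, threshold):
    """Impurity reduction from splitting the data at `threshold`."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

# Toy data: word_sentiment separates fake (1) from real (0) news
# perfectly; follower_count does not.
labels = [1, 1, 1, 0, 0, 0]
word_sentiment = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
follower_count = [5, 900, 20, 10, 800, 30]

print(split_gain(word_sentiment, labels, 0.5))   # 0.5: a perfect split
print(split_gain(follower_count, labels, 100))   # no impurity reduction
```

Summing such gains over every split in which a feature is used, across all trees, yields the ranking that feature selection relies on.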
IV. DATA COLLECTION
We collected a total of 23,592 tweets over a period of 595 days (from Mar. 5, 2019 to Oct. 19, 2020). We first extracted the topics of popular and representative fake news cases that spread globally. Only news items that were clearly determined to be fake by authoritative media outlets (e.g., The Wall Street Journal, CNN) were included in the final data analysis as fake. The selected fake news was collected from various fields: medicine, politics, economy, IT, entertainment, and international news. For example, 'Drinking alcohol-including beverages with high percentages of alcohol-offers protection from COVID 19. . . ', a popular piece of fake news in medicine, was collected. Data preprocessing was performed as follows. First, considering the characteristics of fake news, tweets covering the same topics or similar content were regarded as the same news and excluded from the final analysis. In other words, tweets and retweets about the same content were considered duplicates and removed, except for the first tweet posted. Second, tweets in which users unilaterally expressed emotions such as anger, joy, or sadness, including agreement or disapproval of a specific tweet, were considered to deliver only personal emotions and were removed from the final data set. We also excluded tweets in which users evaluated whether another tweet was fake or factual. Finally, 402 tweets, comprising 202 fake news tweets and 200 true news tweets, were used to establish the fake news detection model.

In this study, each tweet's 'word sentiment' and 'word similarity' were calculated as content features in the cognitive dimension of social capital. The degree of centrality of users (i.e., in-degree, out-degree, betweenness centrality) in the Twitter network and the numbers of followers and followings from each user's account information were also collected as network features in the structural dimension of social capital. Additionally, the total number of tweet messages posted by each ID and the total numbers of likes and retweets received by each account were obtained to evaluate users' relational features on Twitter.
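The duplicate-removal step of the preprocessing above can be sketched as follows. The tweet records are invented for illustration, and the notion of "same content" is simplified to an exact match on normalized text, whereas the study also treats tweets on similar topics as duplicates:

```python
def drop_duplicate_tweets(tweets):
    """Keep only the earliest tweet for each distinct content;
    later tweets/retweets with the same content are removed."""
    seen = set()
    kept = []
    for tweet in sorted(tweets, key=lambda t: t["posted_at"]):
        key = " ".join(tweet["text"].lower().split())  # normalized text
        if key not in seen:
            seen.add(key)
            kept.append(tweet)
    return kept

tweets = [
    {"text": "Alcohol protects from COVID",  "posted_at": 2},
    {"text": "alcohol  protects from covid", "posted_at": 1},  # earlier copy
    {"text": "Council approves budget",      "posted_at": 3},
]
kept = drop_duplicate_tweets(tweets)
print(len(kept))             # 2
print(kept[0]["posted_at"])  # 1 (the first posting survives)
```

Keeping only the first posting prevents a widely retweeted story from dominating the 402-tweet analysis set.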
V. RESEARCH ANALYSIS
A. EXTRACTING PRIORITY FACTORS USING XGBoost
This study aims to determine which factors have a significant influence in discriminating between fake news and true news by identifying the feature importance of each variable through XGBoost. The importance of a variable is the result of adding up the gains that the variable contributed to the model's accuracy while constructing the XGBoost model. As a result of performing XGBoost based on a total of 10 explanatory


variables input, the importance of the derived variables is as follows in FIGURE 1 and TABLE 1.

FIGURE 1. Feature importance of each variable.

TABLE 1. Feature importance of each variable.

As shown in FIGURE 1, the most influential factor in identifying fake news among tweets is 'word sentiment.' The more one-sided a tweet's tone is, whether overly positive or overly negative, the more it should be suspected of being fake news. It was found that Twitter users' in-degree centrality, word similarity, and the total number of tweets sequentially influenced the construction of the fake news detection model. As the in-degree centrality formed by Twitter users appears as an explanatory variable with high importance, it can be confirmed that how actively individuals form relationships and interact with other users in the Twitter network is an essential factor in determining fake news. Word similarity is also a significant criterion for detecting fake news. A high word similarity value is estimated when a tweet message repeatedly uses the same vocabulary and sentences with similar meanings. Therefore, general information messages that deliver only facts, without arbitrarily shaping the tone of the message, commonly record relatively low word similarity because they mainly use neutral, non-emotional expressions without overlapping or repeating words. However, when word similarity is high, there is a high probability that the writer composed the message without considering the context, or repeatedly used words that appeal to users' emotions in order to mislead the reader. Furthermore, with the advent of AI bots that automatically write tweets, word similarity is high even when similar words are repeated without paraphrasing. We can also suspect that an individual or a small group is intentionally spreading information when the number of tweets on a specific topic on Twitter is excessive over a certain period. In that case, it can be inferred that the information is spread intentionally for the profit of a few individuals, not to provide factual information. In contrast, a Twitter user's number of followers and followings showed relatively low importance. Therefore, this study establishes a fake news detection model based on machine learning algorithms using the four explanatory factors (i.e., word sentiment, in-degree centrality, word similarity, and the total number of tweets) extracted by XGBoost as the most important variables.

B. ESTABLISHING FAKE NEWS DETECTION MODELS USING MACHINE LEARNING
A detection model based on a machine learning algorithm is constructed using a categorical binary variable as the dependent variable to determine whether a tweet message is fake news or not. Additionally, the best-performing fake news detection model is selected by evaluating the performance of classification models based on various machine learning algorithms. Fake news detection models are established with five machine learning algorithms. First, the Logistic Regression (LR) model was built with a stepwise selection method to overcome the limitation that model complexity increases when all variables are input. As a result, the best-performing model was constructed at an AIC value of 540.1684. For the Classification and Regression Tree (CART), the deviance value was adjusted over 100 repetitions to improve prediction performance. As a result, the optimal model was established when the number of nodes was 6. To find the optimal deviance value, tree pruning was performed repeatedly. This iterative pruning process prevents the tree from generating more nodes than necessary and also prevents performance degradation caused by too few nodes. Finally, the size with the smallest deviance, 6 (deviance = 7.75), was fixed as the number of terminal nodes. For the Neural Network (NNET), the tuned parameters were size and decay. The optimal model was searched for by controlling the number of hidden units, and as a result, the three-layer network (size = 1, decay = 0.1) showed the best performance. The Support Vector Machine (SVM) was analyzed using the nonlinear Radial Basis Function (RBF) Gaussian kernel. The tuned parameters were Sigma and C: C is used in the soft margin cost function, which trades error penalty for stability, while Sigma is the standard


deviation. Through this search, Sigma = 0.1 and C = 10, which yielded the smallest difference between the training results and the evaluation results, were selected for the final model. To establish the Random Forest (RF), models constructed with different parameter values were applied to the evaluation data to assess performance. As a result, the number of trees generated in the random forest was set to 100 (ntree = 100), and the tuning parameter 'mtry', which controls the number of variables considered at each node split, was optimized to 3. Accordingly, the prediction accuracy according to the misclassification rate, which is the final classification performance of each model, is presented in the following TABLE 2.

TABLE 2. Comparisons of prediction performance rates.

C. MODEL EVALUATION
As presented above, fake news detection models were finally established using five machine learning techniques. We applied the k-fold cross-validation method to optimize model construction. Recently, data resampling methods such as k-fold cross-validation and bootstrapping have been applied to reduce the uncertainty of input dataset partitioning [42]. As previously noted, high performance can be achieved on the given data when a model is built using the entire data set, but the prediction accuracy may drop when new data are added. A representative way to verify whether such overfitting occurs, and to solve the overfitting problem, is to use part of the given data as training data to build the model and the rest as a test dataset [43]. In this study, the 402 data sets were divided into seven folds by setting the k value to 7 for each of the five machine learning models.

We compare the accuracy, precision, recall, and F1-score of each proposed model. Multiple evaluation metrics, including the accuracy, precision, F1-score, recall, and specificity, were adopted to evaluate the performance of the established models. The accuracy is the ratio of the number of correctly classified samples to the total number of samples in a given test dataset. The precision is the ratio of the true positive samples to the sum of the true positive and false positive samples. The recall is the ratio of the true positive samples to the sum of the true positive and false negative samples. The F1-score is the harmonic mean of precision and recall and is widely used to evaluate the success of machine learning algorithms [44]. The specificity is the true negative rate. These metrics are defined as follows, where TP, TN, FP, and FN represent the numbers of true positive, true negative, false positive, and false negative samples in the confusion matrix, respectively:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (4)
Precision = TP / (TP + FP) (5)
Recall = TP / (TP + FN) (6)
F1-score = 2 × (Precision × Recall) / (Precision + Recall) (7)
Specificity = TN / (FP + TN) (8)

TABLE 3. Results of evaluation metrics of five machine learning classifiers.

TABLE 3 shows the accuracy, precision, recall, F1-score, and specificity of each proposed model.

D. COMPARE THE PERFORMANCE RATE
We simulated 1000 iterations of each model to analyze the effect of data partitioning and model learning. This measures how frequently each classification model misclassifies across 1000 different simulations. In other words, "misclassification rates in
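The 7-fold partition of the 402 samples can be sketched as a simple fold assignment; shuffling and the actual model fitting, which would follow in practice, are omitted here:

```python
def k_fold_indices(n_samples: int, k: int):
    """Split range(n_samples) into k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# 402 samples with k = 7, as in this study: three folds of 58 and four of 57.
folds = k_fold_indices(402, 7)
# Each fold serves once as the test set while the remaining six train the model.
```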
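Equations (4)–(8) translate directly into code. A minimal helper, evaluated on illustrative confusion-matrix counts (not the study's actual results), might look like:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Eqs. (4)-(8) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (4)
    precision = tp / (tp + fp)                      # Eq. (5)
    recall = tp / (tp + fn)                         # Eq. (6)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (7)
    specificity = tn / (fp + tn)                    # Eq. (8)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "specificity": specificity}

# Illustrative counts only, e.g., 40 fake tweets correctly flagged.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```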


1000 simulations" refers to a metric that shows how accurately a specific classification model performs. This type of simulation helps evaluate the accuracy of the classification model and can be used to improve the model's performance.

As a result of these iterations, the accuracy of the RF was about 94.1%, the best performance rate among the five models, while the accuracy of the NNET was the lowest at about 92.1%. In addition, the LR and CART showed performance rates of approximately 93.1% and 92.8%, respectively. As described above, the average misclassification rate over 1000 iterations of each model, together with the performance according to the final prediction accuracy, is presented in TABLE 4 and FIGURE 2.

TABLE 4. Misclassification rates in 1000 simulations.

FIGURE 2. Misclassification rates in 1000 simulations.

E. ABLATION STUDY
As an additional experiment to identify the impact of the different features, we performed an ablation study on the RF, the best-performing model, for each feature group. As with the prior model analysis, understanding the effect of each feature and which features are redundant is important for future model development [45]. An ablation study can show the effect of removing specific features. Furthermore, such a study is needed to confirm the results obtained from the established XGBoost model. In particular, we organized feature groups based on feature importance, called "Feature Group A," "Feature Group B," and "Feature Group C," respectively. "Feature Group A" includes the top four features ranked by the XGBoost model: word sentiment, in-degree centrality, word similarity, and the number of total tweets. "Feature Group B" adds the number of followings to "Feature Group A," and "Feature Group C" adds the number of total retweets to "Feature Group B." The performance of the optimal feature group derived by the XGBoost model was then compared with that of the other feature groups, and the results are shown in TABLE 5. The results of the ablation study confirmed that the combination of features in "Feature Group A," as mentioned above, gives the best performance.

TABLE 5. Results of ablation study on the RF model of each feature group.

VI. CONCLUSION
We constructed five classification machine learning models to identify fake news spread and shared through Twitter and compared their performance rates. The main findings of the study can be summarized as follows. First, we derived the feature importance of various explanatory variables estimated to impact the identification of fake news spreading on Twitter. Four major explanatory factors affecting fake news detection were finally extracted from among various factors, and models for each machine learning algorithm were constructed based on those derived factors. These variables significantly contribute to the construction of a fake news detection model in the following order: word sentiment, in-degree centrality, word similarity, and the total number of tweets.

Fake news detection models were established based on five machine learning algorithms, LR, NNET, CART, SVM, and RF, with the top four derived variables. Second, the CART model and the NNET model showed the highest performance rate, about 94.6%, among the five classification machine learning models, while the LR model and the SVM model indicated about a 91.3% performance rate, the lowest prediction rate. Third, the performance of the constructed models was evaluated with the misclassification rate. As the primary purpose of this study is to identify the optimal model for detecting fake news with the highest prediction rate, additional analysis was performed to compare the performance of each model. We established the evaluation models in the following way to solve the data imbalance problem that occurs in constructing the optimal version of each model. For cross-validation, the entire data set was divided into training and test sets and input into the model establishment process. The data imbalance problem was alleviated through an oversampling technique. Based on the configured data set, a simulation was performed in which each model was
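The repeated-simulation procedure can be sketched as follows: each iteration re-draws a random train/test split and records the test misclassification rate, which is then averaged. A toy majority-class "model" stands in for the fitted classifiers here, and the split fraction and seed are assumptions, not the paper's settings:

```python
import random

def misclassification_rate(y_true, y_pred):
    """Fraction of samples where prediction disagrees with the label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def simulate(labels, n_iter=1000, test_frac=0.3, seed=42):
    """Average test misclassification rate over n_iter random splits,
    using a majority-class predictor as a stand-in classifier."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_iter):
        data = labels[:]
        rng.shuffle(data)
        cut = int(len(data) * (1 - test_frac))
        train, test = data[:cut], data[cut:]
        majority = max(set(train), key=train.count)  # 'fit' step
        rates.append(misclassification_rate(test, [majority] * len(test)))
    return sum(rates) / len(rates)

# Toy imbalanced label set of 402 samples (n_iter reduced for speed).
avg_rate = simulate([1] * 300 + [0] * 102, n_iter=200)
```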


repeated 1000 times. As a result, the RF model showed the highest performance rate of 94.1% in identifying fake news among fake and factual news, whereas the NNET model showed the lowest prediction rate of about 92.1%.

Based on the research results presented above, this study makes the following contributions. First, this study is meaningful because it derived various variables that had not previously been considered as factors for detecting fake news, including the word sentiment of fake news content. In particular, previous studies have shown that fake news uses more words that provoke negative emotions in readers than general information news does, so attention has been paid to the negative tone of fake news [38]. It can be inferred that fake news appeals to readers' negative emotions and urges their acceptance, whereas factual news uses neutral words that are relatively unbiased. This is because fake news instills in the reader a sense of crisis through negative sentiments such as fear, anger, and anxiety. Similarly, this study found that word sentiment, which measures whether a message has a strongly negative or positive tone, has the most significant influence on establishing a model for classifying fake news. Therefore, it can be inferred that the word sentiment of a tweet is a significant factor in identifying fake news when the tweet is overly biased in one direction. This means that it is necessary to doubt a message that contains too much positive meaning or too many negative words. Fake news related to healthy food, for example, often uses too many positive words to entice the reader with a variety of positive bodily effects that have not been scientifically verified.

Second, this study identifies the relative feature importance of variables, whereas previous studies only suggested various factors for determining fake news. Today, attempts to develop fake news detection systems and automated algorithms for classifying fake news are actively pursued. However, only a few studies concentrate on users' attitudes toward fake news when constructing detection models. Therefore, this study contributes to extending future related studies in that the importance of variables among various factors was derived. This study can be a foundation for developing the fake news detection systems currently being introduced by various social media platforms. If all aspects are considered in designing a system, it has the advantage of being able to predict in various situations, but the prediction performance is inevitably lowered due to the model fitness problem. Accordingly, the results of this study are presented in order of their relative importance, while at the same time deriving various factors that have not been dealt with before and reflecting them in the model. This is expected to help companies select the factors that should be prioritized in designing a system for detecting fake news on social media platforms in the future.

Third, the results of this study address the need for further research on ensemble models for detecting fake news and on XAI (Explainable AI) to extend the reliability of each model. The RF model showed the highest accuracy among the five models in this study, and it was verified that an ensemble model generally performs better than a single model. Ensemble models outperform single models because they combine multiple models to improve the performance of an algorithm rather than relying on a single model. However, ensemble models can be more complex and difficult to interpret than single models, because the final prediction is generated by combining the outputs of multiple models, which can make it harder to understand how individual models contribute to the final result. Therefore, model selection techniques such as grid search or Bayesian optimization can be considered, both to increase the accuracy of fake news detection with ensemble models such as the RF proposed in this study and to increase the explainability of a single model with a high contribution. This can help avoid including redundant or poorly performing models in the ensemble and identify the best combination of models and hyperparameters. Finally, it needs to be investigated how an autonomous fake news detection system can be established with not only higher accuracy but also an explainable AI model.

This study suggests the following contributions by integrating the main findings. This study identified that the total number of tweets of each social media account, which has not been considered in past studies, is a significant factor in fake news detection. It can be inferred that users who upload more tweets than necessary may be AI bots intentionally created to generate a large amount of content automatically in a short time. Therefore, when developing a system to identify fake news in the future, the number of contents uploaded by a user account needs to be considered comprehensively to improve detection accuracy.

This study established a model of fake news spread through Twitter. Twitter is different from other social media because information spreads mainly through short text messages, so differences from other social media in spreading fake news can be expected. Therefore, in future research, we plan to collect data related to fake news distributed through other platforms to reduce the differences between social media platforms and generalize the research results. This study also could not consider various cultures, because it collected fake news messages written only in English. It has been found that culture affects users' behaviors on social media, such as the frequency of posting and the intention of sharing messages [46]. It can be inferred that there can be differences in the strength of relationships (i.e., the degree of centrality of users in social media) and in the behaviors of accepting or spreading fake news according to users' cultural norms. Therefore, this study provides future directions for improving existing fake news detection systems.

While various algorithms for detecting fake news are being actively advanced, this study constructed AI models based


on the derived priority factors from the perspective of social capital theory. We proposed an optimized model for detecting fake news that reflects the features of the information receiver and the social network. Finally, it suggests the need to develop an algorithm with an excellent prediction rate that fully reflects the social network and the characteristics of the participants who maintain it, in order to develop an automated fake news detection system.

REFERENCES
[1] C. Wu, F. Wu, Y. Huang, and X. Xie, "Personalized news recommendation: Methods and challenges," ACM Trans. Inf. Syst., vol. 41, no. 1, pp. 1–50, Jan. 2023.
[2] X. Su, G. Sperlì, V. Moscato, A. Picariello, C. Esposito, and C. Choi, "An edge intelligence empowered recommender system enabling cultural heritage applications," IEEE Trans. Ind. Informat., vol. 15, no. 7, pp. 4266–4275, Jul. 2019.
[3] F. Zhou, X. Xu, G. Trajcevski, and K. Zhang, "A survey of information cascade analysis: Models, predictions, and recent advances," 2020, arXiv:2005.11041.
[4] S. Gaillard, Z. A. Oláh, S. Venmans, and M. Burke, "Countering the cognitive, linguistic, and psychological underpinnings behind susceptibility to fake news: A review of current literature with special focus on the role of age and digital literacy," Front. Commun., vol. 6, Jul. 2021, Art. no. 661801.
[5] M. L. D. Vedova, E. Tacchini, S. Moret, G. Ballarin, M. DiPierro, and L. de Alfaro, "Automatic online fake news detection combining content and social signals," in Proc. 22nd Conf. Open Innov. Assoc. (FRUCT), May 2018, pp. 272–279.
[6] E. C. Tandoc, Z. W. Lim, and R. Ling, "Defining 'fake news': A typology of scholarly definitions," Digit. Journalism, vol. 6, no. 2, pp. 137–153, Feb. 2018.
[7] H. Kim, "An exploratory study on fake news using topic modeling: Focused on fake news published in the online journalism," M.S. thesis, School Manag. Inf. Syst., Kookmin Univ., Seoul, South Korea, 2017.
[8] A. K. Cybenko and G. Cybenko, "AI and fake news," IEEE Intell. Syst., vol. 33, no. 5, pp. 1–5, Sep. 2018.
[9] N. Seddari, A. Derhab, M. Belaoued, W. Halboob, J. Al-Muhtadi, and A. Bouras, "A hybrid linguistic and knowledge-based analysis approach for fake news detection on social media," IEEE Access, vol. 10, pp. 62097–62109, 2022.
[10] A. Galli, E. Masciari, V. Moscato, and G. Sperlí, "A comprehensive benchmark for fake news detection," J. Intell. Inf. Syst., vol. 59, no. 1, pp. 237–261, Aug. 2022.
[11] A. P. Salazar, "AI tools on fake news detection: An overview and comparative study," Graduate School Technol. Univ. Philippines, Manila, Philippines, Tech. Rep., 2020.
[12] A. Kim and A. R. Dennis, "Says who? The effects of presentation format and source rating on fake news in social media," MIS Quart., vol. 43, no. 3, pp. 1025–1039, Jan. 2019.
[13] N. K. Conroy, V. L. Rubin, and Y. Chen, "Automatic deception detection: Methods for finding fake news," Proc. Assoc. Inf. Sci. Technol., vol. 52, no. 1, pp. 1–4, Jan. 2015.
[14] H. Rashkin, E. Choi, J. Y. Jang, S. Volkova, and Y. Choi, "Truth of varying shades: Analyzing language in fake news and political fact-checking," in Proc. Conf. Empirical Methods Natural Lang. Process., 2017, pp. 2931–2937.
[15] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea, "Automatic detection of fake news," 2017, arXiv:1708.07104.
[16] M. Granik and V. Mesyura, "Fake news detection using naive Bayes classifier," in Proc. IEEE 1st Ukraine Conf. Electr. Comput. Eng. (UKRCON), May 2017, pp. 900–903.
[17] Y. Chen, N. J. Conroy, and V. L. Rubin, "Misleading online content: Recognizing clickbait as 'false news,'" in Proc. ACM Workshop Multimodal Deception Detection, Nov. 2015, pp. 15–19.
[18] C. Chen, K. Wu, V. Srinivasan, and X. Zhang, "Battling the Internet water army: Detection of hidden paid posters," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2013, pp. 116–120.
[19] Z. Jin, J. Cao, Y. Jiang, and Y. Zhang, "News credibility evaluation on Microblog with a hierarchical propagation model," in Proc. IEEE Int. Conf. Data Mining, Dec. 2014, pp. 230–239.
[20] V. L. Rubin, Y. Chen, and N. K. Conroy, "Deception detection for news: Three types of fakes," Proc. Assoc. Inf. Sci. Technol., vol. 52, no. 1, pp. 1–4, 2015.
[21] R. J. Sethi, "Spotting fake news: A social argumentation framework for scrutinizing alternative facts," in Proc. IEEE Int. Conf. Web Services (ICWS), Jun. 2017, pp. 866–869.
[22] M. Carter, M. Tsikerdekis, and S. Zeadally, "Approaches for fake content detection: Strengths and weaknesses to adversarial attacks," IEEE Internet Comput., vol. 25, no. 2, pp. 73–83, Mar. 2021.
[23] M. M. Wasko and S. Faraj, "Why should I share? Examining social capital and knowledge contribution in electronic networks of practice," MIS Quart., vol. 29, no. 1, pp. 35–57, Mar. 2005.
[24] J. Nahapiet and S. Ghoshal, "Social capital, intellectual capital, and the organizational advantage," Acad. Manage. Rev., vol. 23, no. 2, pp. 242–266, Apr. 1998.
[25] P. Dawson, J. Scott, J. L. Thompson, and D. Preece, "The dynamics of innovation and social capital in social enterprises: A relational sense-making perspective," in Proc. Massey Univ. Social Innov. Entrepreneurship Conf., 2011, pp. 177–191.
[26] N. Lin, "Building a network theory of social capital," Connections, vol. 22, no. 1, pp. 28–51, 1999.
[27] C.-M. Chiu, M.-H. Hsu, and E. T. G. Wang, "Understanding knowledge sharing in virtual communities: An integration of social capital and social cognitive theories," Decis. Support Syst., vol. 42, no. 3, pp. 1872–1888, Dec. 2006.
[28] D. R. Bild, Y. Liu, R. P. Dick, Z. M. Mao, and D. S. Wallach, "Aggregate characterization of user behavior in Twitter and analysis of the retweet graph," ACM Trans. Internet Technol., vol. 15, no. 1, pp. 1–24, Mar. 2015.
[29] B. Gonçalves, N. Perra, and A. Vespignani, "Modeling users' activity on Twitter networks: Validation of Dunbar's number," PLoS ONE, vol. 6, no. 8, Aug. 2011, Art. no. e22656.
[30] M. Cha, H. Haddadi, F. Benevenuto, and P. K. Gummadi, "Measuring user influence in Twitter: The million follower fallacy," in Proc. ICWSM, vol. 10, nos. 10–17, 2010, p. 30.
[31] X. Chen, Z. Chong, P. Giudici, and B. Huang, "Network centrality effects in peer to peer lending," Phys. A, Stat. Mech. Appl., vol. 600, Aug. 2022, Art. no. 127546.
[32] R. Garcia-Gavilanes, D. Quercia, and A. Jaimes, "Cultural dimensions in Twitter: Time, individualism and power," in Proc. 7th Int. AAAI Conf. Weblogs Social Media, 2013, pp. 195–204.
[33] D. Gayo-Avello, "Nepotistic relationships in Twitter and their impact on rank prestige algorithms," Inf. Process. Manage., vol. 49, no. 6, pp. 1250–1280, Nov. 2013.
[34] E. Tsourougianni and N. Ampazis, "Recommending who to follow on Twitter based on tweet contents and social connections," Social Netw., vol. 2, no. 4, pp. 165–173, 2013.
[35] H. Kwak, C. Lee, H. Park, and S. Moon, "What is Twitter, a social network or a news media?" in Proc. 19th Int. Conf. World Wide Web, Apr. 2010, pp. 591–600.
[36] D. Kim and Y. Lee, "News based stock market sentiment lexicon acquisition using Word2Vec," Korea J. BigData, vol. 3, no. 1, pp. 13–20, Aug. 2018.
[37] Y. Dang, Y. Zhang, and H. Chen, "A lexicon-enhanced method for sentiment classification: An experiment on online product reviews," IEEE Intell. Syst., vol. 25, no. 4, pp. 46–53, Jul. 2010.
[38] D. M. J. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D. Rothschild, M. Schudson, S. A. Sloman, C. R. Sunstein, E. A. Thorson, D. J. Watts, and J. L. Zittrain, "The science of fake news," Science, vol. 359, no. 6380, pp. 1094–1096, 2018.
[39] A. Dey, "Machine learning algorithms: A review," Int. J. Comput. Sci. Inf. Technol., vol. 7, no. 3, pp. 1174–1179, 2016.
[40] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2016, pp. 785–794.
[41] L. Torlay, M. Perrone-Bertolotti, E. Thomas, and M. Baciu, "Machine learning–XGBoost analysis of language networks to classify patients with epilepsy," Brain Inform., vol. 4, no. 3, pp. 159–169, Sep. 2017.
[42] X. C. Nguyen, T. T. H. Nguyen, D. D. La, G. Kumar, E. R. Rene, D. D. Nguyen, S. W. Chang, W. J. Chung, X. H. Nguyen, and V. K. Nguyen, "Development of machine learning–based models to forecast solid waste generation in residential areas: A case study from Vietnam," Resour., Conservation Recycling, vol. 167, Apr. 2021, Art. no. 105381.
[43] Y. Bengio and Y. Grandvalet, "No unbiased estimator of the variance of K-fold cross-validation," J. Mach. Learn. Res., vol. 5, pp. 1089–1105, Dec. 2004.
[44] F. Khan, I. Tarimer, H. S. Alwageed, B. C. Karadağ, M. Fayaz, A. B. Abdusalomov, and Y.-I. Cho, "Effect of feature selection on the accuracy of music popularity classification using machine learning algorithms," Electronics, vol. 11, no. 21, p. 3518, Oct. 2022.
[45] D. Kauchak, O. Mouradi, C. Pentoney, and G. Leroy, "Text simplification tools: Using machine learning to discover features that identify difficult text," in Proc. 47th Hawaii Int. Conf. Syst. Sci., Jan. 2014, pp. 2616–2625.
[46] I. Pentina, L. Zhang, and O. Basmanova, "Antecedents and consequences of trust in a social media brand: A cross-cultural study of Twitter," Comput. Hum. Behav., vol. 29, no. 4, pp. 1546–1555, Jul. 2013.
[47] N. Karimi and J. Gambrell. (Mar. 27, 2020). Hundreds Die of Poisoning in Iran as Fake News Suggests Methanol Cure for Virus. The Times of Israel, Jerusalem, Israel. [Online]. Available: https://url.kr/w3c8hd
[48] K. Miller. (Feb. 13, 2023). Human Writer or AI? Scholars Build a Detection Tool. Stanford Univ. Hum.-Centered Artif. Intell., Stanford, CA, USA. [Online]. Available: https://url.kr/w3c8hd
[49] A. A. A. Ahmed, A. Aljabouh, P. K. Donepudi, and M. S. Choi, "Detecting fake news using machine learning: A systematic literature review," 2021, arXiv:2102.04458.
[50] J. Botha and H. Pieterse, "Fake news and deepfakes: A dangerous threat for 21st century information security," in Proc. 15th Int. Conf. Cyber Warfare Secur. (ICCWS). New York, NY, USA: Academic Conferences and Publishing Limited, Mar. 2020, pp. 1–10.
[51] W. Y. Wang, "'Liar, liar pants on fire': A new benchmark dataset for fake news detection," 2017, arXiv:1705.00648.
[52] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, "FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media," Big Data, vol. 8, no. 3, pp. 171–188, Jun. 2020.

MINJUNG PARK received the M.S. and Ph.D. degrees in data analytics and business administration from Ewha Womans University, Seoul, South Korea, in 2016 and 2021, respectively. Her current research interests include artificial intelligence (AI), big data analytics, blockchain, information security, and privacy.

SANGMI CHAI received the M.S. degree in business administration from Seoul National University, South Korea, and the Ph.D. degree in management information systems from the School of Management, The State University of New York at Buffalo. She is currently a Professor with the School of Business, Ewha Womans University, South Korea. She has published her articles in Journal of Management Information Systems, International Journal of Production Economics, International Journal of Logistics Management, Decision Support Systems, IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, International Journal of Information Management, and Information Systems Frontiers. Her current research interests include artificial intelligence (AI), blockchain, privacy, and information security.