Detecting Bots
Abstract—Due to the exponential growth in the popularity of online social networks (OSNs), such as Twitter and Facebook, the number of machine accounts that are designed to mimic human users has increased. Social bot accounts (Sybils) have become more sophisticated and deceptive in their efforts to replicate the behaviors of normal accounts. As such, there is a distinct need for the research community to develop technologies that can detect social bots. This paper presents a review of the recent techniques that have emerged that are designed to differentiate between social bot accounts and human accounts. We limit the analysis to the detection of social bots on the Twitter social media platform. We review the various detection schemes that are currently in use and examine common aspects such as the classifiers, datasets, and selected features employed. We also compare the evaluation techniques that are employed to validate the classifiers. Finally, we highlight the challenges that remain in the domain of social bot detection and consider future directions for research efforts that are designed to address this problem.

Keywords—Social Bots; Twitter; Detection; Sybil

I. INTRODUCTION

Online social networks (OSNs) represent a global platform through which people share and promote products, links, opinions, and news. In the third quarter of 2017, Twitter had 330 million active users [1], and as of 2015 the estimated total number of accounts had grown to 1.3 billion [2]. The data-sharing feature of social networks allows users to distribute content and links; however, this feature is also commonly exploited by spammers and fraudsters.

Social bot accounts make OSNs vulnerable to adversaries. Social bots are programs that automatically generate content, distribute it via a particular social network, and interact with its users [3]. According to a recent study by Varol et al., between 9% and 15% of Twitter accounts are bot accounts [4], which is the equivalent of 48 million accounts [2]. A further study found that social bots are responsible for generating 35% of the content that is posted on Twitter [5].

Many studies have aimed to address the problems associated with the use of automated accounts on social networks [6]-[9], which can spread spam, worms, and phishing links or manipulate legitimate accounts by hijacking and deceiving users [10]-[12]. Malicious accounts typically operate under a botmaster, who controls a group of social bots to distribute spam or manipulate behaviors on a given social network [13]. For example, in Syria, a social botnet was employed to flood Twitter hashtags related to the Syrian civil war with irrelevant topics that redirected the attention of users from controversial government actions [5]. Social bots have also played a significant role in the uprisings that occur in the aftermath of major events such as elections or conflicts [14]. Gupta et al. [15] studied the fake content that was proliferated via Twitter during the Boston Marathon blasts and the role such content played in spreading rumors and misinformation. They found that bot accounts were created after the blasts, many of which impersonated real accounts [15]. The malicious activities of bots during events such as these can be used to spread spam. In addition, they can also cause financial harm, as was observed in the case of Cynk, which suffered a 220-fold drop in market price as a result of the activities of automated stock-trading social bots [3].
The activities of social bots also impact the social graph of OSNs because of the large number of non-genuine social relationships. If social bots successfully infiltrate users' accounts, they can harvest users' private data and subsequently use it for phishing and spamming activities [9], [16]. In addition, they can aggregate information from the web to impersonate others, replicate human behaviors, and influence people by ranking and retweeting. In addition to essentially misleading users, social bots can damage the ecosystem of the social network by establishing fake followship relations [17] and/or poisoning the network content.

In an attempt to limit the threats posed by social bots, researchers have proposed different methods by which social bots can be detected and blocked. The majority of the studies in this domain to date have focused on studying behavior patterns [18]-[22]. For example, a recent study that was performed by Fu et al. [23] proposed a dynamic metric to measure the change in users' activities as a means of identifying the strategies employed by spammers. Another detection scheme aimed to identify malicious account groups by understanding the algorithms associated with the generated account names and subsequently relating these to creation time [24]. This study analyzed 4.7 million accounts that were collected from Twitter and achieved reasonable accuracy. In the same area, Stringhini et al. [8] studied data from three large social networks after creating large and diverse honey-profiles. They successfully detected 15,857 spam profiles that had been deleted by Twitter.

It is important to note that not all social bot accounts can be classified as malicious accounts. Some even explicitly state their nature in the profile of the account. Social bots that operate without malicious intent may serve positive purposes, such as managing news feeds or acting as customer care responders. The problem we are concerned with in this paper is undisclosed social bots that have malicious intentions. As outlined above, these social bots can pose fundamental financial, social, political, and security risks. They have become increasingly sophisticated in their designs and capabilities to avoid social bot detection techniques [5], [14]. A study by Freitas et al. [25] found that only 38 out of every 120 social bots were detected and removed by Twitter.

In light of the above, there is a requirement to gain in-depth insights into the capabilities and limitations of the social bot detection techniques that are currently in use on the Twitter social network platform. By comparing and evaluating the existing approaches, we can develop an understanding of the different solutions that are available based on the selected features and trained classifiers, and can subsequently apply this understanding to identify which technologies achieve the best accuracy and detection results.

The rest of this paper is organized as follows. The review methodology is presented in Section II. In Section III, we evaluate the datasets employed in existing studies. Section IV then progresses to identify the methods by which social bots can be detected. Section V presents a discussion and evaluation of the techniques mentioned in the previous section. Section VI highlights the challenges that remain in the domain of social bot detection and considers future directions for research efforts that aim to address this problem.

II. REVIEW METHODOLOGY

In this paper, we focus on studying the detection techniques that are commonly employed to detect social bots or fake accounts on the Twitter social network platform. The analysis does not include alternative social networks, such as Facebook or Tumblr, or other malicious activities and problems such as spam or hijacking.

III. DATASETS & PREPROCESSING

The approaches that underpin social bot detection techniques can vary significantly. However, they can broadly be categorized into three common methods: graph-based, crowdsourcing, and machine learning [3], [38]. The process of detecting social bots commences by retrieving data from the Twitter stream. Once the data is collected, the next step involves preparing it for the chosen classifier by extracting and selecting features, which can be derived through statistical methods, as in [26], [27], or manually labelled based on previous work, as in [6], [28], [29].

In this section, we review the datasets that were employed in the studies identified in the literature review and determine whether they use public or private datasets. For the graph-based methods, we identify the social graph data employed and the total number of nodes sampled for both classes, Sybil and legitimate. For the machine learning methods, we identify the tools employed in the data collection process and assess the total number of accounts included in the testing phase. In addition, we state the number of features that were involved in the preprocessing phase.

Within their graph-based approaches, [30] and [31] used public datasets that were compiled using data from previous studies. For example, the authors in [31] used a sample of 100 Sybil nodes and 100 benign nodes for a synthesized social network, in addition to a real dataset from Twitter, to compare their proposed bot-detection method with other random walk-based efforts. They employed a Twitter dataset sample of 50,000 nodes for the Sybil region and 50,000 nodes for the benign region for training and testing, after processing a dataset that contained 41,652,230 nodes and 1,202,513,046 edges. Moreover, to complete their datasets for experimentation, both [32] and [30] purchased a number of fake Twitter accounts to implement within the Sybil region of the social network.

In studies that have focused on machine learning methods, the reviewed studies used datasets that consisted of a combination of privately obtained accounts and public datasets that were employed in previous studies (see Table I). In certain cases, some researchers, such as [27] and [35], used the available public datasets as a ground-truth baseline for testing their techniques. In general, most of the research employed the Twitter API to collect data and compile the datasets, with the exception of [33], who used their own API to collect data [34]. Feature selection methods are commonly applied to increase the speed of the classifier, reduce the training time, improve generalization, and avoid overfitting. For example, as part of their preprocessing phase, [27] used a correlation-based system in combination with a principal components analysis method; the selected features were then analyzed using a cumulative distribution function for each selected feature.
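To make this preprocessing step concrete, the sketch below (our own Python illustration, not code from [27]) combines a simple correlation filter with principal components analysis using scikit-learn; the correlation threshold and component count are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def select_features(X, corr_threshold=0.95, n_components=10):
    """Drop one feature from each highly correlated pair, then apply PCA."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-by-feature correlation
    upper = np.triu(corr, k=1)                    # upper triangle, self-pairs excluded
    keep = [j for j in range(X.shape[1])
            if not np.any(upper[:, j] > corr_threshold)]
    X_scaled = StandardScaler().fit_transform(X[:, keep])
    pca = PCA(n_components=min(n_components, X_scaled.shape[1]))
    return pca.fit_transform(X_scaled)
```

The retained features can then be examined one at a time, for example by plotting the cumulative distribution function of each feature for the bot and human classes, as in [27].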
IV. SOCIAL BOT DETECTION METHODS

Generally, social bot detection on social networks is performed by one or more of the three common methods mentioned earlier: graph-based, crowdsourcing, and machine learning.

The graph-based method uses the social graph of a social network to understand the network information and the relationships between edges or links across accounts to detect bot activity. The crowdsourcing method uses expert annotators to identify, evaluate, and determine social bot behaviors. Finally, the machine learning method involves developing algorithms and statistical methods that can learn the revealing features or behaviors of social network accounts in order to distinguish between human- and computer-led activity.

In this section, we provide an overview of the three methods that have been used by researchers to detect social bot accounts on Twitter. In each subsection, we discuss the related studies and the datasets, detection mechanisms, and classifiers involved, and the process by which the results were validated.

A. Graph-Based Detection

Social network graphs are commonly employed to understand and distinguish between users' relationships on social networking platforms. Three social graph-based methods are typically employed to detect social bots and malicious accounts [38]. The first is based on trust propagation, which evaluates whether the trust relationship that exists between two graph objects is strong or weak. The second is graph clustering, by which related nodes of a social graph are grouped based on similar characteristics such as users' distance. The third involves studying graph metrics and properties, where probability distributions, scale-free graph structure, and centrality measures are addressed in a social graph.

In this subsection, we present three graph-based Sybil detection systems that were evaluated using datasets from Twitter. These three systems were chosen because they involved the analysis of a Twitter dataset. They are presented in chronological order, starting with the most recent work.

SybilWalk [31] is a proposed Sybil detection method that employs a random walk-based method on an undirected social graph. The idea of the random walk method is to label legitimate users with benignness scores and Sybil users with badness scores; these scores then help to classify users into two classes, legitimate and Sybil. In addition, this method can rank all users as a means of identifying top-ranked accounts that are likely to be Sybils. The authors of SybilWalk assumed that the graph satisfies the homophily property, whereby two linked nodes tend to share the same label. They assigned legitimate nodes a badness score of 0 and Sybil nodes a score of 1, and they employed a directed Twitter graph dataset that was obtained from a previous study [43]. To evaluate their experiment, the authors used the Area Under the Receiver Operating Characteristic Curve (AUC) as a standard metric to measure the quality of their ranking method, for which they reported a score of 0.96. They also presented the classification results for Sybil and legitimate nodes in the form of a false positive rate (FPR) of 1.3% and a false negative rate (FNR) of 17.3%.
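The following sketch illustrates the general idea of propagating badness scores under the homophily assumption; it is a simplified illustration of random walk-based labeling, not the SybilWalk algorithm itself, and the iteration count and neutral initialization are assumptions.

```python
import networkx as nx

def badness_scores(graph, sybil_seeds, benign_seeds, iterations=20):
    """Propagate badness scores (0 = benign, 1 = Sybil) over an undirected graph."""
    scores = {n: 0.5 for n in graph.nodes}   # unlabeled nodes start neutral
    seeds = set(sybil_seeds) | set(benign_seeds)
    for n in benign_seeds:
        scores[n] = 0.0
    for n in sybil_seeds:
        scores[n] = 1.0
    for _ in range(iterations):
        for n in graph.nodes:
            if n in seeds:                   # seed labels stay fixed
                continue
            nbrs = list(graph.neighbors(n))
            if nbrs:                         # take the average of the neighbors' scores
                scores[n] = sum(scores[v] for v in nbrs) / len(nbrs)
    # Highest-scoring accounts are the most likely Sybils.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Ranking all users by the resulting score mirrors how SybilWalk surfaces top-ranked accounts that are likely to be Sybils.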
Mehrotra et al. proposed a method to detect fake followers using social graph-based features that relate to the centrality of all nodes in the graph [30]. They claimed that their proposed method can be applied to all social networking platforms. They employed five datasets, two of which contained legitimate followers and the remaining three fake followers. They used six graph-based centrality measures as features for the classification. After computing the centrality measures of all the given nodes in the graph, they applied three classifiers: artificial neural networks, decision tree, and random forest. The random forest classifier scored the highest accuracy of 95%, with a precision of 88.99% and a recall of 100%.
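A minimal sketch of this kind of pipeline is shown below; it computes a handful of standard centrality measures with NetworkX and feeds them to a random forest. The particular centralities chosen here are illustrative and are not necessarily the six used in [30].

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def centrality_features(graph, nodes):
    """One row of graph centrality measures per node."""
    degree = nx.degree_centrality(graph)
    closeness = nx.closeness_centrality(graph)
    betweenness = nx.betweenness_centrality(graph)   # costly on large graphs
    eigenvector = nx.eigenvector_centrality(graph, max_iter=1000)
    pagerank = nx.pagerank(graph)
    return np.array([[degree[n], closeness[n], betweenness[n],
                      eigenvector[n], pagerank[n]] for n in nodes])

# X = centrality_features(g, nodes); y = 1 for fake follower, 0 for legitimate
# clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```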
TrueTop [32] is an influence measurement system that employed a graph-based approach to test Sybil resilience. The authors employed a synthetic simulation of users on Twitter to avoid violating the platform's terms of service. They used four datasets to implement the system, evaluate the accuracy of the model, and test Sybil resilience against a set of predetermined metrics. They presented a model of the strength of Sybil attacks based on the α parameter, which represents the ratio of the total edge weight in the non-Sybil region to that in the Sybil region. They assumed the worst-case scenario of Sybil attacks, in which there is no interaction between the two regions.

B. Crowdsourcing

As previously described, the crowdsourcing approach to social bot detection involves leveraging human detection to identify patterns across given account profiles or the content shared by human and social bot accounts. The role of the human is to distinguish between bot accounts and human accounts.
C. Machine Learning

Among the machine learning-based systems, DeBot [36] can capture the correlation between posting activities using a cross-correlation-based random projection technique. As such, synchronized behavior in a sequence of 40 activities acted as an indicator of automated accounts. The authors calculated and compared their model against five alternative methods, such as Twitter and BotOrNot, and achieved 94% precision in their generated daily reports.
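The sketch below shows plain lagged cross-correlation between two accounts' binned activity series. DeBot itself relies on warped (DTW-style) correlation and random projections for scalability, so this is only an assumed, simplified view of the underlying signal.

```python
import numpy as np

def activity_series(timestamps, start, bins, bin_seconds=60):
    """Bin posting timestamps (seconds) into a per-minute activity count series."""
    idx = ((np.asarray(timestamps, dtype=float) - start) // bin_seconds).astype(int)
    series = np.zeros(bins)
    for i in idx[(idx >= 0) & (idx < bins)]:
        series[i] += 1
    return series

def max_cross_correlation(a, b, max_lag=5):
    """Peak normalized cross-correlation over small lags; near 1 = synchronized."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    n = len(a)
    return max(np.dot(a[max(0, -lag):n - max(0, lag)],
                      b[max(0, lag):n - max(0, -lag)]) / n
               for lag in range(-max_lag, max_lag + 1))
```

Pairs of accounts whose peak correlation stays close to 1 over long windows are candidates for coordinated automation.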
Chu et al. studied features related to tweeting behavior, tweet content, and account properties [37] to detect the automation of bots. They categorized accounts into human, bot, and cyborg according to the investigated features. Their classification system incorporated an entropy-based component that detects timing regularity to measure automation, a spam detection component in the form of a Bayesian classifier that detects text patterns as a means of detecting spam, an account properties component, and a random forest classifier as a decision maker. They collected data covering 512,407 accounts using the Twitter API. They constructed their ground-truth sample using 6,000 accounts divided equally among the human, bot, and cyborg classes. They extracted eight features and implemented a random forest classifier with tenfold cross-validation. They employed a confusion matrix to measure the system performance and achieved an average score of 96%.
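As a simplified illustration of such an entropy component, the sketch below estimates the Shannon entropy of an account's inter-tweet intervals from a histogram; note that Chu et al. use a corrected conditional entropy rather than this naive estimate.

```python
import numpy as np

def timing_entropy(timestamps, bins=10):
    """Shannon entropy of inter-tweet intervals; low values suggest automation."""
    intervals = np.diff(np.sort(np.asarray(timestamps, dtype=float)))
    if intervals.size == 0:
        return 0.0
    counts, _ = np.histogram(intervals, bins=bins)
    p = counts[counts > 0] / counts.sum()    # empirical interval distribution
    return float(-np.sum(p * np.log2(p)))    # highly regular timers score near zero
```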
V. DISCUSSION

Thus far, this paper has examined some of the approaches that have been employed to detect the activities of social bots on Twitter. As mentioned earlier, social bot accounts are more deceptive than ever before, and it is becoming increasingly difficult to develop systems that can detect these applications. To make progress in this area, there is a need to consider the main challenges and factors that impact social bot detection activities, as understanding these challenges will facilitate the development of new technologies that can address the issues at play. In this section, we highlight the factors that commonly represent challenges in social bot detection: datasets, common features, methods employed, and performance measures.

A. Datasets

To study and understand the behavior of social bots in comparison to human behavior in social networks, it is essential to maintain datasets that consist of both human and bot accounts. In Table I, the difference in data sizes across the reviewed papers is obvious. Researchers encounter two issues with datasets. The first concerns the availability of recent public datasets on which to perform experimentation. Some studies use their own platform to collect data in order to avoid this issue and to avoid situations in which they have a limited number of Twitter API requests per hour during the data collection process, as was the case in [34]. However, it usually takes time to collect data that is of a decent size and contains enriched content; for example, the maximum number of tweets per user that the API can provide is 3,200. In this regard, a good number of studies have established a system of sharing processed datasets that other researchers can then use as a baseline or for comparative purposes. However, the datasets that are available will be limited to the features and size with which they were issued, to avoid violating the privacy of the users.

The second issue relates to developing a training dataset that is diverse in terms of the content of the bot accounts. This is especially significant in studies that employ a machine learning approach to bot detection. Labeling a sample that is well defined in terms of size and content can be very difficult to achieve. Therefore, many researchers employ human annotation of a reasonably sized training sample to perform this task, even though it takes time and is prone to human error. One solution that some studies have identified is to use accounts that Twitter has suspended as social bots. However, this solution is not entirely accurate because human users are sometimes suspended for violating Twitter's terms of use. In addition, this approach relies on the researcher's ability to obtain the data for suspended accounts, which is not readily available.
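As an illustration of this collection constraint, the sketch below pages through a user timeline with the third-party tweepy library (v3-style API), assuming valid OAuth credentials; the credential values and screen name are placeholders.

```python
import tweepy

CONSUMER_KEY, CONSUMER_SECRET = "...", "..."     # placeholders: fill in your keys
ACCESS_TOKEN, ACCESS_SECRET = "...", "..."

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)  # sleep through rate-limit windows

# The API returns at most the ~3,200 most recent tweets of a user.
tweets = [status._json
          for status in tweepy.Cursor(api.user_timeline,
                                      screen_name="example_user",
                                      count=200).items(3200)]
```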
B. Common Features

Social bot detection is based on classifications of selected features to sort accounts into either legitimate or bot accounts. The studies reviewed in this paper highlight how common features are used to detect social bot accounts. These include factors related to timing, automation, text use, sentiment, and clickstream behavior. Therefore, we cannot assume a social bot depends on one feature without addressing the other features [37]. In Table III, we summarize the common features that are extracted from the full sets of features in the reviewed papers to measure the likelihood of an account being a human or a bot. In general, the extracted features can address network-level properties to identify community features. We can also identify the social connections of users and their ranking by performing content and behavioral analysis. For example, if an account is verified or protected, this is a logical indicator that it is a human account, not a bot account. The profile features that are extracted from the metadata, such as the profile image, screen name, and description, may also indicate the nature of the account; for example, a default profile image is a sign of a new user or a bot account [27]. Temporal patterns, such as the average tweeting and retweeting ratios, can be a sign of bot activity if they occur with small inter-arrival times [35], [40]. Therefore, using an entropy component to detect behavior as part of the classification system is essential.

In addition, a high rate of posting similar content with URLs can be an indicator of a spammer [10], [41]. In other words, the URL feature can be used to detect the link-farming behavior that is typically employed by spammers and bot accounts [42]. Also, using the mention feature in association with the URL and number-of-links features and the entropy of tweets can indicate a bot account with malicious intentions [7]. Moreover, if the number of followers is high yet the account is relatively new, it is likely that the followers are fake and the account is a bot.
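The sketch below shows how such indicators might be assembled into a feature vector from user metadata; the field names assume a pre-parsed user record (for example, `created_at_ts` as a Unix timestamp) rather than the raw Twitter API object.

```python
import time

def profile_features(user, now=None):
    """Indicative bot-detection features from Twitter user metadata."""
    now = now or time.time()
    age_days = max((now - user["created_at_ts"]) / 86400.0, 1.0)
    followers = user.get("followers_count", 0)
    return {
        "verified": int(user.get("verified", False)),                 # human indicator
        "default_profile_image": int(user.get("default_profile_image", False)),
        "followers_per_day": followers / age_days,   # many followers on a young account
        "friends_to_followers": user.get("friends_count", 0) / max(followers, 1),
        "statuses_count": user.get("statuses_count", 0),
    }
```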
C. Methods Employed to Detect Bots

The literature review of the recent studies that have been performed in this domain highlights how different approaches to detecting social bots have been implemented. The main methods focus on the primary components of social networks, such as network structure, content, and behavior features. In general, the content and behavioral characteristics of bot accounts are employed in off-the-shelf machine learning algorithms [4]. This section evaluates some of the methods that were most commonly used within the studied papers to detect social bot accounts. Table IV presents a summary of the advantages and disadvantages of each method.

Using a graph-based method, [31] studied the trust propagation of the network, in which the ranking of users is made easier by the trust scores. However, this approach is sensitive to the selection of trust seeds, and it works on the basis of assumptions that are not always accurate. Using the same method, [32] and [30] measured the influence of social users based on the social network graph. This approach is useful for visualization and for measuring the influence rate within a social network based on the distance measurement of the centrality of the influencer nodes. However, the computation cost can be high if the targeted network is large and real. Therefore, using a synthetic network to apply this approach is useful, even though the results in the real network are sometimes unpredictable.

The crowdsourcing method can be employed to effectively build ground-truth data and annotation tasks. This approach employs human intelligence to identify different patterns. The problem with this approach is that it consumes time and is prone to human error. However, as described in Section III, there are some solutions by which the annotation task can be validated during the preprocessing phase to maintain the best results.
The survey of the existing literature revealed that researchers are more likely to employ machine learning methods than the other two approaches. The majority of the reviewed papers used tree-based approaches and Bayes' theorem, and the random forest classifier was the most commonly employed classifier within the reviewed methods. The advantages of this classifier are that it is less complex in terms of tuning and achieves a more accurate performance; however, the complexity of the tree will generate overfitting, as is the case with most decision tree algorithms. Bayes' theorem and random forest were widely used in the studies described in the literature. A Bayes' theorem-based classifier is fast in terms of training and prediction time, but its performance is better when the dataset contains a relatively low number of features. Many researchers have employed a support vector machine to reduce the error rate in the classification process. However, SVM performance depends on the selected kernel and parameters, and a major disadvantage of this classifier is that it depends on a large training set to increase performance. Similar to SVM, neural network effectiveness depends on the sample size; when the sample size is large, the network performs well. One detection method that was applied in the literature is the pairwise similarity technique [36]. This method can effectively detect bots based on the similarity of profile activity. However, the extent to which the method can be scaled depends on storing and analyzing the users' history.
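These trade-offs can be explored empirically with off-the-shelf implementations. The sketch below compares the classifier families discussed above under tenfold cross-validation using scikit-learn; `X` and `y` are assumed to be a prepared feature matrix and binary bot/human labels.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=100),
    "naive_bayes": GaussianNB(),              # fast, prefers few features
    "svm_rbf": SVC(kernel="rbf", C=1.0),      # sensitive to kernel and parameters
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```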
D. Performance Measures

Within the investigated studies, different performance measurements have been used to evaluate social bot detection classification techniques. The approach that is commonly used to measure performance in these articles is the accuracy rate, which relates to the percentage of accounts that are correctly classified with respect to the whole sample. However, using the accuracy rate alone is not sufficient to evaluate the chosen classifier. The study by Chu et al. evaluated the accuracy of each feature, and the result was meaningless when compared to using a confusion matrix to evaluate the whole set of features for each class [37].

Table V presents the performance measures that were employed in each of the reviewed papers. The majority used classifiers with tenfold or fivefold cross-validation to validate their results. Five to six studies used the F-measure, precision, and recall to measure performance. These performance measurements are appropriate for the bot detection problem since it is ultimately a binary classification problem.
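The sketch below illustrates this style of evaluation with scikit-learn, reporting a confusion matrix together with per-class precision, recall, and F-measure on a held-out split; `X` and `y` are assumed to be prepared as in the earlier examples.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))      # per-class error breakdown
print(classification_report(y_test, y_pred,  # precision, recall, F1 per class
                            target_names=["human", "bot"]))
```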
VI. CONCLUSION

In this paper, we reviewed the bot detection methods that have been employed in recent studies on the Twitter social network. We quantified the existing papers according to the detection scheme and classifiers employed. We then summarized the main observations on the reviewed literature within four main subsections: datasets, analyzed features, classifiers, and performance measures.

The findings revealed that social bot detection is challenging, and this challenge is exacerbated as the social network volume increases. Bots employ sophisticated mechanisms to avoid detection, and researchers have yet to develop viable methods by which such mechanisms can be identified. As such, there is a requirement for ongoing studies into bot detection approaches. Twitter is encouraged to develop systems that can recognize automated tweets and tag them with a unified label so that they can be readily identified by users. The research community is encouraged to collaborate to build a periodically updated public dataset that includes recently detected bots.
REFERENCES

[1] Twitter, "Q3 2017 earnings report," October 2017. [Online]. Available: https://investor.twitterinc.com/results.cfm
[2] C. Smith, "400 amazing Twitter statistics and facts," November 2017. [Online]. Available: https://expandedramblings.com/index.php/
[3] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, "The rise of social bots," Communications of the ACM, vol. 59, no. 7, pp. 96-104, 2016.
[4] O. Varol, E. Ferrara, C. A. Davis, F. Menczer, and A. Flammini, "Online human-bot interactions: Detection, estimation, and characterization," arXiv preprint arXiv:1703.03107, 2017.
[5] N. Abokhodair, D. Yoo, and D. W. McDonald, "Dissecting a social botnet: Growth, content and influence in Twitter," in Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, 2015, pp. 839-851.
[6] V. Subrahmanian, A. Azaria, S. Durst, V. Kagan, A. Galstyan, K. Lerman, L. Zhu, E. Ferrara, A. Flammini, and F. Menczer, "The DARPA Twitter bot challenge," Computer, vol. 49, no. 6, pp. 38-46, 2016.
[7] C. Grier, K. Thomas, V. Paxson, and M. Zhang, "@spam: The underground on 140 characters or less," in Proceedings of the 17th ACM Conference on Computer and Communications Security. ACM, 2010, pp. 27-37.
[8] G. Stringhini, C. Kruegel, and G. Vigna, "Detecting spammers on social networks," in Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 2010, pp. 1-9.
[9] A. H. Wang, "Detecting spam bots in online social networking sites: A machine learning approach," DBSec, vol. 10, pp. 335-342, 2010.
[10] X. Zhang, S. Zhu, and W. Liang, "Detecting spam and promoting campaigns in the Twitter social network," in Data Mining (ICDM), 2012 IEEE 12th International Conference on. IEEE, 2012, pp. 1194-1199.
[11] S. Rathore, P. K. Sharma, V. Loia, Y.-S. Jeong, and J. H. Park, "Social network security: Issues, challenges, threats, and solutions," Information Sciences, vol. 421, pp. 43-69, 2017.
[12] M. Shafahi, L. Kempers, and H. Afsarmanesh, "Phishing through social bots on Twitter," in Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 2016, pp. 3703-3712.
[13] J. Zhang, R. Zhang, Y. Zhang, and G. Yan, "The rise of social botnets: Attacks and countermeasures," IEEE Transactions on Dependable and Secure Computing, 2016.
[14] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, "The socialbot network: When bots socialize for fame and money," in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 93-102.
[15] A. Gupta, H. Lamba, and P. Kumaraguru, "$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing fake content on Twitter," in eCrime Researchers Summit (eCRS). IEEE, 2013, pp. 1-12.
[16] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, "Design and analysis of a social botnet," Computer Networks, vol. 57, no. 2, pp. 556-578, 2013.
[17] S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, and M. Tesconi, "Fame for sale: Efficient detection of fake Twitter followers," Decision Support Systems, vol. 80, pp. 56-71, 2015.
[18] F. Amato, A. Castiglione, A. De Santo, V. Moscato, A. Picariello, F. Persia, and G. Sperlì, "Recognizing human behaviours in online social networks," Computers & Security, 2017.
[19] S. Sivanesh, K. Kavin, and A. A. Hassan, "Frustrate Twitter from automation: How far a user can be trusted?" in Human-Computer Interactions (ICHCI), 2013 International Conference on. IEEE, 2013, pp. 1-5.
[20] G. Laboreiro, L. Sarmento, and E. Oliveira, "Identifying automatic posting systems in microblogs," Progress in Artificial Intelligence, pp. 634-648, 2011.
[21] C. M. Zhang and V. Paxson, Detecting and Analyzing Automated Activity on Twitter. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 102-111. [Online]. Available: https://doi.org/10.1007/978-3-642-19260-9_11
[22] N. Chavoshi, H. Hamooni, and A. Mueen, "Temporal patterns in bot activities," in Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 2017, pp. 1601-1606.
[23] Q. Fu, B. Feng, D. Guo, and Q. Li, "Combating the evolving spammers in online social networks," Computers & Security, vol. 72, pp. 60-73, 2018.
[24] S. Lee and J. Kim, "Early filtering of ephemeral malicious accounts on Twitter," Computer Communications, vol. 54, pp. 48-57, 2014.
[25] C. Freitas, F. Benevenuto, S. Ghosh, and A. Veloso, "Reverse engineering socialbot infiltration strategies in Twitter," in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ACM, 2015, pp. 25-32.
[26] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, and F. Menczer, "BotOrNot: A system to evaluate social bots," in Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016, pp. 273-274.
[27] A. Alarifi, M. Alsaleh, and A. Al-Salman, "Twitter Turing test: Identifying social machines," Information Sciences, vol. 372, pp. 332-346, 2016.
[28] M. Kantepe and M. C. Ganiz, "Preprocessing framework for Twitter bot detection," in Computer Science and Engineering (UBMK), 2017 International Conference on. IEEE, 2017, pp. 630-634.
[29] B. Erşahin, Ö. Aktaş, D. Kılınç, and C. Akyol, "Twitter fake account detection," in Computer Science and Engineering (UBMK), 2017 International Conference on. IEEE, 2017, pp. 388-392.
[30] A. Mehrotra, M. Sarreddy, and S. Singh, "Detection of fake Twitter followers using graph centrality measures," in Contemporary Computing and Informatics (IC3I), 2016 2nd International Conference on. IEEE, 2016, pp. 499-504.
[31] J. Jia, B. Wang, and N. Z. Gong, "Random walk based fake account detection in online social networks," in Dependable Systems and Networks (DSN), 2017 47th Annual IEEE/IFIP International Conference on. IEEE, 2017, pp. 273-284.
[32] J. Zhang, R. Zhang, J. Sun, Y. Zhang, and C. Zhang, "TrueTop: A Sybil-resilient system for user influence measurement on Twitter," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2834-2846, 2016.
[33] Z. Gilani, E. Kochmar, and J. Crowcroft, "Classification of Twitter accounts into automated agents and human users."
[34] Z. Gilani, L. Wang, J. Crowcroft, M. Almeida, and R. Farahbakhsh, "Stweeler: A framework for Twitter bot analysis," in Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016, pp. 37-38.
[35] C. Cai, L. Li, and D. Zeng, "Behavior enhanced deep bot detection in social media," in Intelligence and Security Informatics (ISI), 2017 IEEE International Conference on. IEEE, 2017, pp. 128-130.
[36] N. Chavoshi, H. Hamooni, and A. Mueen, "DeBot: Twitter bot detection via warped correlation," in ICDM, 2016, pp. 817-822.
[37] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia, "Detecting automation of Twitter accounts: Are you a human, bot, or cyborg?" IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 6, pp. 811-824, 2012.
[38] K. S. Adewole, N. B. Anuar, A. Kamsin, K. D. Varathan, and S. A. Razak, "Malicious accounts: Dark of the social networks," Journal of Network and Computer Applications, vol. 79, pp. 41-67, 2017.
[39] N. Chavoshi, H. Hamooni, and A. Mueen, "Identifying correlated bots in Twitter," in International Conference on Social Informatics. Springer, 2016, pp. 14-21.
[40] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia, "Who is tweeting on Twitter: Human, bot, or cyborg?" in Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 2010, pp. 21-30.
[41] F. Ahmed and M. Abulaish, "A generic statistical approach for spam detection in online social networks," Computer Communications, vol. 36, no. 10, pp. 1120-1129, 2013.
[42] M. Chakraborty, S. Pal, R. Pramanik, and C. R. Chowdary, "Recent developments in social spam detection and combating techniques: A survey," Information Processing & Management, vol. 52, no. 6, pp. 1053-1073, 2016.
[43] H. Kwak, C. Lee, H. Park, and S. Moon, "What is Twitter, a social network or a news media?" in Proceedings of the 19th International Conference on World Wide Web. ACM, 2010, pp. 591-600.
[44] K. Lee, B. D. Eoff, and J. Caverlee, "Seven months with the devils: A long-term study of content polluters on Twitter," in ICWSM, 2011.
[45] F. Morstatter, L. Wu, T. H. Nazer, K. M. Carley, and H. Liu, "A new approach to bot detection: Striking the balance between precision and recall," in Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on. IEEE, 2016, pp. 533-540.