Detecting Bots
Abstract—Due to the exponential growth in the popularity of online social networks (OSNs), such as Twitter and Facebook, the number of machine accounts that are designed to mimic human users has increased. Social bot accounts (Sybils) have become more sophisticated and deceptive in their efforts to replicate the behaviors of normal accounts. As such, there is a distinct need for the research community to develop technologies that can detect social bots. This paper presents a review of the recent techniques that have emerged that are designed to differentiate between social bot accounts and human accounts. We limit the analysis to the detection of social bots on the Twitter social media platform. We review the various detection schemes that are currently in use and examine common aspects such as the classifiers, datasets, and selected features employed. We also compare the evaluation techniques that are employed to validate the classifiers. Finally, we highlight the challenges that remain in the domain of social bot detection and consider future directions for research efforts that are designed to address this problem.

Keywords—Social Bots; Twitter; Detection; Sybil

I. INTRODUCTION

Online social networks (OSNs) represent a global platform through which people share and promote products, links, opinions, and news. In the third quarter of 2017, Twitter had 330 million active users [1], and as of 2015 the estimated total number of accounts had grown to 1.3 billion [2]. The data-sharing feature of social networks allows users to distribute content and links; however, this feature is also commonly exploited by spammers and fraudsters.

Social bot accounts make OSNs vulnerable to adversaries. Social bots are programs that automatically generate content, distribute it via a particular social network, and interact with its users [3]. According to a recent study by Varol et al., between 9% and 15% of Twitter accounts are bot accounts [4], which is the equivalent of 48 million accounts [2]. A further study found that social bots are responsible for generating 35% of the content that is posted on Twitter [5].

Many studies have aimed to address the problems associated with the use of automated accounts on social networks [6]-[9], which can spread spam, worms, and phishing links or manipulate legitimate accounts by hijacking and deceiving users [10]-[12]. Malicious accounts typically operate under a botmaster, who controls a group of social bots to distribute spam or manipulate behaviors on a given social network [13]. For example, in Syria, a social botnet was employed to flood Twitter hashtags related to the Syrian civil war with irrelevant topics that redirected the attention of users from controversial government actions [5]. Social bots have also played a significant role in the uprisings that occur in the aftermath of major events such as elections or conflicts [14]. Gupta et al. [15] studied the fake content that was proliferated via Twitter during the Boston Marathon blasts and the role such content played in spreading rumors and misinformation. They found that bot accounts were created after the blasts, many of which impersonated real accounts [15]. The malicious activities of bots during events such as these can be used to spread spam. In addition, they can also cause financial harm, as was observed in the case of Cynk, which suffered a 220-fold drop in market price as a result of the activities of automated stock-trading social bots [3].
The activities of social bots also impact the social graph of OSNs because of the large number of non-genuine social relationships. If social bots successfully infiltrate users' accounts, they can harvest users' private data and subsequently use it for phishing and spamming activities [9], [16]. In addition, they can aggregate information from the web to impersonate others, replicate human behaviors, and influence people by ranking and retweeting. In addition to essentially misleading users, social bots can damage the ecosystem of the social network by establishing fake followship relations [17] and/or poisoning the network content.

In an attempt to limit the threats posed by social bots, researchers have proposed different methods by which social bots can be detected and blocked. The majority of the studies in this domain to date have focused on studying behavior patterns [18]-[22]. For example, a recent study that was performed by Fu et al. [23] proposed a dynamic metric to measure the change in users' activities as a means of identifying the strategies employed by spammers. Another detection scheme aimed to identify malicious account groups by understanding the algorithms associated with the generated account names and subsequently relating these to creation time [24]. This study analyzed 4.7 million accounts that were collected from Twitter and achieved reasonable accuracy. In the same area, Stringhini et al. [8] studied data from three large social networks after creating large and diverse honey-profiles. They successfully detected 15,857 spam profiles that had been deleted by Twitter.

It is important to note that not all social bot accounts can be classified as malicious accounts. Some even explicitly state their nature in the profile of the account. Social bots that operate without malicious intent may serve positive purposes, such as managing news feeds or acting as customer care responders. The problem we are concerned with in this paper is undisclosed social bots that have malicious intentions. As outlined above, these social bots can pose fundamental financial, social, political, and security risks. They have become increasingly sophisticated in their designs and capabilities to avoid social bot detection techniques [5], [14]. A study by Freitas et al. [25] found that only 38 out of every 120 social bots were detected and removed by Twitter.

In light of the above, there is a requirement to gain in-depth insights into the capabilities and limitations of the social bot detection techniques that are currently in use on the Twitter social network platform. By comparing and evaluating the existing approaches, we can develop an understanding of the different solutions that are available based on the selected features and trained classifiers, and can subsequently apply this understanding to identify which technologies achieve the best accuracy and detection results.

The rest of this paper is organized as follows. The review methodology is presented in Section II. In Section III, we evaluate the datasets employed in existing studies. Section IV then progresses to identify the methods by which social bots can be detected. Section V presents a discussion and evaluation of the techniques mentioned in the previous section. Section VI highlights the challenges that remain in the domain of social bot detection and considers future directions for research efforts that aim to address this problem.

II. REVIEW METHODOLOGY

In this paper, we focus on studying the detection techniques that are commonly employed to detect social bots or fake accounts on the Twitter social network platform. The analysis does not include alternative social networks, such as Facebook or Tumblr, or other malicious activities and problems such as spam or hijacking.

III. DATASETS & PREPROCESSING

The approaches that underpin social bot detection techniques can vary significantly. However, they can broadly be categorized into three common methods: graph-based, crowdsourcing, and machine learning [3], [38]. The process of detecting social bots commences by retrieving data from the Twitter stream. Once the data is collected, the next step involves preparing it for the chosen classifier by extracting and selecting features, which can be derived through statistical methods, as in [26], [27], or manually labelled based on previous work, as in [6], [28], [29].

In this section, we review the datasets that were employed in the studies identified in the literature review and determine whether they use public or private datasets. For the graph-based methods, we identify the social graph data employed and the total number of nodes sampled for both classes, Sybil and legitimate. For the machine learning methods, we identify the tools employed in the data collection process and assess the total number of accounts included in the testing phase. In addition, we state the number of features that were involved in the preprocessing phase.

Within their graph-based approaches, [30] and [31] used public datasets that were compiled using data from previous studies. For example, the authors in [31] used a sample of 100 Sybil nodes and 100 benign nodes for a synthesized social network, in addition to a real dataset from Twitter, to compare their proposed bot-detection method with other random walk-based efforts. They employed a Twitter dataset sample of 50,000 nodes for the Sybil region and 50,000 nodes for the benign region for training and testing, after processing a dataset that contained 41,652,230 nodes and 1,202,513,046 edges. Moreover, to complete their datasets for experimentation, both [32] and [30] purchased a number of fake Twitter accounts to implement within the Sybil region of the social network.

In studies that have focused on machine learning methods, the reviewed studies used datasets that consisted of a combination of privately obtained accounts and public datasets that were employed in previous studies (see Table I). In certain cases, some researchers, such as [27] and [35], used the available public datasets as a ground-truth baseline for testing their techniques. In general, most of the research employed the Twitter API to collect data and compile the datasets, with the exception of [33], who used their own API to collect data [34]. Feature selection methods are commonly applied to increase the speed of the classifier, reduce the training time, improve generalization, and avoid overfitting. For example, as part of their preprocessing phase, [27] used a correlation-based system in combination with a principal components analysis method; the selected features were then analyzed using a cumulative distribution function for each selected feature.
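To make this preprocessing step concrete, the sketch below (our own Python illustration, not code from [27]) combines a simple correlation filter with principal components analysis using scikit-learn; the correlation threshold and component count are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def select_features(X, corr_threshold=0.95, n_components=10):
    """Drop one feature from each highly correlated pair, then apply PCA."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-by-feature correlation
    upper = np.triu(corr, k=1)                    # upper triangle, self-pairs excluded
    keep = [j for j in range(X.shape[1])
            if not np.any(upper[:, j] > corr_threshold)]
    X_scaled = StandardScaler().fit_transform(X[:, keep])
    pca = PCA(n_components=min(n_components, X_scaled.shape[1]))
    return pca.fit_transform(X_scaled)
```

The retained features can then be examined one at a time, for example by plotting the cumulative distribution function of each feature for the bot and human classes, as in [27].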
IV. SOCIAL BOT DETECTION METHODS

Generally, social bot detection on social networks is performed by one or more of the three common methods mentioned earlier: graph-based, crowdsourcing, and machine learning.

The graph-based method uses the social graph of a social network to understand the network information and the relationships between edges or links across accounts to detect bot activity. The crowdsourcing method uses expert annotators to identify, evaluate, and determine social bot behaviors. Finally, the machine learning method involves developing algorithms and statistical methods that can learn the revealing features or behaviors of social network accounts in order to distinguish between human- and computer-led activity.

In this section, we provide an overview of the three methods that have been used by researchers to detect social bot accounts on Twitter. In each subsection, we discuss the related studies and the datasets, detection mechanisms, and classifiers involved, and the process by which the results were validated.

A. Graph-Based Detection

Social network graphs are commonly employed to understand and distinguish between users' relationships on social networking platforms. Three social graph-based methods are typically employed to detect social bots and malicious accounts [38]. The first is based on trust propagation, which evaluates whether the trust relationship that exists between two graph objects is strong or weak. The second is graph clustering, by which related nodes of a social graph are grouped based on similar characteristics such as users' distance. The third involves studying graph metrics and properties, where probability distributions, scale-free graph structure, and centrality measures are addressed in a social graph.

In this subsection, we present three graph-based Sybil detection systems that were evaluated using datasets from Twitter. These three systems were chosen because they involved the analysis of a Twitter dataset. They are presented in chronological order, starting with the most recent work.

SybilWalk [31] is a proposed Sybil detection method that employs a random walk-based method on an undirected social graph. The idea of the random walk method is to label legitimate users with benignness scores and Sybil users with badness scores; these scores then help to classify users into two classes, legitimate and Sybil. In addition, this method can rank all users as a means of identifying top-ranked accounts that are likely to be Sybils. The authors of SybilWalk assumed that the graph satisfies the homophily property, whereby two linked nodes tend to share the same label. They assigned legitimate nodes a badness score of 0 and Sybil nodes a score of 1, and they employed a directed Twitter graph dataset that was obtained from a previous study [43]. To evaluate their experiment, the authors used the Area Under the Receiver Operating Characteristic Curve (AUC) as a standard metric to measure the quality of their ranking method, for which they reported a score of 0.96. They also presented the classification results for Sybil and legitimate nodes in the form of a false positive rate (FPR) of 1.3% and a false negative rate (FNR) of 17.3%.
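The following sketch illustrates the general idea of propagating badness scores under the homophily assumption; it is a simplified illustration of random walk-based labeling, not the SybilWalk algorithm itself, and the iteration count and neutral initialization are assumptions.

```python
import networkx as nx

def badness_scores(graph, sybil_seeds, benign_seeds, iterations=20):
    """Propagate badness scores (0 = benign, 1 = Sybil) over an undirected graph."""
    scores = {n: 0.5 for n in graph.nodes}   # unlabeled nodes start neutral
    seeds = set(sybil_seeds) | set(benign_seeds)
    for n in benign_seeds:
        scores[n] = 0.0
    for n in sybil_seeds:
        scores[n] = 1.0
    for _ in range(iterations):
        for n in graph.nodes:
            if n in seeds:                   # seed labels stay fixed
                continue
            nbrs = list(graph.neighbors(n))
            if nbrs:                         # take the average of the neighbors' scores
                scores[n] = sum(scores[v] for v in nbrs) / len(nbrs)
    # Highest-scoring accounts are the most likely Sybils.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Ranking all users by the resulting score mirrors how SybilWalk surfaces top-ranked accounts that are likely to be Sybils.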
Mehrotra et al. proposed a method to detect fake followers using social graph-based features that relate to the centrality of all nodes in the graph [30]. They claimed that their proposed method can be applied to all social networking platforms. They employed five datasets, two of which contained legitimate followers and the remaining three fake followers. They used six graph-based centrality measures as features for the classification. After computing the centrality measures of all the given nodes in the graph, they applied three classifiers: artificial neural networks, decision tree, and random forest. The random forest classifier scored the highest accuracy of 95%, with a precision of 88.99% and a recall of 100%.
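A minimal sketch of this kind of pipeline is shown below; it computes a handful of standard centrality measures with NetworkX and feeds them to a random forest. The particular centralities chosen here are illustrative and are not necessarily the six used in [30].

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def centrality_features(graph, nodes):
    """One row of graph centrality measures per node."""
    degree = nx.degree_centrality(graph)
    closeness = nx.closeness_centrality(graph)
    betweenness = nx.betweenness_centrality(graph)   # costly on large graphs
    eigenvector = nx.eigenvector_centrality(graph, max_iter=1000)
    pagerank = nx.pagerank(graph)
    return np.array([[degree[n], closeness[n], betweenness[n],
                      eigenvector[n], pagerank[n]] for n in nodes])

# X = centrality_features(g, nodes); y = 1 for fake follower, 0 for legitimate
# clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```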
TrueTop [32] is an influence measurement system that employed a graph-based approach to test Sybil resilience. The authors employed a synthetic simulation of users on Twitter to avoid violating the platform's terms of service. They used four datasets to implement the system, evaluate the accuracy of the model, and test Sybil resilience against a set of predetermined metrics. They presented a model of the strength of Sybil attacks based on the α parameter, which represents the ratio of the total edge weight in the non-Sybil region to that in the Sybil region. They assumed the worst-case scenario of Sybil attacks, in which there is no interaction between the two regions.

B. Crowdsourcing

As previously described, the crowdsourcing approach to social bot detection involves leveraging human detection to identify patterns across given account profiles or the content shared by human and social bot accounts. The role of the human is to distinguish between bot accounts and human accounts.
C. Machine Learning

Among the machine learning-based systems, DeBot [36] can capture the correlation between posting activities using a cross-correlation-based random projection technique. As such, synchronized behavior in a sequence of 40 activities acted as an indicator of automated accounts. The authors calculated and compared their model against five alternative methods, such as Twitter and BotOrNot, and achieved 94% precision in their generated daily reports.
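The sketch below shows plain lagged cross-correlation between two accounts' binned activity series. DeBot itself relies on warped (DTW-style) correlation and random projections for scalability, so this is only an assumed, simplified view of the underlying signal.

```python
import numpy as np

def activity_series(timestamps, start, bins, bin_seconds=60):
    """Bin posting timestamps (seconds) into a per-minute activity count series."""
    idx = ((np.asarray(timestamps, dtype=float) - start) // bin_seconds).astype(int)
    series = np.zeros(bins)
    for i in idx[(idx >= 0) & (idx < bins)]:
        series[i] += 1
    return series

def max_cross_correlation(a, b, max_lag=5):
    """Peak normalized cross-correlation over small lags; near 1 = synchronized."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    n = len(a)
    return max(np.dot(a[max(0, -lag):n - max(0, lag)],
                      b[max(0, lag):n - max(0, -lag)]) / n
               for lag in range(-max_lag, max_lag + 1))
```

Pairs of accounts whose peak correlation stays close to 1 over long windows are candidates for coordinated automation.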
Chu et al. studied features related to tweeting behavior, tweet content, and account properties [37] to detect the automation of bots. They categorized accounts into human, bot, and cyborg according to the investigated features. Their classification system incorporated an entropy-based component that detects timing regularity to measure automation, a spam detection component in the form of a Bayesian classifier that detects text patterns as a means of detecting spam, an account properties component, and a random forest classifier as a decision maker. They collected data covering 512,407 accounts using the Twitter API. They constructed their ground-truth sample using 6,000 accounts divided equally among the human, bot, and cyborg classes. They extracted eight features and implemented a random forest classifier with tenfold cross-validation. They employed a confusion matrix to measure the system performance and achieved an average score of 96%.
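As a simplified illustration of such an entropy component, the sketch below estimates the Shannon entropy of an account's inter-tweet intervals from a histogram; note that Chu et al. use a corrected conditional entropy rather than this naive estimate.

```python
import numpy as np

def timing_entropy(timestamps, bins=10):
    """Shannon entropy of inter-tweet intervals; low values suggest automation."""
    intervals = np.diff(np.sort(np.asarray(timestamps, dtype=float)))
    if intervals.size == 0:
        return 0.0
    counts, _ = np.histogram(intervals, bins=bins)
    p = counts[counts > 0] / counts.sum()    # empirical interval distribution
    return float(-np.sum(p * np.log2(p)))    # highly regular timers score near zero
```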
V. DISCUSSION

Thus far, this paper has examined some of the approaches that have been employed to detect the activities of social bots on Twitter. As mentioned earlier, social bot accounts are more deceptive than ever before, and it is becoming increasingly difficult to develop systems that can detect these applications. To make progress in this area, there is a need to consider the main challenges and factors that impact social bot detection activities, as understanding these challenges will facilitate the development of new technologies that can address the issues at play. In this section, we highlight the factors that commonly represent challenges in social bot detection: datasets, common features, methods employed, and performance measures.

A. Datasets

To study and understand the behavior of social bots in comparison to human behavior in social networks, it is essential to maintain datasets that consist of both human and bot accounts. In Table I, the difference in data sizes across the reviewed papers is obvious. Researchers encounter two issues with datasets. The first concerns the availability of recent public datasets on which to perform experimentation. Some studies use their own platform to collect data in order to avoid this issue and to avoid situations in which they have a limited number of Twitter API requests per hour during the data collection process, as was the case in [34]. However, it usually takes time to collect data that is of a decent size and contains enriched content; for example, the maximum number of tweets per user that the API can provide is 3,200. In this regard, a good number of studies have established a system of sharing processed datasets that other researchers can then use as a baseline or for comparative purposes. However, the datasets that are available will be limited to the features and size with which they were issued, to avoid violating the privacy of the users.

The second issue relates to developing a training dataset that is diverse in terms of the content of the bot accounts. This is especially significant in studies that employ a machine learning approach to bot detection. Labeling a sample that is well defined in terms of size and content can be very difficult to achieve. Therefore, many researchers employ human annotation of a reasonably sized training sample to perform this task, even though it takes time and is prone to human error. One solution that some studies have identified is to use accounts that Twitter has suspended as social bots. However, this solution is not entirely accurate because human users are sometimes suspended for violating Twitter's terms of use. In addition, this approach relies on the researcher's ability to obtain the data for suspended accounts, which is not readily available.
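As an illustration of this collection constraint, the sketch below pages through a user timeline with the third-party tweepy library (v3-style API), assuming valid OAuth credentials; the credential values and screen name are placeholders.

```python
import tweepy

CONSUMER_KEY, CONSUMER_SECRET = "...", "..."     # placeholders: fill in your keys
ACCESS_TOKEN, ACCESS_SECRET = "...", "..."

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)  # sleep through rate-limit windows

# The API returns at most the ~3,200 most recent tweets of a user.
tweets = [status._json
          for status in tweepy.Cursor(api.user_timeline,
                                      screen_name="example_user",
                                      count=200).items(3200)]
```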
B. Common Features

Social bot detection is based on classifications of selected features to sort accounts into either legitimate or bot accounts. The studies reviewed in this paper highlight how common features are used to detect social bot accounts. These include factors related to timing, automation, text use, sentiment, and clickstream behavior. Therefore, we cannot assume a social bot depends on one feature without addressing the other features [37]. In Table III, we summarize the common features that are extracted from the full sets of features in the reviewed papers to measure the likelihood of an account being a human or a bot. In general, the extracted features can address network-level properties to identify community features. We can also identify the social connections of users and their ranking by performing content and behavioral analysis. For example, if an account is verified or protected, this is a logical indicator that it is a human account, not a bot account. The profile features that are extracted from the metadata, such as the profile image, screen name, and description, may also indicate the nature of the account; for example, a default profile image is a sign of a new user or a bot account [27]. Temporal patterns, such as the average tweeting and retweeting ratios, can be a sign of bot activity if they occur with small inter-arrival times [35], [40]. Therefore, using an entropy component to detect behavior as part of the classification system is essential.

In addition, a high rate of posting similar content with URLs can be an indicator of a spammer [10], [41]. In other words, the URL feature can be used to detect the link-farming behavior that is typically employed by spammers and bot accounts [42]. Also, using the mention feature in association with the URL and number-of-links features and the entropy of tweets can indicate a bot account with malicious intentions [7]. Moreover, if the number of followers is high yet the account is relatively new, it is likely that the followers are fake and the account is a bot.
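The sketch below shows how such indicators might be assembled into a feature vector from user metadata; the field names assume a pre-parsed user record (for example, `created_at_ts` as a Unix timestamp) rather than the raw Twitter API object.

```python
import time

def profile_features(user, now=None):
    """Indicative bot-detection features from Twitter user metadata."""
    now = now or time.time()
    age_days = max((now - user["created_at_ts"]) / 86400.0, 1.0)
    followers = user.get("followers_count", 0)
    return {
        "verified": int(user.get("verified", False)),                 # human indicator
        "default_profile_image": int(user.get("default_profile_image", False)),
        "followers_per_day": followers / age_days,   # many followers on a young account
        "friends_to_followers": user.get("friends_count", 0) / max(followers, 1),
        "statuses_count": user.get("statuses_count", 0),
    }
```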
C. Methods Employed to Detect Bots

The literature review of the recent studies that have been performed in this domain highlights how different approaches to detecting social bots have been implemented. The main methods focus on the primary components of social networks, such as network structure, content, and behavior features. In general, the content and behavioral characteristics of bot accounts are employed in off-the-shelf machine learning algorithms [4]. This section evaluates some of the methods that were most commonly used within the studied papers to detect social bot accounts. Table IV presents a summary of the advantages and disadvantages of each method.

Using a graph-based method, [31] studied the trust propagation of the network, in which the ranking of users is made easier by the trust scores. However, this approach is sensitive to the selection of trust seeds, and it works on the basis of assumptions that are not always accurate. Using the same method, [32] and [30] measured the influence of social users based on the social network graph. This approach is useful for visualization and for measuring the influence rate within a social network based on the distance measurement of the centrality of the influencer nodes. However, the computation cost can be high if the targeted network is large and real. Therefore, using a synthetic network to apply this approach is useful, even though the results in the real network are sometimes unpredictable.

The crowdsourcing method can be employed to effectively build ground-truth data and annotation tasks. This approach employs human intelligence to identify different patterns. The problem with this approach is that it consumes time and is prone to human error. However, as described in Section III, there are some solutions by which the annotation task can be validated during the preprocessing phase to maintain the best results.
The survey of the existing literature revealed that researchers are more likely to employ machine learning methods than the other two approaches. The majority of the reviewed papers used tree-based approaches and Bayes' theorem, and the random forest classifier was the most commonly employed classifier within the reviewed methods. The advantages of this classifier are that it is less complex in terms of tuning and achieves a more accurate performance; however, the complexity of the tree will generate overfitting, as is the case with most decision tree algorithms. Bayes' theorem and random forest were widely used in the studies described in the literature. A Bayes' theorem-based classifier is fast in terms of training and prediction time, but its performance is better when the dataset contains a relatively low number of features. Many researchers have employed a support vector machine to reduce the error rate in the classification process. However, SVM performance depends on the selected kernel and parameters, and a major disadvantage of this classifier is that it depends on a large training set to increase performance. Similar to SVM, neural network effectiveness depends on the sample size; when the sample size is large, the network performs well. One detection method that was applied in the literature is the pairwise similarity technique [36]. This method can effectively detect bots based on the similarity of profile activity. However, the extent to which the method can be scaled depends on storing and analyzing the users' history.
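These trade-offs can be explored empirically with off-the-shelf implementations. The sketch below compares the classifier families discussed above under tenfold cross-validation using scikit-learn; `X` and `y` are assumed to be a prepared feature matrix and binary bot/human labels.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=100),
    "naive_bayes": GaussianNB(),              # fast, prefers few features
    "svm_rbf": SVC(kernel="rbf", C=1.0),      # sensitive to kernel and parameters
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```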
D. Performance Measures

Within the investigated studies, different performance measurements have been used to evaluate social bot detection classification techniques. The approach that is commonly used to measure performance in these articles is the accuracy rate, which relates to the percentage of accounts that are correctly classified with respect to the whole sample. However, using the accuracy rate alone is not sufficient to evaluate the chosen classifier. The study by Chu et al. evaluated the accuracy of each feature, and the result was meaningless when compared to using a confusion matrix to evaluate the whole set of features for each class [37].

Table V presents the performance measures that were employed in each of the reviewed papers. The majority used classifiers with tenfold or fivefold cross-validation to validate their results. Five to six studies used the F-measure, precision, and recall to measure performance. These performance measurements are appropriate for the bot detection problem since it is ultimately a binary classification problem.
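The sketch below illustrates this style of evaluation with scikit-learn, reporting a confusion matrix together with per-class precision, recall, and F-measure on a held-out split; `X` and `y` are assumed to be prepared as in the earlier examples.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))      # per-class error breakdown
print(classification_report(y_test, y_pred,  # precision, recall, F1 per class
                            target_names=["human", "bot"]))
```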
VI. CONCLUSION

In this paper, we reviewed the bot detection methods that have been employed in recent studies on the Twitter social network. We quantified the existing papers according to the detection scheme and classifiers employed. We then summarized the main observations on the reviewed literature within four main subsections: datasets, analyzed features, classifiers, and performance measures.

The findings revealed that social bot detection is challenging, and this challenge is exacerbated as the social network volume increases. Bots employ sophisticated mechanisms to avoid detection, and researchers have yet to develop viable methods by which such mechanisms can be identified. As such, there is a requirement for ongoing studies into bot detection approaches. Twitter is encouraged to develop systems that can recognize automated tweets and tag them with a unified label so that they can be readily identified by users. The research community is encouraged to collaborate to build a periodically updated public dataset that includes recently detected bots.
REFERENCES

[1] Twitter, "Q3 2017 earnings report," October 2017. [Online]. Available: https://investor.twitterinc.com/results.cfm
[2] C. Smith, "400 amazing Twitter statistics and facts," November 2017. [Online]. Available: https://expandedramblings.com/index.php/
[3] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, "The rise of social bots," Communications of the ACM, vol. 59, no. 7, pp. 96-104, 2016.
[4] O. Varol, E. Ferrara, C. A. Davis, F. Menczer, and A. Flammini, "Online human-bot interactions: Detection, estimation, and characterization," arXiv preprint arXiv:1703.03107, 2017.
[5] N. Abokhodair, D. Yoo, and D. W. McDonald, "Dissecting a social botnet: Growth, content and influence in Twitter," in Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, 2015, pp. 839-851.
[6] V. Subrahmanian, A. Azaria, S. Durst, V. Kagan, A. Galstyan, K. Lerman, L. Zhu, E. Ferrara, A. Flammini, and F. Menczer, "The DARPA Twitter bot challenge," Computer, vol. 49, no. 6, pp. 38-46, 2016.
[7] C. Grier, K. Thomas, V. Paxson, and M. Zhang, "@spam: The underground on 140 characters or less," in Proceedings of the 17th ACM Conference on Computer and Communications Security. ACM, 2010, pp. 27-37.
[8] G. Stringhini, C. Kruegel, and G. Vigna, "Detecting spammers on social networks," in Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 2010, pp. 1-9.
[9] A. H. Wang, "Detecting spam bots in online social networking sites: A machine learning approach," DBSec, vol. 10, pp. 335-342, 2010.
[10] X. Zhang, S. Zhu, and W. Liang, "Detecting spam and promoting campaigns in the Twitter social network," in Data Mining (ICDM), 2012 IEEE 12th International Conference on. IEEE, 2012, pp. 1194-1199.
[11] S. Rathore, P. K. Sharma, V. Loia, Y.-S. Jeong, and J. H. Park, "Social network security: Issues, challenges, threats, and solutions," Information Sciences, vol. 421, pp. 43-69, 2017.
[12] M. Shafahi, L. Kempers, and H. Afsarmanesh, "Phishing through social bots on Twitter," in Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 2016, pp. 3703-3712.
[13] J. Zhang, R. Zhang, Y. Zhang, and G. Yan, "The rise of social botnets: Attacks and countermeasures," IEEE Transactions on Dependable and Secure Computing, 2016.
[14] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, "The socialbot network: When bots socialize for fame and money," in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 93-102.
[15] A. Gupta, H. Lamba, and P. Kumaraguru, "$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing fake content on Twitter," in eCrime Researchers Summit (eCRS). IEEE, 2013, pp. 1-12.
[16] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, "Design and analysis of a social botnet," Computer Networks, vol. 57, no. 2, pp. 556-578, 2013.
[17] S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, and M. Tesconi, "Fame for sale: Efficient detection of fake Twitter followers," Decision Support Systems, vol. 80, pp. 56-71, 2015.
[18] F. Amato, A. Castiglione, A. De Santo, V. Moscato, A. Picariello, F. Persia, and G. Sperlì, "Recognizing human behaviours in online social networks," Computers & Security, 2017.
[19] S. Sivanesh, K. Kavin, and A. A. Hassan, "Frustrate Twitter from automation: How far a user can be trusted?" in Human-Computer Interactions (ICHCI), 2013 International Conference on. IEEE, 2013, pp. 1-5.
[20] G. Laboreiro, L. Sarmento, and E. Oliveira, "Identifying automatic posting systems in microblogs," Progress in Artificial Intelligence, pp. 634-648, 2011.
[21] C. M. Zhang and V. Paxson, Detecting and Analyzing Automated Activity on Twitter. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 102-111. [Online]. Available: https://doi.org/10.1007/978-3-642-19260-9_11
[22] N. Chavoshi, H. Hamooni, and A. Mueen, "Temporal patterns in bot activities," in Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 2017, pp. 1601-1606.
[23] Q. Fu, B. Feng, D. Guo, and Q. Li, "Combating the evolving spammers in online social networks," Computers & Security, vol. 72, pp. 60-73, 2018.
[24] S. Lee and J. Kim, "Early filtering of ephemeral malicious accounts on Twitter," Computer Communications, vol. 54, pp. 48-57, 2014.
[25] C. Freitas, F. Benevenuto, S. Ghosh, and A. Veloso, "Reverse engineering socialbot infiltration strategies in Twitter," in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ACM, 2015, pp. 25-32.
[26] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, and F. Menczer, "BotOrNot: A system to evaluate social bots," in Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016, pp. 273-274.
[27] A. Alarifi, M. Alsaleh, and A. Al-Salman, "Twitter Turing test: Identifying social machines," Information Sciences, vol. 372, pp. 332-346, 2016.
[28] M. Kantepe and M. C. Ganiz, "Preprocessing framework for Twitter bot detection," in Computer Science and Engineering (UBMK), 2017 International Conference on. IEEE, 2017, pp. 630-634.
[29] B. Erşahin, Ö. Aktaş, D. Kılınç, and C. Akyol, "Twitter fake account detection," in Computer Science and Engineering (UBMK), 2017 International Conference on. IEEE, 2017, pp. 388-392.
[30] A. Mehrotra, M. Sarreddy, and S. Singh, "Detection of fake Twitter followers using graph centrality measures," in Contemporary Computing and Informatics (IC3I), 2016 2nd International Conference on. IEEE, 2016, pp. 499-504.
[31] J. Jia, B. Wang, and N. Z. Gong, "Random walk based fake account detection in online social networks," in Dependable Systems and Networks (DSN), 2017 47th Annual IEEE/IFIP International Conference on. IEEE, 2017, pp. 273-284.
[32] J. Zhang, R. Zhang, J. Sun, Y. Zhang, and C. Zhang, "TrueTop: A Sybil-resilient system for user influence measurement on Twitter," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2834-2846, 2016.
[33] Z. Gilani, E. Kochmar, and J. Crowcroft, "Classification of Twitter accounts into automated agents and human users."
[34] Z. Gilani, L. Wang, J. Crowcroft, M. Almeida, and R. Farahbakhsh, "Stweeler: A framework for Twitter bot analysis," in Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016, pp. 37-38.
[35] C. Cai, L. Li, and D. Zeng, "Behavior enhanced deep bot detection in social media," in Intelligence and Security Informatics (ISI), 2017 IEEE International Conference on. IEEE, 2017, pp. 128-130.
[36] N. Chavoshi, H. Hamooni, and A. Mueen, "DeBot: Twitter bot detection via warped correlation," in ICDM, 2016, pp. 817-822.
[37] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia, "Detecting automation of Twitter accounts: Are you a human, bot, or cyborg?" IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 6, pp. 811-824, 2012.
[38] K. S. Adewole, N. B. Anuar, A. Kamsin, K. D. Varathan, and S. A. Razak, "Malicious accounts: Dark of the social networks," Journal of Network and Computer Applications, vol. 79, pp. 41-67, 2017.
[39] N. Chavoshi, H. Hamooni, and A. Mueen, "Identifying correlated bots in Twitter," in International Conference on Social Informatics. Springer, 2016, pp. 14-21.
[40] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia, "Who is tweeting on Twitter: Human, bot, or cyborg?" in Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 2010, pp. 21-30.
[41] F. Ahmed and M. Abulaish, "A generic statistical approach for spam detection in online social networks," Computer Communications, vol. 36, no. 10, pp. 1120-1129, 2013.
[42] M. Chakraborty, S. Pal, R. Pramanik, and C. R. Chowdary, "Recent developments in social spam detection and combating techniques: A survey," Information Processing & Management, vol. 52, no. 6, pp. 1053-1073, 2016.
[43] H. Kwak, C. Lee, H. Park, and S. Moon, "What is Twitter, a social network or a news media?" in Proceedings of the 19th International Conference on World Wide Web. ACM, 2010, pp. 591-600.
[44] K. Lee, B. D. Eoff, and J. Caverlee, "Seven months with the devils: A long-term study of content polluters on Twitter," in ICWSM, 2011.
[45] F. Morstatter, L. Wu, T. H. Nazer, K. M. Carley, and H. Liu, "A new approach to bot detection: Striking the balance between precision and recall," in Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on. IEEE, 2016, pp. 533-540.