Spammer Detection and Fake User Identification On Social Networks
All content following this page was uploaded by Ikram Ud Din on 22 May 2019.
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2019.DOI
ABSTRACT Social networking sites engage millions of users around the world. The users’ interactions with these social sites, such as Twitter and Facebook, have a tremendous impact and occasionally undesirable repercussions for daily life. The prominent social networking sites have turned into a target platform for spammers to disperse a huge amount of irrelevant and deleterious information. Twitter, for example, has become one of the most heavily used platforms of all time and therefore attracts an unreasonable amount of spam. Fake users send undesired tweets to users to promote services or websites, which not only affects legitimate users but also disrupts resource consumption. Moreover, the possibility of spreading invalid information to users through fake identities has increased, resulting in the propagation of harmful content. Recently, the detection of spammers and the identification of fake users on Twitter has become a common area of research in contemporary online social networks (OSNs). In this paper, we perform a review of techniques used for detecting spammers on Twitter. Moreover, a taxonomy of the Twitter spam detection approaches is presented that classifies the techniques based on their ability to detect: (i) fake content, (ii) spam based on URL, (iii) spam in trending topics, and (iv) fake users. The presented techniques are also compared based on various features, such as user features, content features, graph features, structure features, and time features. We are hopeful that the presented study will be a useful resource for researchers to find the highlights of recent developments in Twitter spam detection on a single platform.
INDEX TERMS Classification, fake user detection, online social network, spammer’s identification.
VOLUME 4, 2019 1
2169-3536 (c) 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2918196, IEEE Access
F. Masood, G. Ahmad, A. Abbas et al.: Spammer Detection and Fake User Identification on Social Networks
has intensified. Many people who do not have much information regarding OSNs can easily be tricked by fraudsters. There is also a demand to combat and place a control on the people who use OSNs only for advertisements and thus spam other people’s accounts.

Recently, the detection of spam on social networking sites has attracted the attention of researchers. Spam detection is a difficult task in maintaining the security of social networks. It is essential to recognize spam on OSN sites to save users from various kinds of malicious attacks and to preserve their security and privacy. The hazardous maneuvers adopted by spammers cause massive damage to the community in the real world. Twitter spammers have various objectives, such as spreading invalid information, fake news, rumors, and spontaneous messages. Spammers achieve their malicious objectives through advertisements and several other means, where they support different mailing lists and subsequently dispatch spam messages randomly to broadcast their interests. These activities disturb the original users, who are known as non-spammers. In addition, they damage the reputation of the OSN platforms. Therefore, it is essential to design a scheme to spot spammers so that corrective efforts can be taken to counter their malicious activities [3].

Several research works have been carried out in the domain of Twitter spam detection. To encompass the existing state of the art, a few surveys have also been carried out on fake user identification on Twitter. Tingmin et al. [4] provide a survey of new methods and techniques for Twitter spam detection and present a comparative study of the current approaches. On the other hand, the authors in [5] conducted a survey on different behaviors exhibited by spammers on the Twitter social network. The study also provides a literature review that recognizes the existence of spammers on the Twitter social network. Despite all the existing studies, there is still a gap in the existing literature. Therefore, to bridge the gap, we review the state of the art in spammer detection and fake user identification on Twitter. Moreover, this survey presents a taxonomy of the Twitter spam detection approaches and attempts to offer a detailed description of recent developments in the domain.

The aim of this paper is to identify different approaches to spam detection on Twitter and to present a taxonomy by classifying these approaches into several categories. For classification, we have identified four means of reporting spammers that can be helpful in identifying fake identities of users. Spammers can be identified based on: (i) fake content, (ii) URL based spam detection, (iii) detecting spam in trending topics, and (iv) fake user identification. Table 1 provides a comparison of existing techniques and helps users to recognize the significance and effectiveness of the proposed methodologies in addition to providing a comparison of their goals and results. Table 2 compares different features that are used for identifying spam on Twitter. We anticipate that this survey will help readers find diverse information on spammer detection techniques at a single point.

This article is structured such that Section II presents the taxonomy for the spammer detection techniques on Twitter. The comparison of proposed methods for detecting spammers on Twitter is discussed in Section III. Section IV presents an overall analysis and discussion, whereas Section V concludes the paper and highlights some directions for future work.

II. SPAMMER DETECTION ON TWITTER
In this article, we elaborate a classification of spammer detection techniques. Fig. 1 shows the proposed taxonomy for identification of spammers on Twitter. The proposed taxonomy is categorized into four main classes, namely, (i) fake content, (ii) URL based spam detection, (iii) detecting spam in trending topics, and (iv) fake user identification. Each category of identification methods relies on a specific model, technique, and detection algorithm. The first category (fake content) includes various techniques, such as a regression prediction model, a malware alerting system, and the Lfun scheme approach. In the second category (URL based spam detection), the spammer is identified through URLs using different machine learning algorithms. The third category (spam in trending topics) is identified through a Naïve Bayes classifier and language model divergence. The last category (fake user identification) is based on detecting fake users through hybrid techniques. Techniques related to each of the spammer identification categories are discussed in the following subsections.

A. FAKE CONTENT BASED SPAMMER DETECTION
Gupta et al. [6] performed an in-depth characterization of the components that are affected by rapidly growing malicious content. It was observed that a large number of people with high social profiles were responsible for circulating fake news. To recognize the fake accounts, the authors selected the accounts that were created immediately after the Boston blast and were later banned by Twitter due to violation of terms and conditions. About 7.8 million distinct tweets by 3.7 million distinct users were collected. This dataset is known as the largest dataset of the Boston blast. The authors performed the fake content categorization through temporal analysis, where the temporal distribution of tweets is calculated based on the number of tweets posted per hour.

Fake tweet user accounts were analyzed through the activities performed by the user accounts from which the spam tweets were generated. It was observed that most of the fake tweets were shared by people with followers. Subsequently, the sources of tweets were analyzed by the medium from which the tweets were posted. It was found that most of the tweets containing any information were generated through mobile devices, whereas non-informative tweets were generated more through the Web interfaces. The role of user attributes in the identification of fake content was calculated through: (i) the average number of verified accounts that were either spam or non-spam and (ii) the number of followers of the user accounts. The fake content propagation was identified through metrics that include: (i) social reputation, (ii) global engagement, (iii) topic engagement, (iv) likability, and (v) credibility. After that, the authors utilized a regression prediction model to assess the overall impact of people who spread the fake content at that time and also to predict the growth of fake content in the future.

Concone et al. [7] presented a methodology that provides alerts on malicious activity by using a specified set of tweets acquired in real time through the Twitter API. Afterwards, the batch of tweets concerning the same topic is summarized to generate an alert. The proposed architecture is used to evaluate Twitter postings, recognize the development of a relevant event, and report that event itself. The proposed approach utilizes the information contained in the tweets when spam or malware is recognized by the users or when a security report has been released by the certified authorities. The proposed alerting system comprises the following components: (i) real-time data extraction of both tweets and users, (ii) a filtering system based on a pre-processing schedule and on the Naïve Bayes algorithm to discard the tweets containing inaccurate information, (iii) data analysis for spammer detection, where the detection windows are terminated according to a sigmoid function or when the window size reaches the maximum, (iv) an alert sub-system that is used when the event is established: the system groups the tweets that are relevant to the same topic, the tweets are compared with the cluster barycenter, and the one nearest to the cluster center is chosen as the representative of the whole cluster, and (v) feedback analysis. The approach is claimed to be efficient and effective for the detection of some invasive and notable malicious activities in circulation.

Moreover, Eshraqi et al. [8] determined different features to detect spam and then, with the help of a DenStream-based clustering algorithm, recognized the spam tweets. Some user accounts were selected from various datasets and afterwards random tweets were selected from these accounts. The tweets are subsequently categorized as spam and non-spam. The authors claimed that the algorithm can divide the data into spam and non-spam with high accuracy and that fake tweets may be recognized with high accuracy and precision.
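The representative-selection step of the alert sub-system of Concone et al. [7] described above can be illustrated with a small sketch. It assumes that each tweet has already been mapped to a numeric vector and that Euclidean distance is used; the function names and the toy vectors are hypothetical and are not taken from [7].

```python
from math import sqrt


def barycenter(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def representative(tweets, vectors):
    """Return the tweet whose vector lies nearest to the cluster barycenter."""
    center = barycenter(vectors)
    best = min(range(len(tweets)), key=lambda i: euclidean(vectors[i], center))
    return tweets[best]


# Hypothetical cluster: three tweets on one topic with toy 2-D vectors.
cluster_tweets = ["tweet a", "tweet b", "tweet c"]
cluster_vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(representative(cluster_tweets, cluster_vectors))
```

How the tweets are vectorized (for example, term frequencies) is left open here, since the survey does not specify it.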
Various features can be used to identify spam. For example, graph-based features model Twitter as a social graph. If the number of followers is low in comparison with the number of followings, the credibility of an account is low and the possibility that the account is spam is relatively high. Likewise, content-based features include tweet reputation, HTTP links, mentions and replies, and trending topics. For the time feature, if many tweets are sent by a user account in a certain time interval, the account is considered a spam account. The dataset of the study comprised 50,000 user accounts. The approach identified the spammers and fake tweets with high accuracy.

An Lfun (learning for unlabeled tweets) scheme, which is used to handle various problems in the detection of Twitter spam, has been presented by Chen et al. [9]. Their framework comprises two components, i.e., learn from detected tweets (LDT) and learn from human labelling (LHL). The two components are used to automatically obtain spam tweets from a given set of unlabelled tweets that are easily collected from the Twitter network. Once the labelled spam tweets are obtained, the random forest algorithm is used to perform classification. The performance of the scheme is evaluated while detecting drifted spam tweets. The experiments were performed on real-world data of ten continuous days, with each day having 100K tweets each for spam and non-spam. The F-measure and the detection rate were used to evaluate the performance of the presented scheme. The results showed that the methodology improves the accuracy of spam detection significantly in real-world situations.

Furthermore, Buntain et al. [10] introduced a method for detecting fake news on Twitter automatically by predicting accuracy assessments in two credibility-focused datasets. The method was applied on the Twitter fake news dataset and the model was trained against crowdsourced workers based on the assessment of journalists. The two Twitter datasets were used to study the integrity in OSNs. The first dataset, CREDBANK, a crowdsourced dataset, was used to evaluate the accuracy of events in Twitter, whereas the second dataset, called PHEME, is a journalist-labelled dataset of possible rumors in Twitter and journalistic evaluation of their accuracy. A total of 45 features were described that fall into four categories: structural features, user features, content features, and temporal features. Aligning labels in PHEME and BUZZFEED contain classes that describe whether a story is fake or true. Results of the analysis are helpful in studying information on social media to know whether such stories follow a similar pattern.

B. URL BASED SPAM DETECTION
Chen et al. [11] performed an evaluation of machine learning algorithms to detect spam tweets. The authors analyzed the impact of various features on the performance of spam detection, for example: (i) spam to non-spam ratio, (ii) size of training dataset, (iii) time related data, (iv) factor discretization, and (v) sampling of data. To evaluate the detection, first, around 600 million public tweets were collected and subsequently the authors applied Trend Micro’s web reputation system to identify as many spam tweets as possible. A total of 12 lightweight features were also extracted to distinguish non-spam and spam tweets from this identified dataset. The characteristics of the identified features were represented by CDF figures.

These features are fed to machine learning based spam classification, which is later used in the experiment to evaluate the detection of spam. Four datasets are sampled to reproduce different scenarios. Since no dataset is publicly available for the task, few datasets were used in previous research. After the identification of spam tweets, 12 features were gathered. These features are divided into two classes, i.e., user-based features and tweet-based features. The user-based features are identified through various objects such as account age and number of user favorites, lists, and tweets. The identified user-based features are parsed from the JSON structure. On the other hand, the tweet-based features include the number of (i) retweets, (ii) hashtags, (iii) user mentions, and (iv) URLs. The result of the evaluation shows that the changing feature distribution reduced the performance, whereas no differences were observed in the training dataset distribution.

C. DETECTING SPAM IN TRENDING TOPIC
Gharge et al. [3] introduce a method that is based on two new aspects. The first one is the recognition of spam tweets without any prior information about the users and the second one is the exploration of language for spam detection on Twitter trending topics at that time. The system framework includes the following five steps.
• The collection of tweets with respect to trending topics on Twitter. After storing the tweets in a particular file format, the tweets are subsequently analyzed.
• Labelling of spam is performed to check through all datasets that are available to detect malicious URLs.
• Feature extraction separates the characteristics constructed based on the language model, which uses language as a tool and helps in determining whether the tweets are fake or not.
• The classification of the dataset is performed by shortlisting the set of tweets that is described by the set of features provided to the classifier to train the model and to acquire the knowledge for spam detection.
• The spam detection uses the classification technique to accept tweets as the input and classify them as spam and non-spam.

The experimental setup was prepared for determining the accuracy of the system. For this purpose, a random sample set of 1,000 tweets was collected, of which 60% were legitimate and the rest were spam.

Stafford et al. [12] examined the degree to which the trending topics in Twitter are exploited by spammers. Although numerous methods to detect spam have been proposed, the research on determining the effects of spam on Twitter trending topics has attracted only limited attention from researchers. The authors in [12] presented a technique that works with the Twitter public API. The aim of the implemented program was to find 10 trending topics from all over the world, having a language code, within one hour and to open a filtered connection related to those topics to acquire a data stream. In the next hour, the authors obtained as many of the tweets and as much of the linked metadata as permitted by the Twitter API. Once the data had been collected, the collected tweets were classified into two categories, i.e., spam and non-spam tweets, which can be utilized to train classifiers.

To develop such a collection of manual labels, another program was suggested to sample random tweets, where the idea is based upon URL filtering by Hussain et al. [20]. After the completion of labelling tweets, the authors move toward the next phase, the analysis method. The analysis method has two separate phases, where the first phase was to select and evaluate the attributes through information retrieval metrics, while the second phase was to evaluate the effect of spam filtering on the trending topics through a statistical test. The result of the evaluation concludes that spammers do not drive the trending topics in Twitter but instead adopt target topics with the required qualities. The results bode well for the sustainability of Twitter and provide a way for improvement.

D. FAKE USER IDENTIFICATION
A categorization method is proposed by Erşahin et al. [1] to detect spam accounts on Twitter. The dataset used in the study was collected manually. The classification is performed by analyzing user-name, profile and background image, number of friends and followers, content of tweets, description of account, and number of tweets. The dataset comprised 501 fake and 499 real accounts, where 16 features were identified from the information obtained from the Twitter APIs. Two experiments were performed for classifying fake accounts. The first experiment uses the Naïve Bayes learning algorithm on the Twitter dataset including all aspects without discretization, whereas the second experiment uses the Naïve Bayes learning algorithm on the Twitter dataset after discretization.

Mateen et al. [13] proposed a hybrid technique that utilizes user-based, content-based, and graph-based characteristics for spammer profile detection. A model is proposed to differentiate between non-spam and spam profiles using these three characteristics. The proposed technique was analyzed using a Twitter dataset with 11K users and approximately 400K tweets. The goal is to attain higher efficiency and precision by integrating all these characteristics. User-based features are established from the relationships and properties of user accounts. It is essential to include user-based features in the spam detection model. As these features are related to user accounts, all attributes that were linked to user accounts were identified. These attributes include the number of followers and following, age, FF ratio, and reputation. Alternatively, content features are linked to the tweets that are posted by users, as spam bots post a huge amount of duplicate content, in contrast to non-spammers, who do not post duplicate tweets.

These features depend on the messages or content that users write. Spammers post content to spread fake news, and this content contains malicious URLs to promote their product. The content-based features include: (i) the total number of tweets, (ii) hashtag ratio, (iii) URLs ratio, (iv) mentions ratio, and (v) frequency of tweets. The graph-based features are used to counter the evasion strategies that are adopted by spammers. Spammers use different techniques to avoid being detected. They can buy fake followers from different third-party websites and exchange their followers with another user to look like a legitimate user. Graph-based features include in/out degree and betweenness. The evaluation of the approach is done by using the dataset of previous techniques as, due to the Twitter policy, no data is available publicly. The results are evaluated by integrating the three most common approaches, namely Decorate, Naïve Bayes, and J48. The result of the experiment shows that the detection rate of the approach is more accurate and higher than that of any of the existing techniques.

Gupta et al. [14] present an approach for the detection of spammers in Twitter and use popular techniques, i.e., Naïve Bayes, clustering, and decision trees. The algorithms classify an account as spam or non-spam. The dataset comprises 1,064 Twitter users described by 62 features, which are either user-specific or tweet-specific information. Spammer accounts make up almost 36% of the dataset used. As the behavior of spammers is different from that of non-spammers, some attributes or features are recognized in which both categories are different from one another. Feature identification is based on the number of features at user and tweet level, such as followers or following, spam keywords, replies, hashtags, and URLs [30], [32].

After the identification of features, a pre-processing step transforms all continuous features into discrete ones. Subsequently, the authors developed a technique using clus-
| Reference | Proposed methodology | Goal | Dataset | Results |
|---|---|---|---|---|
| [19] | Text pre-processing technique was conducted, and four different feature sets were utilized for exercising the spam and non-spammer classifiers. | The objective of the study is to detect spam tweets which enhance the quantity of data that needs to be assembled by relying only on tweet-inherent features. | Two large labelled datasets of tweets containing spam. | An inspiring result was achieved by using the limited feature set that is accessible in tweets, which is better as compared to existing spammer detection systems. |
| [21] | Two experiments were conducted, i.e., edge weighting and centrality weighting. | To understand the significance of each well-defined edge in order to find the opinion leader and to perceive the weight that could permit more precise opinion based on evaluation algorithms. | | Indicates that the low in-degree weight, high betweenness weight, and low or no PageRank weight could provide 100% agreement as compared to other evaluation algorithms in order to find the opinion leader. |
| [9] | Performance of a comprehensive range of conventional machine learning algorithms for the purpose of identifying the performance of detection and strength based on immense amount of truth data. | The goal of the study is to attain real time Twitter spam detection capabilities. | Around 30 million labelled tweets were randomly selected to form the ground truth data set. | The Lfun scheme can enhance the precision of spam detection significantly in the real-world context. |
| [1] | Entropy minimization discretization (EMD) technique was used on numerical features. | To detect fake accounts on Twitter by proposing classification methods and to illustrate the effect of discretization on the basis of Naïve Bayes algorithm in Twitter. | No public dataset is available, thus, created own dataset based on Twitter API. | Naïve Bayes can perform well with discrete values as compared to continuous values. |
| [13] | A hybrid technique has been used for the identification of spammers on Twitter by utilizing user based, content based, and graph-based features. | Achieve higher accuracy by combining user based, content based, and graph-based features for spam profile detection. | Dataset of Twitter with 11k users and approximately 400k tweets was used. | The rate of detection in the study is more accurate and higher as compared to any existing technique. |
| [6] | Regression prediction model has been used in order to prove the influence of users who spread fake content. | To classify and recommend solutions to counter different forms of spam events on Twitter during activities like Boston Blast. | About 7.8 million Boston marathon blast related tweets extracted using Twitter API. | Approximately 29% of the content that went viral on Twitter during the crisis of the Boston blast was fake, whereas 51% were general views and comments, and the remaining were correct information. |
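The user-based and content-based features listed for the hybrid approach of [13] (FF ratio, reputation, hashtag ratio, URLs ratio, mentions ratio, and frequency of tweets) can be computed roughly as in the following sketch. The survey does not give formulas, so the definitions here are assumptions: FF ratio is taken as followers divided by following, reputation as followers divided by followers plus following, each content ratio as occurrences per tweet, and frequency as tweets per day of account age. The `Account` class and the regular expressions are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Account:
    followers: int
    following: int
    age_days: int
    tweets: list  # raw tweet texts


def ff_ratio(acc):
    """Assumed definition: followers / following (0.0 if following is 0)."""
    return acc.followers / acc.following if acc.following else 0.0


def reputation(acc):
    """Assumed definition: followers / (followers + following)."""
    total = acc.followers + acc.following
    return acc.followers / total if total else 0.0


def content_features(acc):
    """Per-tweet averages of hashtags, URLs, and mentions, plus tweet rate."""
    n = len(acc.tweets)
    if n == 0:
        return {"tweet_count": 0, "hashtag_ratio": 0.0, "url_ratio": 0.0,
                "mention_ratio": 0.0, "tweets_per_day": 0.0}
    hashtags = sum(len(re.findall(r"#\w+", t)) for t in acc.tweets)
    urls = sum(len(re.findall(r"https?://\S+", t)) for t in acc.tweets)
    mentions = sum(len(re.findall(r"@\w+", t)) for t in acc.tweets)
    return {
        "tweet_count": n,
        "hashtag_ratio": hashtags / n,
        "url_ratio": urls / n,
        "mention_ratio": mentions / n,
        "tweets_per_day": n / acc.age_days if acc.age_days else 0.0,
    }


# Hypothetical account used only to exercise the functions.
example = Account(followers=10, following=40, age_days=3,
                  tweets=["buy now http://x.example #deal", "hi @bob", "#a #b"])
print(ff_ratio(example), reputation(example), content_features(example))
```

A low FF ratio together with a high URL ratio would be the kind of pattern such a feature vector is meant to expose to a classifier; the thresholds themselves would have to come from training data.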
In the decision tree algorithm, the structure of the tree was designed, and the decisions were made at every level of the tree. The result of the proposed approach shows that the clustering algorithm’s performance in detecting non-spam accounts is better than in detecting spam accounts. Results of these integrated algorithms demonstrate the overall accuracy and the detection of non-spammers with high effectiveness.

III. COMPARISON OF APPROACHES FOR SPAM DETECTION ON TWITTER
This section provides the comparison of the proposed methodologies along with their goals, the datasets that are used to analyze spam, and the results of the experiments of each method, as shown in Table 1.

A. ANOMALY DETECTION BASED ON URL
Chauhan et al. [16] proposed a methodology for the detection of anomalous tweets. The type of abnormality that is distributed on Twitter is the URL anomaly. Anomalous users use various URL links for creating spam. The proposed methodology, which is used to identify various anomalous activities on social networking sites, for example, Twitter, comprises the following features.
• URL ranking, in which the URL rank is identified to determine how authentic a URL is.
• Similarity of tweets includes posting of the same tweets again and again.
• Time difference between tweets involves posting of five or more tweets during the time period of one minute.
• Malware content consists of malware URLs that can damage the system.
• Adult content contains posts that consist of adult content.

For analyzing the anomalous behavior on Twitter based on URL, the dataset is prepared by accumulating 200 tweets of a user.

The dataset is expanded in order to enlarge the size. Five functions are executed on the Twitter dataset, which are given below:
• URL rank generation is used to get the URL that a user has used in a tweet. This URL is sent to the website of ALEXA where the source code is
• Time difference calculation checks each tweet against its previous three tweets and the next three tweets, and forms a cluster of seven tweets.
• Adult content identification is used to construct a dataset of all URLs that may contain adult content.

The results ensure that the proposed anomalous detection model can be used to effectively analyze the number of URL spammers.

Moreover, Ghosh et al. [22] evaluate the strategies employed by new spammers in OSNs by recognizing spam accounts in Twitter and examining their link-creation strategies. The analysis of the approach shows that the spammers adopt intelligent strategies for the formation of links to evade detection and to raise the reach of the spam that they generate. A dataset of eight spam accounts in Twitter was used to detect other doubtful user accounts. It is reported that the spammers on Twitter frequently post tweets that contain URLs of their associated websites; therefore, frequently used URLs are utilized to recognize malicious users. The experiment shows that the spammer not only follows other spammers but also targets legitimate users who generally follow back. On the other hand, a spammer monitors the followers of the spotted legitimate users and starts to follow them for following these spotted users. Spotted users hope that they can be followed back. This is how spammers identify other spammers and coordinate with them.

The following observations are considered while performing this experimental study:
• A total of 4491 spam accounts, which have around 730,000 links that are directed among them, ensure the presence of a huge spam farm with a density of 0.036. It is also reported that spam accounts can easily find other spam accounts within an OSN having the size of Twitter.
• It is estimated that 4.74% of the follow links on average are developed by these spammers and this fraction is as high as 12% for some of the other accounts.
• It shows that spammers having a greater number of followings have higher reciprocity on average. It also shows that more of the spammers’ time is spent in the network to create more and more links so that they can filter out more users who can follow them back.
obtained and the tree is generated by the help of • A huge flap exists on the side of spammers, which
web scraper from the given source code. implies a large-scale participation among various
• Tweet similarity in this generation evaluates full spammers for recognizing emergent users to follow.
tweets instead of analyzing only URL. Thus, the result of the analysis recognizes the evidence
• Malware URL rank assignment is used to get the that is left by the large spam firms within OSNs and
URL from a user that s/he has shared in his/her provides various insights on the creation of link scenarios
tweet. The WebOfTrust (WOT) API is used to of the spammers that needs to be studied while creating
check the repute of the URL that whether it is a anti-spam scenarios.
good URL or contains some malware. Furthermore, a study of ambiguous information in
8 VOLUME 4, 2019
2169-3536 (c) 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2918196, IEEE Access
F. Masood, G. Ahmad, A. Abbas et al.: Spammer Detection and Fake User Identification on Social Networks
Twitter spam has been presented by Chen et al. [23]. A complete Twitter feed of two weeks with URLs is collected. Of the large number of spam tweets analyzed during the research, only a few tweets without URLs are considered spam. Additionally, spammers primarily use embedded URLs because these make it easier to lead victims to their external sites and accomplish their objectives, such as scams, malware downloads, and phishing. Two steps were applied to recognize spam on Twitter. The first step uses Trend Micro's WRT, whose false positive rate is relatively low, although it is likely to miss a few spam tweets. In addition, a goal of the research is to achieve a high level of understanding of the variety of deceptive topics that are used in Twitter spam. The second step involves a clustering approach with two purposes: (a) the clustering approach separates non-spam and spam tweets into various groups, and (b) analyzing spam at the group level is more helpful.
A graph-based clustering approach using bipartite cliques, rather than a machine learning algorithm, is used for grouping the spam tweets. The deceptive topics are categorized into four groups: malware, phishing, Twitter follower scam, and advertising. All these groups are organized and developed according to the contrasting deceptive information available in the spam groups. The findings of this approach are helpful for the advancement of spam detection policies. Almost 400 million tweets are posted daily, of which only 25% include URLs; investigating such a huge number of tweets to remove spam is very expensive to implement in the real world. The analysis shows that the features used in this work face various challenges, i.e., some features are easy to deceive while others are difficult to extract.

B. MACHINE LEARNING ALGORITHMS
Benevenuto et al. [2] examined the problem of spammer detection on Twitter. For this, a large Twitter dataset is collected that contains more than 54 million users, 1.8 billion tweets, and 1.9 billion links. After that, a number of features associated with tweet content and with the characteristics of users are identified for the detection of spammers. These features are used as attributes in a machine learning process for categorizing users, i.e., to determine whether they are spammers or not. To evaluate the approach for detecting spammers on Twitter, a labelled collection of users pre-classified as spammers and non-spammers was built. Twitter was crawled to gather the IDs of users, of which there are about 80 million. Twitter allocates a numeric ID to each user, which uniquely identifies the profile of that user. Next, the steps needed to construct the labelled collection with various desired properties are taken, i.e., the steps that are essential to develop a collection of users that can be labelled as spammers or non-spammers. At the end, user attributes are identified based on their behavior, e.g., who they interact with and how frequently they interact.
To confirm this intuition, the features of the users in the labelled collection were checked. Two attribute sets are considered, i.e., content attributes and user behavior attributes, to differentiate one user from another. Content attributes are properties of the text of the tweets posted by users and capture features relevant to the way users write tweets. On the other hand, user behavior attributes capture specific features of the behavior of users in terms of posting frequency, interaction, and influence on Twitter. The following attributes are considered as user characteristics: the total number of followers and following, account age, number of tags, ratio of followers to followings, number of times the user replied, number of tweets received, average, maximum, minimum, and median time between user tweets, and daily and weekly tweets. Overall, 23 attributes of user behavior have been considered. The results of the proposed methodology show that, even with distinct sets of attributes, the framework is able to detect spammers with high accuracy.
Jeong et al. [17] analyzed follow spam on Twitter, in which, instead of spreading annoying public messages, spammers follow legitimate users and are followed by legitimate users. Classification techniques were proposed for the detection of follow spammers. The social relations are cascaded and formulated into two mechanisms, i.e., social status (SS) filtering and triad significance profile (TSP) filtering, each of which uses two-hop sub-networks centered at each user. Ensemble techniques and cascaded filtering are also proposed for combining the properties of both TSP and social status. To check whether a user is fake or not, a two-hop social network for each user is examined to gather social information from social networks.
An experiment with real-world data was performed to check the credibility and reliability of the system, with positive results. Both TSP and SS filtering were proposed to use partial data for real-time and lightweight spammer detection. Both algorithms yield some false positives, and their true positives are not better than those of collusion rank. A hybrid approach that uses attributes of both filters is suggested. The experiment was performed on one thousand legitimate users and one thousand spammer accounts with social status and TSP features. The results of the proposed approach show that the schemes are scalable because they check the user-centered two-hop social network instead of examining the whole network. This study significantly improves the performance of false and true positives than the previous
extracted from the unprocessed JSON tweets, such as the history of tweets, social relationships, etc.
According to the observations, the indirect features can help to improve the detection rate at the cost of time performance. The authors identified superior features from the time and accuracy perspectives. The area under the ROC curve is employed to illustrate the significance of every individual feature. Moreover, feature selection via recursive feature elimination (RFE) is used to select robust features. The key concept of RFE is to repeatedly construct models and set aside the worst or best features. The process is iterated until the entire feature set has been visited. The most important features include account age, friends count, retweet count, hashtag count, etc. The results of the study show that the random forest classifier achieves high spam detection accuracy in real time.
Shen et al. [29] investigated issues of detecting spammers on Twitter. The proposed method combines features extracted from text content with social network information. The authors used matrix factorization to determine the underlying feature matrix of the tweets and then introduced a social regularization term with an interaction coefficient to guide the factorization of the underlying matrix. Subsequently, the authors combined knowledge with the social regularization and matrix factorization processes, and performed experiments on a real-world Twitter dataset, i.e., the UDI Twitter dataset. The dataset used in this experiment was collected on Twitter in May 2011 and contains 50 million tweets from 140 thousand user profiles and 284 million following relationships. The content of the tweets of all users was scanned manually. In the end, 1,629 spammers and 10,450 legitimate users were identified among the 12,079 users in their dataset. To measure the efficiency of the proposed approach, conventional evaluation measures were used to detect the spammers. The proposed method seamlessly incorporates the features obtained from the text, the social information network, and the supervised information into a single framework. The results of the study demonstrate the effectiveness of the spammer detection.
Washha et al. [31] described a Hidden Markov Model for filtering spam in real time. The method leverages the information available in the tweet object, together with previously processed tweets related to the same topic, to recognize spam tweets. The proposed work was based on two assumptions, which are given below.
• The observation at a given time t is produced by some state S_t that is hidden from the observer.
• The current state S_t depends on the previous state S_{t-1}.
The authors explored the consequences of a time-dependent learning model, which is used for detecting spam tweets on current topics effectively. Moreover, the study also investigated the influence of the size of the training data on the capability of spam detection. The authors claimed that the Hidden Markov Model is capable of detecting spam tweets more effectively, as it is a better solution when high-quality recent tweets are available. Table 2 provides the comparison of different techniques for spammer detection.

IV. DISCUSSION
From the survey, we observed that malicious activities on social media are being performed in several ways. Moreover, researchers have attempted to identify spammers or unsolicited bloggers by proposing various solutions. Therefore, to combine all pertinent efforts, we proposed a taxonomy according to the extraction and classification methods. The categorization is based on four classes: fake content, URL-based spam, spam in trending topics, and fake users. The first major category in the taxonomy covers techniques proposed for detecting spam that is injected into the Twitter platform through fake content. Spammers generally combine spam data with a topic or keywords that are malicious or contain the type of words that are likely to be spam. The second category considers the techniques for spam detection based on URLs.
Generally, because of the length limit of a tweet, spammers find it more promising to post a URL to spread malicious content than plain text. Therefore, URL-based methods are specifically tailored to identify tweets containing an excess of URLs, particularly on criminal accounts. The third category in the proposed taxonomy contains approaches meant for spam identification in trending topics on Twitter. Hashtags or keywords that are often seen in tweets at a specific time appear in the Twitter list of trending topics and are likely to contain spam. Various features for identifying spam in trending topics have been classified with a variety of attributes. The fourth category in the taxonomy concerns techniques for the identification of fake users to detect spam on Twitter. An assortment of techniques has been introduced for detecting spam from fake users, which helps to counter malicious activities against OSN users.
In addition to reviewing the techniques, the study also provides a comparison of miscellaneous Twitter spam detection features. These features are extracted from user accounts and tweets and can help to identify spam. The features are categorized into five classes, namely user, content, graph, structure, and time. The user-based features incorporate the number of following and followers, account age, reputation, FF ratio, and number of tweets. The content-based features contain the number of retweets, number of URLs, number of replies and propagation of bidirectional, number of characters and digits, and spam words.
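As an illustration of the user-based and content-based features listed above, the following minimal Python sketch derives a few of them from a single account record and a single tweet text. The record layout (the keys `followers`, `following`, `created`, and `tweets`), the `SPAM_WORDS` list, and the reading of the FF ratio as followers divided by following are assumptions made for this example only; they are not taken from the surveyed papers or from the Twitter API.

```python
import re
from datetime import date

# Hypothetical spam vocabulary; real systems curate or learn such lists.
SPAM_WORDS = {"free", "win", "click", "followers"}

URL_RE = re.compile(r"https?://\S+")


def user_features(account, today):
    """User-based features for one account (assumed dict layout)."""
    followers = account["followers"]
    following = account["following"]
    return {
        "followers": followers,
        "following": following,
        "account_age_days": (today - account["created"]).days,
        # FF ratio, assumed here to mean followers / following.
        "ff_ratio": followers / following if following else float("inf"),
        "num_tweets": len(account["tweets"]),
    }


def content_features(text):
    """Content-based features for one tweet text."""
    # Remove URLs first so that their characters are not counted
    # as hashtags, mentions, or words.
    text_without_urls = URL_RE.sub(" ", text)
    words = re.findall(r"[a-z']+", text_without_urls.lower())
    return {
        "num_urls": len(URL_RE.findall(text)),
        "num_hashtags": len(re.findall(r"#\w+", text_without_urls)),
        "num_mentions": len(re.findall(r"@\w+", text_without_urls)),
        "num_chars": len(text),
        "num_digits": sum(ch.isdigit() for ch in text),
        "num_spam_words": sum(w in SPAM_WORDS for w in words),
    }
```

Feature dictionaries of this kind could then be turned into numeric vectors and passed to any of the classifiers discussed in the survey; the choice of classifier is independent of this extraction step.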
Ref. User feature Content feature Graph feature Structure feature Time feature
F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 F16 F17 F18 F19 F20 F21 F22 F23 F24
[13] X X X X - - - - - X X - - - X X X X - - - - - -
[11] X X X - X X - - X X X X X X X - - - - - - - - -
[15] X X X - - - X - - - - - - - - - - - - - - - - X
[12] - - - - - - - - X X X X X X - - - - - - - - - -
[33] X - X - - X - - X - - - - - - - - - X - - - - -
[10] X - X - - - - - - - - - - - - - - - X X X X - -
[8] - - - - - - - X - - X - - - - - - - - X - - X -
[2] X X X - - - - X - X X X X - X - - - - X - - - -
[14] X X - - - - X - - X - X - - - X - - - - - - - -
[24] - - - - X - - - - X X X - - - - - - X - - X X -
F1 Number of Followers F9 Number of retweets F17 In/out degree
F2 Number of Following F10 Number of hashtags F18 Betweenness
F3 Age of account F11 Number of user mention F19 Average Tweet Length
F4 Reputation F12 Number of URL F20 Time between first - last Tweet
F5 Number of user favorites F13 Number of Characters F21 Depth of conversion Tree
F6 Number of Lists F14 Number of Digits F22 Tweet frequency
F7 Propagation of Bidirectional F15 Number of Tweets F23 Tweet sent in time interval
F8 Number of replies F16 Spam words F24 Idle time in days
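The timestamp-related entries in the legend above (F20, F22, F23, and F24) can be computed from tweet timestamps alone. The sketch below is one possible reading of these features, not the definition used by any of the surveyed papers: F22 is taken as tweets per day over the time between the first and last tweet, F23 as the largest number of tweets falling inside any window of a given length, and F24 as whole days since the last tweet. All function and key names are hypothetical. With a one-minute window, the F23 value also expresses the "five or more tweets within one minute" criterion described for Chauhan et al. [16].

```python
from datetime import datetime, timedelta


def max_tweets_in_window(timestamps, window):
    """Largest number of tweets inside any sliding window of length `window`."""
    times = sorted(timestamps)
    best, start = 0, 0
    for end in range(len(times)):
        # Advance the left edge until the span fits inside the window.
        while times[end] - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def time_features(timestamps, now, window=timedelta(minutes=1)):
    """Timestamp-based features for one account (assumed definitions)."""
    times = sorted(timestamps)
    if not times:
        raise ValueError("at least one tweet timestamp is required")
    lifetime = times[-1] - times[0]  # F20: time between first and last tweet
    days = lifetime.total_seconds() / 86400
    return {
        "lifetime_seconds": lifetime.total_seconds(),
        # F22: tweets per day; if all tweets share one timestamp,
        # fall back to the plain tweet count.
        "tweets_per_day": len(times) / days if days > 0 else float(len(times)),
        # F23: densest burst of tweets within the given window.
        "max_in_window": max_tweets_in_window(times, window),
        # F24: idle time in whole days since the last tweet.
        "idle_days": (now - times[-1]).days,
    }
```

A sliding window over sorted timestamps keeps the burst computation linear after sorting, which matters when the same features are extracted for many accounts.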
The graph-based features include in/out degree and betweenness centrality, whereas the structure-based features include average tweet length, thread lifetime (time between the first and last tweets), tweet frequency, and depth of conversion tree. On the other hand, time-based features include idle time in days and tweets sent in a specific time interval. The survey is therefore organized by classes that are categorized according to the different features used for analyzing and detecting Twitter spam in various groups. We further carried out a comparative study of the existing techniques and methods that mainly address the detection of spam on the Twitter social network. This study includes the comparison of various previous methodologies proposed using different datasets and with different characteristics and accomplishments.
Moreover, the analysis also shows that several machine learning-based techniques can be effective for identifying spam on Twitter. However, the selection of the most feasible techniques and methods is highly dependent on the available data. For example, Naïve Bayes, random forest, Bayes network, K-nearest neighbor, clustering, and decision tree algorithms are used for predicting and analyzing spam on Twitter with different classes of categorization. This comparative study helps to identify all spam detection techniques under one umbrella, as shown in Figure 1.

V. CONCLUSION AND FUTURE RESEARCH DIRECTIONS
In this paper, we performed a review of techniques used for detecting spammers on Twitter. In addition, we also presented a taxonomy of Twitter spam detection approaches and categorized them as fake content detection, URL-based spam detection, spam detection in trending topics, and fake user detection techniques. We also compared the presented techniques based on several features, such as user features, content features, graph features, structure features, and time features. Moreover, the techniques were also compared in terms of their specified goals and datasets used. It is anticipated that the presented review will help researchers find the information on state-of-the-art Twitter spam detection techniques in a consolidated form.
Despite the development of efficient and effective approaches for spam detection and fake user identification on Twitter [34], there are still certain open areas that require considerable attention from researchers. The issues are briefly highlighted as follows:
False news identification on social media networks is an issue that needs to be explored because of the serious repercussions of such news at the individual as well as the collective level [25]. Another associated topic that is worth investigating is the identification of rumor sources on social media. Although a few studies based on statistical methods have already been conducted to detect the sources of rumors, more sophisticated approaches, e.g., social network-based approaches, can be applied because of their proven effectiveness.

REFERENCES
[1] Erşahin, Buket, Özlem Aktaş, Deniz Kılınç, and Ceyhun Akyol. "Twitter fake account detection." In Computer Science and Engineering (UBMK), 2017 International Conference on, pp. 388-392. IEEE, 2017.
[2] Benevenuto, Fabricio, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. "Detecting spammers on Twitter." In Collaboration, electronic messaging, anti-abuse and spam conference (CEAS), vol. 6, no. 2010, p. 12. 2010.
[3] Gharge, Sagar, and Manik Chavan. "An integrated approach for malicious tweets detection using NLP." In Inventive Communication and Computational Technologies (ICICCT), 2017 International Conference on, pp. 435-438. IEEE, 2017.
[4] Wu, Tingmin, Sheng Wen, Yang Xiang, and Wanlei Zhou. "Twitter spam detection: Survey of new approaches and comparative study." Computers & Security 76 (2018): 265-284.
[5] Soman, Saini Jacob. "A survey on behaviors exhibited by spammers in popular social media networks." In Circuit, Power and
Computing Technologies (ICCPCT), 2016 International Conference on, pp. 1-6. IEEE, 2016.
[6] Gupta, Aditi, Hemank Lamba, and Ponnurangam Kumaraguru. "$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing fake content on Twitter." In eCrime Researchers Summit (eCRS), 2013, pp. 1-12. IEEE, 2013.
[7] Concone, Federico, Alessandra De Paola, Giuseppe Lo Re, and Marco Morana. "Twitter analysis for real-time malware discovery." In AEIT International Annual Conference, 2017, pp. 1-6. IEEE, 2017.
[8] Eshraqi, Nasim, Mehrdad Jalali, and Mohammad Hossein Moattar. "Detecting spam tweets in Twitter using a data stream clustering algorithm." In Technology, Communication and Knowledge (ICTCK), 2015 International Congress on, pp. 347-351. IEEE, 2015.
[9] Chen, Chao, Yu Wang, Jun Zhang, Yang Xiang, Wanlei Zhou, and Geyong Min. "Statistical features-based real-time detection of drifted Twitter spam." IEEE Transactions on Information Forensics and Security 12, no. 4 (2017): 914-925.
[10] Buntain, Cody, and Jennifer Golbeck. "Automatically Identifying Fake News in Popular Twitter Threads." In Smart Cloud (SmartCloud), 2017 IEEE International Conference on, pp. 208-215. IEEE, 2017.
[11] Chen, Chao, Jun Zhang, Yi Xie, Yang Xiang, Wanlei Zhou, Mohammad Mehedi Hassan, Abdulhameed AlElaiwi, and Majed Alrubaian. "A performance evaluation of machine learning-based streaming spam tweets detection." IEEE Transactions on Computational social systems 2, no. 3 (2015): 65-76.
[12] Stafford, Grant, and Louis Lei Yu. "An evaluation of the effect of spam on Twitter trending topics." In Social Computing (SocialCom), 2013 International Conference on, pp. 373-378. IEEE, 2013.
[13] Mateen, Malik, Muhammad Azhar Iqbal, Muhammad Aleem, and Muhammad Arshad Islam. "A hybrid approach for spam detection for Twitter." In Applied Sciences and Technology (IBCAST), 2017 14th International Bhurban Conference on, pp. 466-471. IEEE, 2017.
[14] Gupta, Arushi, and Rishabh Kaushal. "Improving spam detection in online social networks." In Cognitive Computing and Information Processing (CCIP), 2015 International Conference on, pp. 1-6. IEEE, 2015.
[15] Fathaliani, Farnoosh, and Mohamed Bouguessa. "A model-based approach for identifying spammers in social networks." In Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on, pp. 1-9. IEEE, 2015.
[16] Chauhan, Vishal, Ajay Pilaniya, Vishesh Middha, Arjit Gupta, Ujjain Bana, Bakshi Rohit Prasad, and Sonali Agarwal. "Anomalous behavior detection in social networking." In Computing, Communication and Networking Technologies (ICCCNT), 2017 8th International Conference on, pp. 1-5. IEEE, 2017.
[17] Jeong, Sihyun, Giseop Noh, Hayoung Oh, and Chong-kwon Kim. "Follow spam detection based on cascaded social information." Information Sciences 369 (2016): 481-499.
[18] Washha, Mahdi, Aziz Qaroush, and Florence Sedes. "Leveraging time for spammers detection on Twitter." In Proceedings of the 8th International Conference on Management of Digital EcoSystems, pp. 109-116. ACM, 2016.
[19] Wang, Bo, Arkaitz Zubiaga, Maria Liakata, and Rob Procter. "Making the most of tweet-inherent features for social spam detection on Twitter." arXiv preprint arXiv:1503.07405 (2015).
[20] Hussain, Mubashar, Mansoor Ahmed, Hasan Ali Khattak, Muhammad Imran, Abid Khan, Sadia Din, Awais Ahmad, Gwanggil Jeon, and Alavalapati Goutham Reddy. "Towards ontology-based multilingual URL filtering: a big data problem." The Journal of Supercomputing 74, no. 10 (2018): 5003-5021.
[21] Meda, Claudia, Edoardo Ragusa, Christian Gianoglio, Rodolfo Zunino, Augusto Ottaviano, Eugenio Scillia, and Roberto Surlinelli. "Spam detection of Twitter traffic: A framework based on random forests and non-uniform feature sampling." In Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on, pp. 811-817. IEEE, 2016.
[22] Ghosh, Saptarshi, Gautam Korlam, and Niloy Ganguly. "Spammers' networks within online social networks: a case-study on Twitter." In Proceedings of the 20th international conference companion on World Wide Web, pp. 41-42. ACM, 2011.
[23] Chen, Chao, Sheng Wen, Jun Zhang, Yang Xiang, Jonathan Oliver, Abdulhameed Alelaiwi, and Mohammad Mehedi Hassan. "Investigating the deceptive information in Twitter spam." Future Generation Computer Systems 72 (2017): 319-326.
[24] David, Isaac, Oscar S. Siordia, and Daniela Moctezuma. "Features combination for the detection of malicious Twitter accounts." In Power, Electronics and Computing (ROPEC), 2016 IEEE International Autumn Meeting on, pp. 1-6. IEEE, 2016.
[25] Babcock, Matthew, Ramon Alfonso Villa Cox, and Sumeet Kumar. "Diffusion of pro-and anti-false information tweets: the Black Panther movie case." Computational and Mathematical Organization Theory 25, no. 1 (2019): 72-84.
[26] Keretna, Sara, Ahmad Hossny, and Doug Creighton. "Recognising user identity in Twitter social networks via text mining." In Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on, pp. 3079-3082. IEEE, 2013.
[27] Meda, Claudia, Federica Bisio, Paolo Gastaldo, and Rodolfo Zunino. "A machine learning approach for Twitter spammers detection." In Security Technology (ICCST), 2014 International Carnahan Conference on, pp. 1-6. IEEE, 2014.
[28] Chen, Weiling, Chai Kiat Yeo, Chiew Tong Lau, and Bu Sung Lee. "Real-time Twitter Content Polluter Detection Based on Direct Features." In Information Science and Security (ICISS), 2015 2nd International Conference on, pp. 1-4. IEEE, 2015.
[29] Shen, Hua, and Xinyue Liu. "Detecting spammers on Twitter based on content and social interaction." In Network and Information Systems for Computers (ICNISC), 2015 International Conference on, pp. 413-417. IEEE, 2015.
[30] Jain, Gauri, Manisha Sharma, and Basant Agarwal. "Spam detection in social media using convolutional and long short term memory neural network." Annals of Mathematics and Artificial Intelligence 85, no. 1 (2019): 21-44.
[31] Washha, Mahdi, Aziz Qaroush, Manel Mezghani, and Florence Sedes. "A Topic-Based Hidden Markov Model for Real-Time Spam Tweets Filtering." Procedia Computer Science 112 (2017): 833-843.
[32] Pierri, Francesco, and Stefano Ceri. "False News On Social Media: A Data-Driven Survey." arXiv preprint arXiv:1902.07539 (2019).
[33] Sadiq, Saad, Yilin Yan, Asia Taylor, Mei-Ling Shyu, Shu-Ching Chen, and Daniel Feaster. "AAFA: Associative Affinity Factor Analysis for Bot Detection and Stance Classification in Twitter." In Information Reuse and Integration (IRI), 2017 IEEE International Conference on, pp. 356-365. IEEE, 2017.
[34] Khan, Muhammad Usman Shahid, Mazhar Ali, Assad Abbas, Samee U. Khan, and Albert Y. Zomaya. "Segregating spammers and unsolicited bloggers from genuine experts on Twitter." IEEE Transactions on Dependable and Secure Computing 15, no. 4 (2018): 551-560.

FAIZA MASOOD received a Bachelor's degree in Computer Science from COMSATS University Islamabad, Islamabad, Pakistan. Currently, she is pursuing a Master's degree in Software Engineering at COMSATS University Islamabad, Islamabad, Pakistan. Her research interests focus on social networking sites. e-mail: [email protected]
GHANA AMMAD received a Bachelor's degree in Computer Science from COMSATS University Islamabad, Islamabad, Pakistan. She is currently pursuing a Master's degree in Software Engineering at COMSATS University Islamabad, Islamabad, Pakistan. Her research interests focus on social networking sites. e-mail: ghanaam-[email protected]

AHMAD ALMOGREN (SM) holds a Ph.D. degree in computer science from Southern Methodist University, Dallas, TX, USA, awarded in 2002. Previously, he was an Assistant Professor of computer science and a member of the scientific council at Riyadh College of Technology. He also served as the Dean of the College of Computer and Information Sciences and the head of the academic council at Al Yamamah University. He is currently a Professor and the Vice Dean for development and quality with the College of Computer and Information Sciences, King Saud University. His research interests include mobile and pervasive computing, cyber security, and computer networks. He has served as a guest editor for several computer journals.

ASSAD ABBAS received the Ph.D. degree in Electrical and Computer Engineering from North Dakota State University, USA. He is currently an Assistant Professor of Computer Science at COMSATS University Islamabad, Islamabad, Pakistan. His research interests include, but are not limited to, Smart Health, Big Data Analytics, Recommendation Systems, Patent Analysis, Software Engineering, and Social Network Analysis. His research has appeared in several reputable international venues. He also serves as a referee for numerous prestigious journals and as a technical program committee member for several conferences. He is a member of IEEE and IEEE-HKN. He can be reached at the e-mail: [email protected].

HASAN ALI KHATTAK (SM’19) received the Ph.D. degree in Electrical and Computer Engineering from Politecnico di Bari, Bari, Italy, in April 2015, the Master’s degree in Information Engineering from Politecnico di Torino, Torino, Italy, in 2011, and the B.CS. degree in Computer Science from the University of Peshawar, Peshawar, Pakistan, in 2006. He has been serving as an Assistant Professor of Computer Science at COMSATS University Islamabad since January 2016. His current research interests focus on Web of Things, Data Sciences, and Social Engineering for Future Smart Cities. His prospective research areas are the application of Machine Learning and Data Sciences for improving and enhancing quality of life in Smart Urban Spaces through predictive analysis and visualization. He is a Senior Member of IEEE, a Professional Member of ACM, and an active member of IEEE ComSoc, IEEE VTS, and the Internet Society.

IKRAM UD DIN (SM’18) received the M.Sc. degree in computer science and the M.S. degree in computer networking from the Department of Computer Science, University of Peshawar, Pakistan, and the Ph.D. degree in computer science from the School of Computing, Universiti Utara Malaysia (UUM). He also served as the IEEE UUM Student Branch Professional Chair. He has 10 years of teaching and research experience in different universities/organizations. His current research interests include resource management and traffic control in wired and wireless networks, vehicular communications, mobility and cache management in information-centric networking, and the Internet of Things.

MOHSEN GUIZANI (S’85-M’89-SM’99-F’09) received the B.S. (with distinction) and M.S. degrees in electrical engineering, and the M.S. and Ph.D. degrees in computer engineering from Syracuse University, Syracuse, NY, USA, in 1984, 1986, 1987, and 1990, respectively. He is currently a Professor at the CSE Department in Qatar University, Qatar. Previously, he served as the Associate Vice President of Graduate Studies, Qatar University, University of Idaho, Western Michigan University, and University of West Florida. He also served in academic positions at the University of Missouri-Kansas City, University of Colorado-Boulder, and Syracuse University. His research interests include wireless communications and mobile computing, computer networks, mobile cloud computing, security, and smart grid. He is currently the Editor-in-Chief of the IEEE Network Magazine, serves on the editorial boards of several international technical journals, and is the Founder and the Editor-in-Chief of the Wireless Communications and Mobile Computing journal (Wiley). He is the author of nine books and more than 500 publications in refereed journals and conferences. He guest edited a number of special issues in IEEE journals and magazines. He also served as a member, Chair, and General Chair of a number of international conferences. He received three teaching awards and four research awards throughout his career. He received the 2017 IEEE Communications Society Recognition Award for his contribution to outstanding research in Wireless Communications. He was the Chair of the IEEE Communications Society Wireless Technical Committee and the Chair of the TAOS Technical Committee. He served as the IEEE Computer Society Distinguished Speaker from 2003 to 2005. He is a Fellow of IEEE and a Senior Member of ACM.

MANSOUR ZUAIR is currently an Associate Professor in the Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia. He received his M.S. and Ph.D. degrees in Computer Engineering from Syracuse University and his B.S. degree in Computer Engineering from King Saud University. He served as CEN chairman from 2003 to 2006 and as vice dean from 2009 to 2015, and has been dean since 2016. His research interests are in the areas of computer architecture, computer networks, and signal processing.