Predicting Cyberbullying on Social Media in the Big Data Era Using Machine Learning Algorithms: Review of Literature and Open Challenges
Corresponding authors: Mohammed Ali Al-Garadi ([email protected]), Ihsan Ali ([email protected]), and
Ghulam Mujtaba ([email protected])
This work was supported in part by the Deanship of Scientific Research, King Khalid University, through Research Group Project under
Grant R.G.P. 1/166/40, and in part by the University of Malaya Postgraduate Research under Grant PG035-2016A.
ABSTRACT Prior to the innovation of information communication technologies (ICT), social interactions evolved within small cultural boundaries such as geospatial locations. The recent development of communication technologies has considerably transcended the temporal and spatial limitations of traditional communications. These social technologies have created a revolution in user-generated information, online human networks, and rich human behavior-related data. However, the misuse of social technologies such as social media (SM) platforms has introduced a new form of aggression and violence that occurs exclusively online. This paper highlights new means of demonstrating aggressive behavior on SM websites and outlines the motivations for constructing prediction models to fight aggressive behavior in SM. We comprehensively review cyberbullying prediction models and identify the main issues related to the construction of such models in SM. The paper provides insights into the overall process of cyberbullying detection and, most importantly, overviews the methodology. Although the data collection and feature engineering processes are elaborated, most of the emphasis is placed on feature selection algorithms and on the use of various machine learning algorithms to predict cyberbullying behavior. Finally, open issues and challenges are highlighted, which present new research directions for researchers to explore.
INDEX TERMS Big data, cyberbullying, cybercrime, human aggressive behavior, machine learning, online
social network, social media, text classification.
and techniques from multidisciplinary and interdisciplinary fields. The accessibility of large-scale data produces new research questions, novel computational methods, interdisciplinary approaches, and outstanding opportunities to discover several vital inquiries quantitatively. However, using traditional (statistical) methods in this context is challenging in terms of scale and accuracy. These methods are commonly based on organized data on human behavior and small-scale human networks (traditional social networks), and applying them to large online social networks (OSNs) causes several issues of scale and extent. On the one hand, the explosive growth of OSNs enhances and disseminates aggressive forms of behavior by providing platforms and networks to commit and propagate such behavior. On the other hand, OSNs offer important data for exploring human behavior and interaction at a large scale, and these data can be used by researchers to develop effective methods of detecting and restraining misbehavior and/or aggressive behavior. OSNs provide criminals with tools to perform aggressive actions and networks to commit misconduct. Therefore, methods that address both aspects (content and network) should be optimized to detect and restrain aggressive behavior in complex systems.

The remainder of this paper is organized as follows. Subsection I.A presents an overview of aggressive behavior in SM and highlights new means by which users utilize SM websites to commit aggressive behavior. Subsection I.B summarizes the motivations for constructing prediction models to combat aggressive behavior in SM. Subsection I.C highlights the importance of constructing cyberbullying prediction models. Subsection I.D presents the methodology followed in this paper. Section 2 presents a comprehensive review of cyberbullying prediction models for SM websites from data collection to evaluation. Section 3 discusses the main issues related to the construction of cyberbullying prediction models. Research challenges, which present new research directions, are discussed in Section 4, and the paper is concluded in Section 5.

A. RISE OF AGGRESSIVE BEHAVIOR ON SM

Prior to the innovation of communication technologies, social interaction evolved within small cultural boundaries, such as locations and families [5]. The recent development of communication technologies has exceptionally transcended the temporal and spatial limitations of traditional communication. In the last few years, online communication has shifted toward user-driven technologies, such as SM websites, blogs, online virtual communities, and online sharing platforms. New forms of aggression and violence have emerged exclusively online [6]. The dramatic increase in negative human behavior on SM, with high increments in aggressive behavior, presents a new challenge [6], [7]. The advent of Web 2.0 technologies, including SM websites that are often accessed through mobile devices, has completely transformed functionality on the side of users [8]. SM characteristics, such as accessibility, flexibility, being free, and having well-connected social networks, provide users with liberty and flexibility to post and write on their platforms. Therefore, users can easily demonstrate aggressive behavior [9], [10]. SM websites have become dynamic social communication websites for millions of users worldwide. Data in the form of ideas, opinions, preferences, views, and discussions are spread among users rapidly through online communication. The online interactions of SM users generate a huge volume of data that can be utilized to study human behavioral patterns [11]. SM websites also provide an exceptional opportunity to analyze patterns of social interactions among populations at a scale much larger than before.

Aside from renovating the means through which people are influenced, SM websites provide a place for a severe form of misbehavior among users. Online complex networks, such as SM websites, changed substantially in the last decade, and this change was stimulated by the popularity of online communication through SM websites. Online communication has become an entertainment tool, rather than serving only to communicate and interact with known and unknown users. Although SM websites provide many benefits to users, cybercriminals can use these websites to commit different types of misbehavior and/or aggressive behavior. The common forms of misbehavior and/or aggressive behavior on OSN sites include cyberbullying [3], [15], phishing [12], spam distribution [13], and malware spreading [14].

Users utilize SM websites to demonstrate different types of aggressive behavior. The main involvement of SM websites in aggressive behavior can be summarized in two points [9], [15].

1) OSN communication is a revolutionary trend that exploits Web 2.0. Web 2.0 has new features that allow users to create profiles and pages, which, in turn, make users active. Unlike Web 1.0, which limited users to being passive readers of content, Web 2.0 has expanded capabilities that allow users to be active as they post and write their thoughts. SM websites have four particular features, namely, collaboration, participation, empowerment, and timeliness [16]. These characteristics enable criminals to use SM websites as a platform to commit aggressive behavior without confronting victims [9], [15]. Examples of aggressive behavior are committing cyberbullying [17]–[19] and financial fraud [20], using malicious applications [21], and implementing social engineering and phishing [12].

2) SM websites are structures that enable information exchange and dissemination. They allow users to effortlessly share information, such as messages, links, photos, and videos [22]. However, because SM websites connect billions of users, they have become delivery mechanisms for different forms of aggressive behavior at an extraordinary scale. SM websites help cybercriminals reach many users [23].
B. MOTIVATIONS FOR PREDICTING AGGRESSIVE BEHAVIOR ON SM WEBSITES

Many studies have been conducted on the contribution of machine learning algorithms to OSN content analysis in the last few years. Machine learning research has become crucial in numerous areas and has successfully produced many models, tools, and algorithms for handling large amounts of data to solve real-world problems [24], [25]. Machine learning algorithms have been used extensively to analyze SM website content for spam [26]–[28], phishing [29], and cyberbullying prediction [19], [30]. Aggressive behavior includes spam propagation [13], [31]–[34], phishing [12], malware spread [14], and cyberbullying [15]. Textual cyberbullying has become the dominant aggressive behavior on SM websites because these websites give users full freedom to post on their platforms [17], [35]–[39].

SM websites contain large amounts of text and/or non-text content and other information related to aggressive behavior. In this work, a content analysis of SM websites is performed to predict aggressive behavior. Such an analysis is limited to textual OSN content for predicting cyberbullying behavior. Given that cyberbullying can be easily committed, it is considered a dangerous and fast-spreading aggressive behavior. Bullies only require willingness and a laptop or cell phone with an Internet connection to perform misbehavior without confronting victims [40]. The popularity and proliferation of SM websites have increased online bullying activities. Cyberbullying on SM websites is rampant due to the structural characteristics of these websites. Cyberbullying in traditional platforms, such as emails or phone text messages, is performed on a limited number of people. SM websites allow users to create profiles for establishing friendships and communicating with other users regardless of geographic location, thus expanding cyberbullying beyond physical location. Anonymous users may also exist on SM websites, and this has been confirmed to be a primary cause of increased aggressive user behavior [41]. Developing an effective prediction model for predicting cyberbullying is therefore of practical significance. With all these considerations, this work performs a content-based analysis for predicting textual cyberbullying on SM websites.

The motivation of this review is explained in the following section.

C. WHY CONSTRUCTING CYBERBULLYING PREDICTION MODELS IS IMPORTANT

The motivations for carrying out this review for predicting cyberbullying on SM websites are discussed as follows. Cyberbullying is a major problem [42] and has been documented as a serious national health problem [43] due to the recent growth of online communication and SM websites. Research has shown that cyberbullying exerts negative effects on the psychological and physical health and academic performance of people [44]. Studies have also shown that cyberbullying victims incur a high risk of suicidal ideation [45], [46]. Other studies [45], [46] reported an association between cyberbullying victimization and suicidal ideation risk. Consequently, developing a cyberbullying prediction model that detects aggressive behavior related to the security of human beings is more important than developing a prediction model for aggressive behavior related to the security of machines.

Cyberbullying can be committed anywhere and anytime. Escaping from cyberbullying is difficult because it can reach victims anywhere and anytime. It can be committed by posting comments and statuses for a large potential audience, and the victims cannot stop the spread of such activities [47]. Although SM websites have become an integral part of users' lives, a study found that SM websites are the most common platforms for cyberbullying victimization [48]. A well-known characteristic of SM websites, such as Twitter, is that they allow users to publicly express and spread their posts to a large audience while remaining anonymous [9]. The effects of public cyberbullying are worse than those of private ones, and anonymous scenarios of cyberbullying are worse than non-anonymous cases [49], [50]. Consequently, the severity of cyberbullying has increased on SM websites, which support public and anonymous scenarios of cyberbullying. These characteristics make SM websites, such as Twitter, a dangerous platform for committing cyberbullying [43].

Recent research has indicated that most experts favor the automatic monitoring of cyberbullying [51]. A study that examined 14 groups of adolescents confirmed the urgent need for automatic monitoring and prediction models for cyberbullying [52] because traditional strategies for coping with cyberbullying in the era of big data and networks do not work well. Moreover, analyzing large amounts of complex data requires machine learning-based automatic monitoring.

1) CYBERBULLYING ON SM WEBSITES

Most researchers define cyberbullying as using electronic communication technologies to bully people [53]. Cyberbullying may exist in different types or forms, such as writing aggressive posts, harassing or bullying a victim, making hateful posts, or insulting the victim [54], [55]. Given that cyberbullying can be easily committed, it is considered a dangerous and fast-spreading aggressive behavior. Bullies only require willingness and a laptop or cell phone connected to the Internet to perform misbehavior without confronting the victims [40]. The popularity and proliferation of SM websites have increased online bullying activities. Cyberbullying on SM websites is performed on a large number of users due to the structural characteristics of these websites [48].

Cyberbullying in traditional platforms, such as emails or phone text messages, is committed on a limited number of people. SM websites allow users to create profiles for establishing friendships and interacting with other online users regardless of geographic location, thus expanding cyberbullying beyond physical location. Moreover, anonymous users may exist on SM websites, and this has been confirmed to be a primary cause of increased aggressive user behavior [41].
The nature of SM websites allows cyberbullying to occur secretly, spread rapidly, and continue easily [54]. Consequently, developing an effective prediction model for predicting cyberbullying is of practical significance. SM websites contain large amounts of text and/or non-text content and information related to aggressive behavior.

D. METHODOLOGY

This section presents the methodology used in this work for the literature search. Two phases were employed to retrieve published papers on cyberbullying prediction models. The first phase included searching reputable academic databases and search engines. The search engines and academic databases used for the retrieval of relevant papers were as follows: Scopus, Clarivate Analytics' Web of Science, DBLP Computer Science Bibliography, ACM Digital Library, ScienceDirect, SpringerLink, and IEEE Xplore. The major keywords used for the literature search were coined in relation to social media as follows: cyberbullying, aggressive behavior, big data, and cyberbullying models. The second phase involved searching for literature through Qatar University's digital library. The articles retrieved from the search were scrutinized to ensure that they met the inclusion criteria. According to the inclusion criteria, for an article to be selected for the survey, it must report an empirical study describing the prediction of cyberbullying on SM sites; otherwise, the article was excluded. Many articles were rejected based on their titles. The abstract and conclusion sections were examined to ensure that articles satisfied the screening criteria, and those that did not were excluded from the survey.

II. PREDICTING CYBERBULLYING ON SOCIAL MEDIA IN THE BIG DATA ERA USING MACHINE LEARNING ALGORITHMS

Our world is currently in the big data era because 2.5 quintillion bytes of data are generated daily [56]. Organizations continuously generate large-scale data. These large-scale datasets are generated from different sources, including the World Wide Web, social networks, and sensor networks [57]. Big data have nine characteristics, namely, volume, variety, variability and complexity, velocity, veracity, value, validity, verdict, and visibility [58]. For example, Flickr generates almost 3.6 TB of data, Google is believed to process almost 20,000 TB of data per day, and the Internet gathers an estimated 1.8 PB of data daily [59].

SM is an online platform that provides users an opportunity to create an online community, share information, and exchange content. SM users and the interactions among organizations, people, and products are responsible for the massive amount of data generated on SM platforms. SM platforms, such as Facebook, YouTube, blogs, Instagram, Wikipedia, and Twitter, are of different types. The data generated by SM outlets can be structured or unstructured in form. SM analytics is the analysis of structured and unstructured data generated by SM outlets. SM analytics can be in any of the following forms: link prediction, community, content, social influence, structured, and unstructured. SM is now in the big data era. For example, Facebook stores 260 billion photographs in over 20 PB of storage space, and up to one million pictures are processed per second. YouTube receives 100 hours of uploaded video every minute [60].

The most common means of constructing cyberbullying prediction models is to use a text classification approach that involves the construction of machine learning classifiers from labeled text instances [19], [38], [61]–[63]. Another means is to use a lexicon-based model that involves computing the orientation of a document from the semantic orientation of the words or phrases in the document [64]. Generally, the lexicon in lexicon-based models can be constructed manually (similar to the approaches used in [65]) or automatically by using seed words to expand the list of words [66]. However, cyberbullying prediction using the lexicon-based approach is rare in the literature. The primary reason is that the texts on SM websites are written in an unstructured manner, thus making it difficult for the lexicon-based approach to detect cyberbullying based only on lexicons [67]–[69]. However, lexicons are used to extract features, which are often utilized as inputs to machine learning algorithms. For example, lexicon-based approaches, such as using a profanity dictionary to count the profane words in a post, are adopted to provide profanity features to machine learning models [70]. The key to effective cyberbullying prediction is to have a set of features that are well extracted and engineered [71]. Features and their combinations are crucial in the construction of effective cyberbullying prediction models [70], [71]. Most studies on cyberbullying prediction [19], [38], [62], [72], [73] used machine learning algorithms to construct cyberbullying prediction models. Machine learning-based models exhibit decent performance in cyberbullying prediction [74]. Consequently, this work reviews the construction of cyberbullying prediction models based on machine learning.

The machine learning field focuses on the development and application of computer algorithms that improve with experience [75], [76]. The objective of machine learning is to identify and define the patterns and correlations between data. The importance of analyzing big data lies in discovering hidden knowledge through deep learning from raw data [1]. Machine learning can be described as the adoption of computational models to improve machine performance by predicting and describing meaningful patterns in training data and to acquire knowledge from experience [77]. When this concept is applied to OSN content, the potential of machine learning lies in exploiting historical data to detect, predict, and understand large amounts of OSN data. For example, in supervised machine learning for classification applications, classification is learned with the help of suitable examples from a training dataset. In the testing stage, new data are fed into the model, and instances are classified to a specified class learned during the training stage. Then, classification performance is evaluated.
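The training and testing stages just described can be sketched in a few lines of Python. The following is a minimal toy illustration, not the pipeline of any reviewed study: the posts, the "bullying"/"normal" labels, and the choice of a bag-of-words multinomial Naive Bayes classifier are all illustrative assumptions.

```python
# Minimal sketch of the supervised train/test workflow: a bag-of-words
# multinomial Naive Bayes classifier built from labeled posts, then
# applied to unseen posts. All posts and labels are toy examples.
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(posts, labels):
    """Learn class priors and per-class word counts (training stage)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)   # class -> word -> count
    vocab = set()
    for post, label in zip(posts, labels):
        for w in tokenize(post):
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(model, post):
    """Classify a new post by the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)                           # class prior
        denom = sum(word_counts[c].values()) + len(vocab)    # Laplace smoothing
        for w in tokenize(post):
            lp += math.log((word_counts[c][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Training stage: learn from labeled examples.
train_posts = ["you are so stupid and ugly", "nobody likes you loser",
               "great photo from the trip", "happy birthday my friend"]
train_labels = ["bullying", "bullying", "normal", "normal"]
model = train(train_posts, train_labels)

# Testing stage: classify unseen posts, then evaluate the predictions.
print(predict(model, "you stupid loser"))   # -> bullying
print(predict(model, "great trip photo"))   # -> normal
```

Real studies reviewed in this paper use far richer feature sets (user, network, and media features) and established toolkits, but the two-stage structure (fit on labeled data, then classify and evaluate on held-out data) is the same.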
This section reviews the most common processes in the construction of cyberbullying prediction models for SM websites based on machine learning. The review covers data collection, feature engineering, feature selection, and machine learning algorithms.

A. DATA COLLECTION

Data are important components of all machine learning-based prediction models. However, data (even "Big Data") are useless on their own until knowledge or implications are extracted from them. Data extracted from SM websites are used to select training and testing datasets. Supervised prediction models aim to provide computer techniques that enhance prediction performance in defined tasks on the basis of observed instances (labeled data) [78]. Machine learning models for a certain task primarily aim to generalize; a successful model should not be limited to the examples in a training dataset only [79] but must extend to unlabeled real data. Data quantity alone is inconsequential; what is crucial is whether the extracted data represent activities on SM websites well [80]–[82]. The main data collection strategies in previous cyberbullying prediction studies on SM websites can be categorized into data extracted from SM websites by using either keywords, that is, words, phrases, or hashtags (e.g., [19], [43], [83]–[85]), or user profiles (e.g., [38], [62], [70], [86]). The issues in these data collection strategies and their effects on the performance of machine learning algorithms are highlighted in the Data Collection section (related issues).

B. FEATURE ENGINEERING

A feature is a measurable property of a task that is being observed [87]. The main purpose of engineering feature vectors is to provide machine learning algorithms with a set of learning vectors through which these algorithms learn how to discriminate between different classes [76]. Feature engineering is a key factor behind the success or failure of most machine learning models [79]. The success or failure of prediction may depend on several elements, the most significant of which is the set of features used to train the model [78]. Most of the effort in constructing cyberbullying prediction models using learning algorithms is devoted to this task [61], [62], [72]. In this context, the design of the input space (i.e., the features and their combinations that are provided as input to the classifier) is vital.

Proposing a set of discriminative features, which are used as inputs to the machine learning classifier, is the main step toward constructing an effective classifier in many applications [76]. Feature sets can be created based on human-engineered observations, which rely on how features correlate with the occurrences of classes [76]. For example, recent cyberbullying studies [88]–[94] established the correlation between different variables, such as age, gender, and user personality, and cyberbullying occurrence. These observations can be engineered into a practical form (features) to allow the classifier to discriminate between cyberbullying and non-cyberbullying and can thus be used to develop effective cyberbullying prediction models. Proposing features is an important step toward improving the discrimination power of prediction models [76], [79]. Similarly, proposing a set of significant features of cyberbullying engagement on SM websites is important in developing effective prediction models based on machine learning algorithms [68], [95].

State-of-the-art research has developed features to improve the performance of cyberbullying prediction. For example, a lexical syntactic feature has been proposed to deal with the prediction of offensive language; this method is better than traditional learning-based approaches in terms of precision [18]. Dadvar et al. examined gender information from profile information and developed a gender-based approach for cyberbullying prediction by using datasets from Myspace as a basis. The gender feature was selected to improve the discrimination capability of the classifier. Age and gender were included as features in other studies [17], [61], but these features are limited to the information provided by users in their online profiles.

Several studies focused on cyberbullying prediction based on profane words as a feature [35], [68], [70], [95], [96]. Similarly, a lexicon of profane words was constructed to indicate bullying, and these words were used as features for input to machine learning algorithms [97], [98]. Using profane words as features demonstrates a significant improvement in model performance. For example, the number of "bad" words and the density of "bad" words were proposed as features for input to machine learning in a previous work [70]. The study concluded that the percentage of "bad" words in a text is indicative of cyberbullying. Another study [85] expanded a list of pre-defined profane words and allocated different weights to create bullying features. These features were concatenated with bag-of-words and latent semantic features and used as feature input for a machine learning algorithm.

Reference [19] proposed features, such as pronouns and skip grams, as additions to traditional models, such as bag of words (n-gram, n = 1). The authors claimed that adding these features improved the overall classification accuracy. Another study [62] analyzed textual cyberbullying associated with comments on images on Instagram and developed a set of features from text comprising traditional bag-of-words features, comment counts for an image, and post counts within less than one hour of posting the image. Features mined from user and media information, including the number of followers and likes, shared media, and features from image content, such as image types, were added [62]. The combination of all features improved the overall classification performance [62].

The context-based approach is better than the list-based approach in developing the feature vector [37]. However, the diversity and complexity of cyberbullying do not always support this conclusion. Several studies [68], [72], [96], [99] discussed how sentiment analysis can improve the discrimination power of a classifier to distinguish between
cyberbullying and normal posts. These studies assumed that sentiment features are a good signal for cyberbullying occurrence. In another study that aimed to establish ways of reducing cyberbullying activities by predicting troll profiles, the researchers proposed a model to identify and associate troll profiles on Twitter; they assumed that predicting troll profiles is an important step toward predicting and stopping cyberbullying occurrence on SM websites [38]. This study proposed features based on tweeted text, posting time, language, and location to improve the identification of the authorship of posts and determine whether a profile is a troll or not. Reference [99] merged features from the structure of SM websites (e.g., degree, closeness, betweenness, and eigenvector centralities as well as the clustering coefficient) with features from users (e.g., age and gender) and content (e.g., length and sentiment of a post). Combining these features improves the final machine learning accuracy [99]. Table 1 shows a comparison of the different features used in the cyberbullying prediction literature.

The constructed features affect prediction performance. If they contain a large set of features that individually associate well with a class, then the learning process will be effective. This condition explains why most of the discussed studies aimed to produce many features. The input features should reflect the behavior related to the occurrence of textual cyberbullying. However, the set of features should be analyzed using feature selection algorithms, which are adopted to decide which features are most probably relevant or irrelevant to the classes.

C. FEATURE SELECTION ALGORITHMS

Feature selection algorithms were rarely adopted in state-of-the-art research on cyberbullying prediction on SM websites via machine learning (all extracted features are used to train the classifiers). Most of the examined studies (e.g., [18], [61], [68], [70]–[72], [85], [95], [96], [99]) did not use feature selection to decide which features are important in training machine learning algorithms. Two studies [19], [62] used chi-square and PCA to select significant features from the extracted features. These feature selection algorithms are briefly discussed in the following subsections.

1) INFORMATION GAIN

Information gain is the estimated decrease in entropy produced by separating examples based on specified features. Entropy is a well-known concept in information theory; it describes the (im)purity of an arbitrary collection of examples [100].
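The entropy and information-gain computations can be sketched in pure Python. This is a toy illustration: the binary "contains a profane word" feature and the labels below are hypothetical values, not data from any reviewed study.

```python
# Entropy of a label set and the information gain of a single feature,
# following the standard definitions used for feature ranking.
import math
from collections import Counter

def entropy(labels):
    """I(Tr) = -sum_n P_n * log2(P_n) over the classes present in labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = I(Tr) - sum_v (|Tr_v| / |Tr|) * I(Tr_v) over feature values v."""
    total = len(labels)
    expected = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        expected += (len(subset) / total) * entropy(subset)
    return entropy(labels) - expected

# Toy data: a hypothetical binary profanity indicator that splits the two
# classes perfectly, so its information gain equals the full class
# entropy of 1 bit.
labels = ["bully", "bully", "normal", "normal"]
has_profanity = [True, True, False, False]
print(round(information_gain(has_profanity, labels), 3))  # -> 1.0
```

A feature whose values carry no information about the class (e.g., the same value for every example) would score 0.0, which is the basis for ranking and discarding features.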
Information gain is used to calculate the strength or importance of features in a classification model according to the class attribute. Information gain [101] evaluates how well a specified feature divides the training dataset with respect to the class labels, as explained in the following equations. Given a training dataset Tr, the entropy of Tr is defined as

I(Tr) = -\sum_{n} P_n \log_2 P_n, \quad (1)

where P_n is the probability that an example in Tr belongs to class n. For attribute Att, the expected entropy is calculated as

I(Att) = \sum \frac{Tr_{Att}}{Tr} \times I(Tr_{Att}). \quad (2)

The information gain of attribute Att is then

IG(Att) = I(Tr) - I(Att). \quad (3)

2) PEARSON CORRELATION

Correlation-based feature selection is commonly used for reducing feature dimensionality and evaluating the discrimination power of a feature in classification models. It is also a straightforward model for selecting significant features. Pearson correlation measures the relevance of a feature by computing the Pearson correlation between it and a class. The Pearson correlation coefficient measures the linear correlation between two attributes [102]. The resulting value lies between -1 and +1, with -1 implying absolute negative correlation (as one attribute increases, the other decreases), +1 denoting absolute positive correlation (as one attribute increases, the other also increases), and 0 denoting the absence of any linear correlation between the two attributes. For two attributes or features X and Y, the Pearson correlation coefficient measures the correlation [103] as follows:

r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) S_x S_y}, \quad (4)

where \bar{x} and \bar{y} are the sample means of X and Y, respectively; S_x and S_y are the sample standard deviations of X and Y, respectively; and n is the size of the sample used to compute the correlation coefficient [103].

3) CHI-SQUARE TEST

Another common feature selection model is the chi-square test. This test is used in statistics, among other purposes, to test the independence of two occurrences. In feature selection, chi-square is used to test whether the occurrences of a feature and a class are independent. Thus, the following quantity is computed for each feature, and features are ranked by their score:

\chi^2(f, c_i) = \frac{N \left( P(f, c_i) P(\bar{f}, \bar{c}_i) - P(f, \bar{c}_i) P(\bar{f}, c_i) \right)^2}{P(f) P(\bar{f}) P(c_i) P(\bar{c}_i)}. \quad (5)

The chi-square test [104] assesses the independence between feature f and class c_i, in which N is the total number of documents.

D. MACHINE LEARNING ALGORITHMS

Many types of machine learning algorithms exist, but nearly all studies on cyberbullying prediction on SM websites used the most established and widely used type, that is, supervised machine learning algorithms [67], [99]. The accomplishment of machine learning algorithms is determined by the degree to which the model accurately converts various types of prior observation or knowledge about the task. Much of the practical application of machine learning considers the details
are easy to understand and interpret; hence, the decision tree algorithm can be used to analyze data and build a graphic model for classification. The most commonly improved version of decision tree algorithms used for cyberbullying prediction is C4.5 [38], [70], [95]. C4.5 can be explained as follows. Given a set of N examples, C4.5 first produces an initial tree through a divide-and-conquer algorithm as follows [120]: if all examples in N belong to the same class or N is small, the tree is a leaf labeled with the most frequent class in N. Otherwise, a test is selected based on, for example, the commonly used information gain criterion, on a single attribute with two or more outcomes. This test becomes the root of the tree and partitions N into subsets N1, N2, N3, ... according to the outcome for each example; the same procedure is then applied recursively to each subset [120].

5) K-NEAREST NEIGHBOR
K-nearest neighbor (KNN) is a nonparametric technique that identifies the K nearest neighbors of a query point X0 and uses a majority vote to determine the class label of X0. The KNN classifier often uses the Euclidean distance as the distance metric [121]. To demonstrate KNN classification, consider classifying new input posts (from a testing set) by using a number of known, manually labeled posts. The main task of KNN is to classify an unknown example based on a nominated number of its nearest neighbors, that is, to decide whether the class of the unknown example is positive or negative. KNN assigns the class of an unknown example by a majority vote among its nearest neighbors. For example, with one nearest neighbor (k = 1), KNN classifies the unknown example as positive (because the closest point is positive). With two nearest neighbors, KNN is unable to classify the unknown example because the second closest point is negative (positive and negative votes are equal). With four nearest neighbors, KNN classifies the unknown example as positive (because three of the closest points are positive and only one vote is negative). The KNN algorithm is one of the simplest classification algorithms, but despite its simplicity, it can provide competitive results [122]. KNN was used in the construction of cyberbullying prediction models in [38].

6) LOGISTIC REGRESSION CLASSIFICATION
Logistic regression is one of the common techniques imported by machine learning from the statistics field. Logistic regression is an algorithm that builds a separating hyperplane between two datasets by means of the logistic function [123]. The logistic regression algorithm takes inputs (features) and generates a forecast according to the probability of the input belonging to a class. For example, if the probability is >0.5, the classification of the instance will be a positive class; otherwise, the prediction is for the other class (negative class) [124]. Logistic regression was used in the construction of cyberbullying prediction models in [19] and [73].

E. EVALUATION
The primary objective of constructing prediction models based on machine learning is to generalize beyond the training dataset [79] so that, when a machine learning model is applied to new examples, it performs well. Accordingly, the data are divided into two parts. The first part is the training data used to train the machine learning algorithms. The second part is the testing data used to test them. However, setting aside a separate testing set is not always practical [79], especially in applications in which deriving training and testing data is difficult. For example, in cyberbullying prediction, most state-of-the-art studies manually labeled data; hence, creating labeled data is expensive. These issues can be reduced by cross validation, that is, randomly dividing the training data into, for example, 10 subsets; this process is called 10-fold cross validation. Cross validation involves the following steps: keep one fold separate (the model does not see it) and train the model on the remaining folds; test each learned classifier on the fold it did not see; and average the results to see how well the particular parameter setting performs [79], [125].

F. EVALUATION METRICS
Researchers measure the effectiveness of a proposed model to determine how successfully the model can distinguish cyberbullying from non-cyberbullying by using various evaluation measures. Reviewing common evaluation metrics in the research community is important to understand the performance of competing models. The most commonly used metrics in evaluating cyberbullying classifiers for SM websites are as follows:

1) ACCURACY
Accuracy was used to evaluate cyberbullying prediction models in [62], [70], [73], and [95], and it is calculated as follows:

Accuracy = (tp + tn) / (tp + fp + tn + fn). (6)

2) PRECISION, RECALL, AND F-MEASURE
These were used to evaluate cyberbullying prediction models in [18], [61], [72], and [73]. They are calculated as follows:

Precision = tp / (tp + fp), (7)
Recall = tp / (tp + fn), (8)
F-Measure = (2 × precision × recall) / (precision + recall), (9)

where tp means true positive, tn is true negative, fp denotes false positive, and fn is false negative.
3) AREA UNDER THE CURVE (AUC)
AUC offers a discriminatory rate of the classifier at various operating points [3], [19], [38]. The main benefit of using AUC as an evaluation metric is that AUC gives a more robust measurement than the accuracy metric in class-imbalance situations [19], [38].

III. ISSUES RELATED TO CONSTRUCTING CYBERBULLYING PREDICTION MODELS
In this section, the issues identified from the reviewed studies are discussed. The main issues related to cyberbullying definition, data collection, feature engineering, and evaluation metric selection are identified and discussed in the following subsections.

A. ISSUES RELATED TO CYBERBULLYING DEFINITION
Traditional bullying is generally defined as ''intentional behavior to harm another, repeatedly, where it is difficult for the victim to defend himself or herself'' [126]. By extending the definition of traditional bullying, cyberbullying has been defined [90] as ''an aggressive behavior that is achieved using electronic platforms by a group or an individual repeatedly and over time against a victim who cannot easily defend him or herself.'' Applying such a definition makes it difficult to manually label data (the instances from which machine learning algorithms learn) as cyberbullying or not. Two main issues make the above definition difficult to apply in online environments [47], [127]. The first issue is how to measure ''repeatedly and over time aggressive behavior'' on SM, and the second is how to measure power imbalance, that is, ''a victim who cannot easily defend himself or herself,'' on SM. These issues have been discussed by researchers to simplify the concept of cyberbullying in the online context. First, the concept of a repetitive act in cyberbullying on SM is not as straightforward as in traditional bullying [47]. For example, SM websites can provide cyberbullies a medium to propagate cyberbullying posts to a large population. Consequently, a single act by one committer may become repetitive over time [47]. Second, power imbalance is presented in different forms in online communication. Researchers [127] have suggested that content in online environments is difficult to eliminate or avoid, thus making a victim powerless.

These definitional aspects are under intense debate, but to simplify the definition of cyberbullying and make it applicable to a wide range of applications, the researchers in [53] and [72] defined cyberbullying as ''the use of electronic communication technologies to bully others.'' Proposing a simplified and clear definition of cyberbullying is a crucial step toward building machine learning models that can satisfy the definition criteria of cyberbullying engagement.

B. DATA COLLECTION
Many cyberbullying prediction studies extracted their datasets by using specific keywords or profile IDs. Nevertheless, by simply tracking posts that contain particular keywords, these studies may have introduced potential sampling bias [82], [128], limited the prediction to posts that contain the predefined keywords, and overlooked many other posts relevant to cyberbullying. Such data collection methods limit the prediction model of cyberbullying to specified keywords. The identification of keywords for extracting posts is also subject to the author's understanding of cyberbullying. An effective method should use a complete range of posts indicating cyberbullying to train the machine learning classifier and ensure the generalization capability of the cyberbullying prediction model [43]. An important objective of machine learning is to generalize and not be limited to the examples in a training dataset [79]. Researchers should investigate whether the sampled data are extracted from data that effectively represent all possible activities on SM websites [128]. Extracting well-representative data from SM is the first step toward building effective machine learning prediction models. However, SM websites' public application program interfaces (APIs) only allow the extraction of a small sample of all relevant data and thus pose a potential for sampling bias [80]–[82]. For example, a previous study [128] discussed whether data extracted from Twitter's streaming API are a sufficient representation of the activities in the Twitter network as a whole; the author compared keyword (words, phrases, or hashtags), user ID, and geo-coded sampling. Twitter's streaming API returns a dataset with some bias when keyword or user ID sampling is used. By contrast, using geo-tagged filtering provides good data representation [128]. With these points in mind, researchers should minimize bias as much as possible when they extract data, to guarantee that the examples selected for the training data generalize and provide an effective model when applied to testing data. Bias in data collection can impose bias on a training dataset selected based on specific keywords or users, and such bias consequently introduces overfitting issues that affect the capability of a machine learning model to make reliable predictions on untrained data.

C. FEATURE ENGINEERING
Features are vital components in improving the effectiveness of machine learning prediction models [79]. Most of the discussed studies attempted to provide effective machine learning solutions to cyberbullying on SM websites by providing significant features (Table 1). However, these studies overlooked other important features. For example, online cyberbullies may dynamically change the way they use words and acronyms. SM websites help create cyberbullying acronyms that have not been commonly used in committing traditional bullying or are beyond SM norms [129]. Recent survey response studies (questionnaire-based studies) have reported positive correlations between different variables, such as personality [93], [94] and the sociability of a user in an online environment [130], and cyberbullying occurrences. The observations of these studies are important in understanding such behavior in online environments. However, these
observations are yet to be used as features with machine learning algorithms to provide significant models. These observations can be useful when transformed into a practical form (features) that can be employed to develop effective machine learning prediction models for cyberbullying on SM websites. The abundant information provided by SM websites should be utilized to convert observations into a set of features. For example, two studies [17], [61] attempted to improve machine learning classifier performance by including features, such as age and gender, that show improvement in classifier performance; these features, however, are extracted from direct user details mentioned in the online profiles of users, and most studies found that only a few users provide complete details in their online profiles [131], [132]. These studies suggested the useful practice of utilizing words expressed in the content (posts) to identify user age and gender [131], [132]. Moreover, cyberbullying is related to the aggressive behavior of a user. A study demonstrated that aggression considerably predicts cyberbullying [92]. Similarly, cyberbullying behavior has a strong correlation with neuroticism [93], [94]. Therefore, predicting whether a user has used words related to neuroticism may provide a useful feature to predict cyberbullying engagement.

A significant correlation has also been found between the sociability of a user and cyberbullying engagement in online environments [130]. Users who are highly active in online environments are likely to engage in cyberbullying [133]. According to these observations, SM websites possess features that can be used as signals to measure the sociability of a user, such as the number of friends, number of posts, URLs in posts, hashtags in posts, and number of users engaged in conversations (mentioned). The combination of these features with traditionally used ones, such as profanity features, can provide comprehensive discriminative features. The reviewed studies (Table 1) focused on using either a traditional feature model (e.g., bag-of-words) or information (e.g., age or gender) limited to user profile information (information written by users in their profiles). Given that such information is limited, comprehensive features should be proposed to improve classifier performance.

Moreover, maintaining a precise and accurate process in constructing machine learning models from start (data collection) to end (evaluation metric selection) is important in ensuring that the proposed features hold significance in improving classifier performance. The following subsections analyze other issues related to constructing effective machine learning models for cyberbullying prediction on SM websites.

D. MACHINE LEARNING ALGORITHM SELECTION
A machine learning algorithm is selected to be trained on the proposed features. However, deciding which classifier performs best for a specific dataset is difficult. More than one machine learning algorithm should be tested to determine the best machine learning algorithm for a specific dataset. Three points may be used as a guide to narrow the selection of machine learning algorithms to be tested. First, the specific literature on machine learning for cyberbullying detection is important in selecting a specified classifier. The pre-eminence of a classifier may be circumscribed to a given domain [134]. Therefore, general previous research and findings on machine learning can be used as a guide to select a machine learning algorithm. Second, a literature review of text mining [135], [136] can be used as a guide. Third, a performance comparison on comprehensive datasets [137] can be used as a basis to select machine learning algorithms. However, although these three points can be used as a guide to narrow the selection of machine learning algorithms, researchers need to test many machine learning algorithms to identify the optimal classifier for an accurate predictive model.

E. IMBALANCED CLASS DISTRIBUTION
In many cases of real data, datasets naturally have imbalanced classes, in which the normal class has a large number of instances and the abnormal class has a small number of instances. Abnormal class instances are rare and difficult to collect from real-world applications. Examples of imbalanced data applications are fraud detection, intrusion detection, and medical diagnosis. Similarly, the number of cyberbullying posts is expected to be much smaller than the number of non-cyberbullying posts, and this assumption generates an imbalanced class distribution in the dataset, in which the non-cyberbullying instances comprise many more posts than the cyberbullying ones. Such cases can prevent the model from correctly classifying the examples. Many methods have been proposed to solve this issue, including SMOTE [138] and weight adjustment (the cost-sensitive technique) [139].

The SMOTE technique [138] is applied to avoid the overfitting that occurs when exact replicas of minority-class instances are added to the main dataset. A subset of data is taken from the minority class as an example, and new synthetic similar instances are generated. These synthetic instances are then added to the original dataset, and the resulting dataset is used to train the machine learning methods. The cost-sensitive technique is utilized to control the class imbalance [139]. It is based on creating a cost matrix, which defines the costs incurred by false positives and false negatives.

F. EVALUATION METRIC SELECTION
Accuracy, precision, recall, and AUC are commonly used as evaluation metrics [19], [38]. Evaluation metric selection is important and should be based on the nature of the manually labeled data. Selecting an inappropriate evaluation metric may result in better performance according to the selected evaluation metric. Then, the researcher may find the results to be significantly improved, although an investigation of how the machine learning model is evaluated may produce contradicting results and may not truly reflect the improvement of performance. For example, cyberbullying posts are commonly considered abnormal cases, whereas
non-cyberbullying posts are considered normal cases. The ratio between cyberbullying and non-cyberbullying is normally large. Generally, non-cyberbullying posts comprise a large portion. For example, suppose 1000 posts are manually labeled as cyberbullying and non-cyberbullying. The non-cyberbullying posts are 900, and the remaining 100 posts are cyberbullying. If a machine learning classifier classifies all 1000 posts as non-cyberbullying and is unable to classify any posts (0) as cyberbullying, then this classifier is considered impractical. By contrast, if researchers use accuracy as the main evaluation metric, then the accuracy of this classifier, calculated as mentioned in the accuracy equation, will yield a high accuracy percentage.

In the example, the classifier fails to classify any cyberbullying posts but obtains a high accuracy percentage. Knowing the nature of manually labeled data is important in selecting an evaluation metric. In cases where data are imbalanced, researchers may need to select AUC as the main evaluation metric. In class-imbalance situations, AUC is more robust than other performance metrics [140]. Cyberbullying and non-cyberbullying data are commonly imbalanced datasets (non-cyberbullying posts outnumber the cyberbullying ones) that closely represent the real-life data on which machine learning algorithms need to train. Accordingly, the learning performance of these algorithms is independent of data skewness [73]. Special care should be taken in selecting the main evaluation metric to avoid uncertain results and appropriately evaluate the performance of machine learning algorithms.

IV. ISSUES AND CHALLENGES
This section presents open issues and challenges to guide future researchers in leveraging machine learning algorithms and models for detecting cyberbullying through social media.

The collected data should clearly represent features that occur in current and future data to retain the context of the model. Given that big data are not generic and are dynamic in nature, the context of these data is difficult to understand in terms of scale and even more difficult to maintain when data are reduced to fit into a machine learning model. Handling the context of big data is challenging and has been presented as an important future direction [141].

Furthermore, human behavior is dynamic. Knowing when online users change the way they commit cyberbullying is an important component in updating the prediction model with such changes. Therefore, dynamically updating the prediction model is necessary to meet human behavioral changes [1].

B. CULTURE EFFECT
What was considered cyberbullying yesterday might not be considered cyberbullying today, owing to the introduction of OSNs. OSNs have a globalized culture. However, machine learning always learns from the examples provided. Consequently, designing different examples that represent different cultures remains to be defined, and robust work from different disciplines is required. For this purpose, cross-disciplinary coordination is highly desirable.

C. LANGUAGE DYNAMICS
Language changes quickly, particularly among the young generation. New slang is regularly integrated into the language culture. Therefore, researchers are invited to propose dynamic algorithms that detect new slang and abbreviations related to cyberbullying behavior on SM websites and to keep updating the training processes of machine learning algorithms with newly introduced words.
of supervised learning [142]. This gap in the literature may be caused by the fact that nearly all current studies rely on manually labeled data as the input to supervised algorithms for classifying classes. Thus, finding patterns between two classes by using unsupervised grouping remains difficult. Intensive research is required to develop unsupervised algorithms that can detect effective patterns from data. Traditional machine learning algorithms lack the capability to handle cyberbullying big data.

Deep learning has recently attracted the attention of many researchers in different fields. Natural language understanding is a new area in which deep learning is poised to make a large effect over the next few years [142].

The traditional machine learning algorithms pointed out in this survey lack the capability to process big data in a standalone format. Big data have rendered traditional machine learning algorithms impotent. Cyberbullying big data generated from SM require advanced technology for processing the generated data to gain insights and help in making intelligent decisions.

Big data are generated at a very high velocity and with great variety, volume, value, veracity, and complexity. Researchers need to leverage various deep learning techniques for processing social media big data for cyberbullying behaviors. The deep learning techniques and architectures with a potential to explore the cyberbullying big data generated from SM include the generative adversarial network, deep belief network, convolutional neural network, stacked autoencoder, deep echo state network, and deep recurrent neural network. These deep learning architectures remain unexplored in cyberbullying detection in SM.

V. CONCLUSIONS AND FUTURE DIRECTIONS
This study reviewed existing literature on detecting aggressive behavior on SM websites by using machine learning approaches. We specifically reviewed four aspects of detecting cyberbullying messages by using machine learning approaches, namely, data collection, feature engineering, construction of cyberbullying detection models, and evaluation of the constructed cyberbullying detection models. Several types of discriminative features that were used to detect cyberbullying in online social networking sites were also summarized. In addition, the most effective supervised machine learning classifiers for classifying cyberbullying messages in online social networking sites were identified. One of the main contributions of the current paper is the definition of evaluation metrics that identify the significant parameters so that various machine learning algorithms can be evaluated against each other. Most importantly, we summarized and identified the important factors for detecting cyberbullying through machine learning techniques, especially supervised learning. For this purpose, we have used accuracy, precision, recall, and F-measure, along with the area under the curve, for evaluating models of cyberbullying behavior. Finally, the main issues and open research challenges were described and discussed. Considerable research effort is required to construct highly effective and accurate cyberbullying detection models. We believe that the current study will provide crucial details on and new directions in the field of detecting aggressive human behavior, including cyberbullying detection in online social networking sites.

REFERENCES
[1] V. Subrahmanian and S. Kumar, ''Predicting human behavior: The next frontiers,'' Science, vol. 355, no. 6324, p. 489, 2017.
[2] H. Lauw, J. C. Shafer, R. Agrawal, and A. Ntoulas, ''Homophily in the digital world: A LiveJournal case study,'' IEEE Internet Comput., vol. 14, no. 2, pp. 15–23, Mar./Apr. 2010.
[3] M. A. Al-Garadi, K. D. Varathan, and S. D. Ravana, ''Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network,'' Comput. Hum. Behav., vol. 63, pp. 433–443, Oct. 2016.
[4] L. Phillips, C. Dowling, K. Shaffer, N. Hodas, and S. Volkova, ''Using social media to predict the future: A systematic literature review,'' 2017, arXiv:1706.06134. [Online]. Available: https://arxiv.org/abs/1706.06134
[5] H. Quan, J. Wu, and Y. Shi, ''Online social networks & social network services: A technical survey,'' in Pervasive Communication Handbook. Boca Raton, FL, USA: CRC Press, 2011, p. 4.
[6] J. K. Peterson and J. Densley, ''Is social media a gang? Toward a selection, facilitation, or enhancement explanation of cyber violence,'' Aggression Violent Behav., 2016.
[7] BBC. (2012). Huge Rise in Social Media. [Online]. Available: http://www.bbc.com/news/uk-20851797
[8] P. A. Watters and N. Phair, ''Detecting illicit drugs on social media using automated social media intelligence analysis (ASMIA),'' in Cyberspace Safety and Security. Berlin, Germany: Springer, 2012, pp. 66–76.
[9] M. Fire, R. Goldschmidt, and Y. Elovici, ''Online social networks: Threats and solutions,'' IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 2019–2036, 4th Quart., 2014.
[10] N. M. Shekokar and K. B. Kansara, ''Security against sybil attack in social network,'' in Proc. Int. Conf. Inf. Commun. Embedded Syst. (ICICES), 2016, pp. 1–5.
[11] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, A. Flammini, and F. Menczer, ''Detecting and tracking political abuse in social media,'' in Proc. 5th Int. AAAI Conf. Weblogs Social Media, 2011, pp. 297–304.
[12] A. Aggarwal, A. Rajadesingan, and P. Kumaraguru, ''PhishAri: Automatic realtime phishing detection on Twitter,'' in Proc. eCrime Res. Summit (eCrime), Oct. 2012, pp. 1–12.
[13] S. Yardi et al., ''Detecting spam in a Twitter network,'' First Monday, Jan. 2009. [Online]. Available: https://firstmonday.org/article/view/2793/2431
[14] C. Yang, R. Harkreader, J. Zhang, S. Shin, and G. Gu, ''Analyzing spammers' social networks for fun and profit: A case study of cyber criminal ecosystem on twitter,'' in Proc. 21st Int. Conf. World Wide Web, 2012, pp. 71–80.
[15] G. R. S. Weir, F. Toolan, and D. Smeed, ''The threats of social networking: Old wine in new bottles?'' Inf. Secur. Tech. Rep., vol. 16, no. 2, pp. 38–43, 2011.
[16] M. J. Magro, ''A review of social media use in e-government,'' Administ. Sci., vol. 2, no. 2, pp. 148–161, 2012.
[17] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong, ''Improving cyberbullying detection with user context,'' in Advances in Information Retrieval. Berlin, Germany: Springer, 2013, pp. 693–696.
[18] Y. Chen, Y. Zhou, S. Zhu, and H. Xu, ''Detecting offensive language in social media to protect adolescent online safety,'' in Proc. Int. Conf. Privacy, Secur., Risk Trust (PASSAT), Sep. 2012, pp. 71–80.
[19] V. S. Chavan and S. S. Shylaja, ''Machine learning approach for detection of cyber-aggressive comments by peers on social media network,'' in Proc. Int. Conf. Adv. Comput., Commun. Inform. (ICACCI), Aug. 2015, pp. 2354–2358.
[20] W. Dong, S. S. Liao, Y. Xu, and X. Feng, ''Leading effect of social media for financial fraud disclosure: A text mining based analytics,'' in Proc. AMCIS, San Diego, CA, USA, 2016.
[21] M. S. Rahman, T.-K. Huang, H. V. Madhyastha, and M. Faloutsos, ''FRAppE: Detecting malicious Facebook applications,'' in Proc. 8th Int. Conf. Emerg. Netw. Exp. Technol., 2012, pp. 313–324.
[22] S. Abu-Nimeh, T. Chen, and O. Alzubi, ''Malicious and spam posts in online social networks,'' Computer, vol. 44, no. 9, pp. 23–28, Sep. 2011.
[23] B. Doerr, M. Fouz, and T. Friedrich, ''Why rumors spread so quickly in social networks,'' Commun. ACM, vol. 55, no. 6, pp. 70–75, Jun. 2012.
[24] J. W. Patchin and S. Hinduja, Words Wound: Delete Cyberbullying and Make Kindness Go Viral. Golden Valley, MN, USA: Free Spirit Publishing, 2013.
[25] J. Cheng, C. Danescu-Niculescu-Mizil, and J. Leskovec, ''Antisocial behavior in online discussion communities,'' in Proc. 9th Int. AAAI Conf. Web Social Media, Apr. 2015.
[26] S. Liu, J. Zhang, and Y. Xiang, ''Statistical detection of online drifting Twitter spam: Invited paper,'' in Proc. 11th ACM Asia Conf. Comput. Commun. Secur., 2016, pp. 1–10.
[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, ''Twitter spammer detection using data stream clustering,'' Inf. Sci., vol. 260, pp. 64–73, Mar. 2014.
[28] M. Jiang, S. Kumar, V. S. Subrahmanian, and C. Faloutsos, ''KDD 2017 tutorial: Data-driven approaches towards malicious behavior modeling,'' Dimensions, vol. 19, p. 42, 2017.
[29] S. Y. Jeong, Y. S. Koh, and G. Dobbie, ''Phishing detection on Twitter streams,'' in Proc. Pacific–Asia Conf. Knowl. Discovery Data Mining. Cham, Switzerland: Springer, 2016, pp. 141–153.
[30] I. Frommholz, H. M. Al-Khateeb, M. Potthast, Z. Ghasem, M. Shukla, and E. Short, ''On textual analysis and machine learning for cyberstalking detection,'' Datenbank-Spektrum, vol. 16, no. 2, pp. 127–135, 2016.
[31] M. McCord and M. Chuah, ''Spam detection on Twitter using traditional classifiers,'' in Autonomic and Trusted Computing. Berlin, Germany: Springer, 2011, pp. 175–186.
[32] X. Chen, R. Chandramouli, and K. P. Subbalakshmi, ''Scam detection in Twitter,'' in Data Mining for Service. Berlin, Germany: Springer, 2014, pp. 133–150.
[33] A. H. Wang, ''Detecting spam bots in online social networking sites: A machine learning approach,'' in Data and Applications Security and Privacy XXIV. Berlin, Germany: Springer, 2010, pp. 335–342.
[34] X. Zheng, Z. Zeng, Z. Chen, Y. Yu, and C. Rong, ''Detecting spammers on social networks,'' Neurocomputing, vol. 159, pp. 27–34, Jul. 2015.
[35] K. Dinakar, B. Jones, C. Havasi, H. Lieberman, and R. Picard, ''Common sense reasoning for detection, prevention, and mitigation of cyberbullying,'' ACM Trans. Interact. Intell. Syst., vol. 2, no. 3, p. 18, 2012.
[47] R. Slonje, P. K. Smith, and A. Frisén, ''The nature of cyberbullying, and strategies for prevention,'' Comput. Hum. Behav., vol. 29, no. 1, pp. 26–32, 2013.
[48] E. Whittaker and R. M. Kowalski, ''Cyberbullying via social media,'' J. School Violence, vol. 14, no. 1, pp. 11–29, 2015.
[49] F. Sticca and S. Perren, ''Is cyberbullying worse than traditional bullying? Examining the differential roles of medium, publicity, and anonymity for the perceived severity of bullying,'' J. Youth Adolescence, vol. 42, no. 5, pp. 739–750, 2013.
[50] S. Wen, J. Jiang, Y. Xiang, S. Yu, W. Zhou, and W. Jia, ''To shut them up or to clarify: Restraining the spread of rumors in online social networks,'' IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 12, pp. 3306–3316, Dec. 2014.
[51] K. Van Royen, K. Poels, W. Daelemans, and H. Vandebosch, ''Automatic monitoring of cyberbullying on social networking sites: From technological feasibility to desirability,'' Telematics Inform., vol. 32, no. 1, pp. 89–97, 2015.
[52] K. Van Royen, K. Poels, and H. Vandebosch, ''Harmonizing freedom and protection: Adolescents' voices on automatic monitoring of social networking sites,'' Children Youth Services Rev., vol. 64, pp. 35–41, May 2016.
[53] R. M. Kowalski, ''Bullying in the digital age: A critical review and meta-analysis of cyberbullying research among youth,'' Psychol. Bull., vol. 140, no. 4, pp. 1073–1137, 2014.
[54] Q. Li, ''New bottle but old wine: A research of cyberbullying in schools,'' Comput. Hum. Behav., vol. 23, no. 4, pp. 1777–1791, 2007.
[55] R. S. Tokunaga, ''Following you home from school: A critical review and synthesis of research on cyberbullying victimization,'' Comput. Hum. Behav., vol. 26, no. 3, pp. 277–287, May 2010.
[56] X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, ''Data mining with big data,'' IEEE Trans. Knowl. Data Eng., vol. 26, no. 1, pp. 97–107, Jan. 2014.
[57] Y. Liu, J. Yang, Y. Huang, L. Xu, S. Li, and M. Qi, ''MapReduce based parallel neural networks in enabling large scale machine learning,'' Comput. Intell. Neurosci., vol. 2015, p. 1, Jan. 2015.
[58] C. Wu, R. Buyya, and K. Ramamohanarao, ''Big data analytics = machine learning + cloud computing,'' 2016, arXiv:1601.03115. [Online]. Available: https://arxiv.org/abs/1601.03115
[59] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, ''A survey on deep learning for big data,'' Inf. Fusion, vol. 42, pp. 146–157, Jul. 2018.
[36] M. Dadvar and F. De Jong, ‘‘Cyberbullying detection: A step toward a [60] A. Gandomi and M. Haider, ‘‘Beyond the hype: Big data concepts,
safer Internet yard,’’ in Proc. 21st Int. Conf. Companion World Wide Web, methods, and analytics,’’ Int. J. Inf. Manage., vol. 35, no. 2, pp. 137–144,
2012, pp. 121–126. 2015.
[37] S. O. Sood, J. Antin, and E. Churchill, ‘‘Using crowdsourcing to improve [61] M. Dadvar, F. D. Jong, R. Ordelman, and D. Trieschnigg, ‘‘Improved
profanity detection,’’ in Proc. AAAI Spring Symp., Wisdom Crowd, 2012, cyberbullying detection using gender information,’’ in Proc. 25th Dutch-
pp. 69–74. Belgian Inf. Retr. Workshop, 2012, pp. 1–3.
[38] P. Galán-García, J. G. de la Puerta, C. L. Gómez, I. Santos, and [62] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv,
P. G. Bringas, ‘‘Supervised machine learning for the detection of troll and S. Mishra, ‘‘Detection of cyberbullying incidents on the insta-
profiles in Twitter social network: Application to a real case of cyberbully- gram social network,’’ 2015, arXiv:1503.03909. [Online]. Available:
ing,’’ in Proc. Int. Joint Conf. SOCO-CISIS-ICEUTE. Cham, Switzerland: https://arxiv.org/abs/1503.03909
Springer, 2014, pp. 419–428. [63] G. Forman, ‘‘An extensive empirical study of feature selection metrics for
[39] Q. Huang, V. K. Singh, and P. K. Atrey, ‘‘Cyber bullying detection using text classification,’’ J. Mach. Learn. Res., pp. 1289–1305, Mar. 2003.
social and textual analysis,’’in Proc. 3rd Int. Workshop Socially-Aware [64] P. D. Turney, ‘‘Thumbs up or thumbs down?: Semantic orientation applied
Multimedia, 2014, pp. 3–6. to unsupervised classification of reviews,’’ in Proc. ACL, 2002, pp. 417–
[40] R. M. Kowalski, Cyberbullying: Bullying in the Digital Age. Hoboken, 424.
NJ, USA: Wiley, 2012. [65] R. M. Tong, ‘‘An operational system for detecting and tracking opinions
[41] T. Nakano, T. Suda, Y. Okaie, and M. J. Moore, ‘‘Analysis of cyber in on-line discussion,’’ in Proc. Notes ACM SIGIR Workshop Oper. Text
aggression and cyber-bullying in social networking,’’ in Proc. IEEE 10th Classification, 2001.
Int. Conf. Semantic Comput. (ICSC), Feb. 2016, pp. 337–341. [66] V. Hatzivassiloglou and K. R. McKeown, ‘‘Predicting the semantic ori-
[42] G. S. O’Keeffe and K. Clarke-Pearson, ‘‘The impact of social media entation of adjectives,’’ in Proc. 8th Conf. Eur. Chapter Assoc. Comput.
on children, adolescents, and families,’’ Pediatrics, vol. 127, no. 4, Linguistics, 1997, pp. 174–181.
pp. 800–804, 2011. [67] S. Nadali, M. A. A. Murad, N. M. Sharef, A. Mustapha, and S. Shojaee,
[43] J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore, ‘‘Learning from bullying ‘‘A review of cyberbullying detection: An overview,’’ in Proc. 13th Int.
traces in social media,’’ in Proc. Conf. North Amer. Chapter Assoc. Conf. Intell. Syst. Design Appl. (ISDA), Dec. 2013, pp. 325–330.
Comput. Linguistics, Hum. Lang. Technol., 2012, pp. 656–666. [68] A. Kontostathis, K. Reynolds, A. Garron, and L. Edwards, ‘‘Detecting
[44] R. M. Kowalski and S. P. Limber, ‘‘Psychological, physical, and aca- cyberbullying: Query terms and techniques,’’ in Proc. 5th Annu. ACM
demic correlates of cyberbullying and traditional bullying,’’ J. Adolescent Web Sci. Conf., 2013, pp. 195–204.
Health, vol. 53, no. 1, pp. S13–S20, 2013. [69] H. Chen, S. Mckeever, and S. J. Delany, ‘‘Harnessing the power of text
[45] H. Sampasa-Kanyinga, P. Roumeliotis, and H. Xu, ‘‘Associations between mining for the detection of abusive content in social media,’’ in Advances
cyberbullying and school bullying victimization and suicidal ideation, in Computational Intelligence Systems. Cham, Switzerland: Springer,
plans and attempts among Canadian schoolchildren,’’ PLoS ONE, vol. 9, 2017, pp. 187–205.
no. 7, 2014, Art. no. e102145. [70] K. Reynolds, A. Kontostathis, and L. Edwards, ‘‘Using machine learning
[46] S. Hinduja and J. W. Patchin, ‘‘Bullying, cyberbullying, and suicide,’’ to detect cyberbullying,’’ in Proc. 10th Int. Conf. Mach. Learn. Appl.
Arch. Suicide Res., vol. 14, no. 3, pp. 206–221, 2010. Workshops (ICMLA), Dec. 2011, pp. 241–244.
[71] V. Nahar, X. Li, and C. Pang, ‘‘An effective approach for cyberbullying [98] E. Raisi and B. Huang, ‘‘Cyberbullying identification using participant-
detection,’’ Commun. Inf. Sci. Manage. Eng., vol. 3, no. 5, p. 238, 2013. vocabulary consistency,’’ 2016, arXiv:1606.08084. [Online]. Available:
[72] C. Van Hee, E. Lefever, B. Verhoeven, J. Mennes, B. Desmet, G. De Pauw, https://arxiv.org/abs/1606.08084
and V. Hoste, ‘‘Detection and fine-grained classification of cyberbullying [99] A. Squicciarini, S. Rajtmajer, Y. Liu, and C. Griffin, ‘‘Identification and
events,’’ in Proc. Int. Conf. Recent Adv. Natural Lang. Process. (RANLP), characterization of cyberbullying dynamics in an online social network,’’
2015, pp. 672–680. in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, 2015,
[73] A. Mangaonkar, A. Hayrapetian, and R. Raje, ‘‘Collaborative detection pp. 280–285.
of cyberbullying behavior in Twitter data,’’ in Proc. IEEE Int. Conf. [100] R. M. Gray, ‘‘Entropy and information,’’ in Entropy and Information
Electro/Inf. Technol. (EIT), Dekalb, IL, USA, May 2015, pp. 611–616. Theory. New York, NY, USA: Springer, 1990, pp. 21–55.
[74] H. Sanchez and S. Kumar, ‘‘Twitter bullying detection,’’ [101] I. Qabajeh and F. Thabtah, ‘‘An experimental study for assessing email
Tech. Rep. UCSC ISM245, 2011. classification attributes using feature selection methods,’’ in Proc. 3rd Int.
[75] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, ‘‘An introduc- Conf. Adv. Comput. Sci. Appl. Technol. (ACSAT), Dec. 2014, pp. 125–132.
[102] J. Benesty, J. Chen, Y. Huang, and I. Cohen, ‘‘Pearson correlation coef-
tion to MCMC for machine learning,’’ Mach. Learn., vol. 50, nos. 1–2,
ficient,’’ in Noise Reduction in Speech Processing. Berlin, Germany:
pp. 5–43, 2003.
Springer, 2009, pp. 1–4.
[76] M. W. Libbrecht and W. S. Noble, ‘‘Machine learning applications
[103] M. Hall, ‘‘Correlation-based feature selection for discrete and numeric
in genetics and genomics,’’ Nature Rev. Genet., vol. 16, no. 6,
class machine learning,’’ in Proc. 17th Int. Conf. Mach. Learn., 2000,
pp. 321–332, 2015.
pp. 359–366.
[77] P. Langley and H. A. Simon, ‘‘Applications of machine learning and rule [104] Z. Zheng, X. Wu, and R. Srihari, ‘‘Feature selection for text categorization
induction,’’ Commun. ACM, vol. 38, no. 11, pp. 54–64, 1995. on imbalanced data,’’ ACM SIGKDD Explor. Newslett., vol. 6, no. 1,
[78] Z. Ghahramani, ‘‘Probabilistic machine learning and artificial intelli- pp. 80–89, 2004.
gence,’’ Nature, vol. 521, no. 7553, pp. 452–459, May 2015. [105] D. H. Wolper and W. G. Macready, ‘‘No free lunch theorems for optimiza-
[79] P. Domingos, ‘‘A few useful things to know about machine learning,’’ tion,’’ IEEE Trans. Evol. Comput., vol. 1, no. 1, pp. 67–82, Apr. 1997.
Commun. ACM, vol. 55, no. 10, pp. 78–87, 2012. [106] A. L. Buczak and E. Guven, ‘‘A survey of data mining and machine
[80] Y. Liu, C. Kliman-Silver, and A. Mislove, ‘‘The tweets they are a- learning methods for cyber security intrusion detection,’’ IEEE Commun.
changin’: Evolution of twitter users and behavior,’’ in Proc. Int. AAAI Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.
Conf. Weblogs Social Media (ICWSM), 2014, pp. 305–314. [107] T. Joachims, ‘‘Text categorization with support vector machines: Learn-
[81] S. González-Bailón, N. Wang, A. Rivero, J. Borge-Holthoefer, and ing with many relevant features,’’ in Proc. Eur. Conf. Mach. Learn. Berlin,
Y. Moreno, ‘‘Assessing the bias in samples of large online networks,’’ Germany: Springer, 1998.
Social Netw., vol. 38, pp. 16–27, Jul. 2014. [108] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, ‘‘A practical guide to support
[82] T. Cheng and T. Wicks, ‘‘Event detection using Twitter: A spatio-temporal vector classification,’’ Dept. Comput. Sci., Nat. Taiwan Univ., Tech. Rep.,
approach,’’ PLoS ONE, vol. 9, no. 6, p. e97807, 2014. 2003. [Online]. Available: http://www.csie. ntu.edu.tw/*cjlin/papers/
[83] A. Bellmore, A. J. Calvin, J.-M. Xu, and X. Zhu, ‘‘The five W’s of guide/guide.pdf
‘bullying’ on Twitter: Who, what, why, where, and when,’’ Comput. Hum. [109] S. Tong and D. Koller, ‘‘Support vector machine active learning
Behav., vol. 44, pp. 305–314, Mar. 2015. with applications to text classification,’’ J. Mach. Learn. Res., vol. 2,
[84] H. Margono, X. Yi, and G. K. Raikundalia, ‘‘Mining Indonesian cyber pp. 45–66, Nov. 2001.
bullying patterns in social networks,’’ in Proc. 37th Australas. Comput. [110] V. Vapnik, The Nature of Statistical Learning Theory. New York, NY,
Sci. Conf., vol. 147, Australian Computer Society, 2014, pp. 115–124. USA: Springer, 2013.
[111] B. E. Boser, I. M. Guyon, and V. N. Vapnik, ‘‘A training algorithm for
[85] R. Zhao, A. Zhou, and K. Mao, ‘‘Automatic detection of cyberbullying
optimal margin classifiers,’’ in Proc. 5th Annu. Workshop Comput. Learn.
on social networks based on bullying features,’’ in Proc. 17th Int. Conf.
Theory, 1992, pp. 144–152.
Distrib. Comput. Netw., 2016, Art. no. 43. [112] A. McCallum and K. Nigam, ‘‘A comparison of event models for naive
[86] Á. García-Recuero, ‘‘Discouraging abusive behavior in privacy- Bayes text classification,’’ in Proc. AAAI Workshop Learn. Text Catego-
preserving online social networking applications,’’ in Proc. 25th Int. rization, 1998, pp. 1–8.
Conf. Companion World Wide Web, International World Wide Web [113] H. Zhang, ‘‘The optimality of naive Bayes,’’ Tech. Rep., 2004.
Conferences Steering Committee, 2016, pp. 305–309. [114] H. Zhang, ‘‘The optimality of naive Bayes,’’ in Proc. IAAA, vol. 1, no. 2,
[87] Y. Anzai, Pattern Recognition and Machine Learning. Amsterdam, 2004, p. 3.
The Netherlands: Elsevier, 2012. [115] N. Bora, V. Zaytsev, Y.-H. Chang, and R. Maheswaran ‘‘Gang networks,
[88] E. Calvete, I. Orue, A. Estévez, L. Villardón, and P. Padilla, ‘‘Cyberbul- neighborhoods and holidays: Spatiotemporal patterns in social media,’’
lying in adolescents: Modalities and aggressors’ profile,’’ Comput. Hum. in Proc. Int. Conf. Social Comput. (SocialCom), Sep. 2013, pp. 93–101.
Behav., vol. 26, no. 5, pp. 1128–1135, 2010. [116] A. H. Wang, ‘‘Don’t follow me: Spam detection in Twitter,’’ in Proc. Int.
[89] H. Vandebosch and K. Van Cleemput, ‘‘Cyberbullying among young- Conf. Secur. Cryptogr. (SECRYPT), Jul. 2010, pp. 1–10.
sters: Profiles of bullies and victims,’’ New Media Soc., vol. 11, no. 8, [117] D. M. Freeman, ‘‘Using naive bayes to detect spammy names in social
pp. 1349–1371, 2009. networks,’’ in Proc. ACM Workshop Artif. Intell. Secur., 2013, pp. 3–12.
[90] R. Slonje and P. K. Smith, ‘‘Cyberbullying: Another main type of bully- [118] L. Breiman, ‘‘Random forests,’’ Mach. Learn., vol. 45, no. 1, pp. 5–32,
ing?’’ Scand. J. Psychol., vol. 49, no. 2, pp. 147–154, 2008. 2001.
[91] K. R. Williams and N. G. Guerra, ‘‘Prevalence and predictors of Internet [119] D. R. Cutler, D. R. Cutler, T. C. Edwards, Jr., K. H. Beard, A. Cutler,
bullying,’’ J. Adolescent Health, vol. 41, no. 6, pp. S14–S21, 2007. K. T. Hess, J. Gibson, and J. J. Lawler, ‘‘Random forests for classification
[92] O. T. Arıcak, ‘‘Psychiatric symptomatology as a predictor of cyberbully- in ecology,’’ Ecology, vol. 88, no. 11, pp. 2783–2792, 2007.
[120] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda,
ing among University Students,’’ Eurasian J. Educ. Res., vol. 34, no. 1,
G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach,
p. 169, 2009.
D. J. Hand, and D. Steinberg, ‘‘Top 10 algorithms in data mining,’’ Knowl.
[93] I. Connolly and M. O’Moore, ‘‘Personality and family relations of
Inf. Syst., vol. 14, no. 1, pp. 1–37, 2008.
children who bully,’’ Personality Individual Differences, vol. 35, no. 3, [121] P. Soucy and G. W. Mineau, ‘‘A simple KNN algorithm for text catego-
pp. 559–567, 2003. rization,’’ in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov./Dec. 2001,
[94] L. Corcoran, I. Connolly, and M. O’Moore, ‘‘Cyberbullying in Irish pp. 647–648.
schools: An investigation of personality and self-concept,’’ Irish J. Psy- [122] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, ‘‘Efficient
chol., vol. 33, no. 4, pp. 153–165, 2012. kNN classification algorithm for big data,’’ Neurocomputing, vol. 195,
[95] K. Dinakar, R. Reichart, and H. Lieberman, ‘‘Modeling the detection pp. 143–148, Jun. 2016.
of textual cyberbullying,’’ in Proc. 5th Int. AAAI Conf. Weblogs Social [123] S. Dreiseitl, L. Ohno-Machado, H. Kittler, S. Vinterbo, H. Billhardt,
Media, 2011, pp. 11–17. and M. Binder, ‘‘A comparison of machine learning methods for the
[96] D. Yin, Z. Xue, L. Hong, B. D. Davison, A. Kontostathis, and L. Edwards, diagnosis of pigmented skin lesions,’’ J. Biomed. Informat., vol. 34, no. 1,
‘‘Detection of harassment on Web 2.0,’’ in Proc. Content Anal. Web, 2009, pp. 28–36, 2001.
pp. 1–7. [124] D. W. Hosmer, Jr., S. Lemeshow, and R. X. Sturdivant, Applied Logistic
[97] M. Ptaszynski, P. Dybala, T. Matsuba, F. Masui, R. Rzepka, and K. Araki, Regression, vol. 398. Hoboken, NJ, USA: Wiley, 2013.
‘‘Machine learning and affect analysis against cyber-bullying,’’ in Proc. [125] R. Kohavi, ‘‘A study of cross-validation and bootstrap for accuracy esti-
36th AISB, 2010, pp. 7–16. mation and model selection,’’ in Proc. IJCAI, 1995, pp. 1137–1145.
NAWSHER KHAN received the Ph.D. degree from University Malaysia Pahang (UMP), Malaysia, in 2013. He was a Postdoctoral Research Fellow with the University of Malaya (UM), Malaysia, in 2014. In 2005, he was appointed to the National Database and Registration Authority (NADRA) under the Interior Ministry of Pakistan, and in 2008, he worked at the National Highways Authority (NHA). He served as an Assistant Professor at Abdul Wali Khan University Mardan, Pakistan, for three years, from 2014 to 2017. He is currently an Associate Professor and the Director of the Research Center, College of Computer Science, King Khalid University, Abha, Saudi Arabia. He has published more than 50 articles in international journals and conference proceedings. His research interests include big data, cloud computing, data management, distributed systems, scheduling, replication, and sensor networks.

GHULAM MURTAZA is currently pursuing the Ph.D. degree with the Faculty of Computer Science and Information Technology, University of Malaya, Malaysia. He is also an Assistant Professor with Sukkur IBA University, Sukkur, Pakistan, and is currently on study leave to pursue his Ph.D. He has published several articles in journals indexed in well-reputed databases. His research interests include machine learning, deep learning, digital image processing, big data, and information retrieval.

HENRY FRIDAY NWEKE received the B.Sc. degree in computer science from Ebonyi State University, Nigeria, and the M.Sc. degree in computer science from the University of Bedfordshire, U.K. He is currently pursuing the Ph.D. degree with the Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. His research interests include machine learning, deep learning, biomedical sensor analytics, human activity recognition, multi-sensor fusion, cloud computing, wireless sensor technologies, and emerging technologies.
MOHAMMED ALI AL-GARADI received the Ph.D. degree from the Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. He has published several articles in academic journals indexed in well-reputed databases such as ISI and Scopus. His research interests include cybersecurity, online social networking, machine learning, text mining, deep learning, and the IoT.

IHSAN ALI received the M.Sc. degree from Hazara University Mansehra, Pakistan, in 2005, and the M.S. degree in computer system engineering from the GIK Institute, in 2008. He is currently pursuing the Ph.D. degree with the Faculty of Computer Science and Information Technology, University of Malaya. He has more than five years of teaching and research experience in several countries, including Saudi Arabia, the USA, Pakistan, and Malaysia. He served as a Technical Program Committee Member for IWCMC 2017, AINIS 2017, and Future 5V 2017, and as an organizer of the special session on fog computing at Future 5V 2017. He has published more than 30 papers in international journals and conferences. His research interests include wireless sensor networks, underwater sensor networks, sensor clouds, fog computing, and the IoT. He is an active member of the IEEE, the ACM, the International Association of Engineers (IAENG), and the Institute of Research Engineers and Doctors (the IRED). He is also a reviewer for Computers & Electrical Engineering, KSII Transactions on Internet and Information Systems, Mobile Networks and Applications, the International Journal of Distributed Sensor Networks, the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, Computer Networks, IEEE ACCESS, FGCS, and the IEEE Communications Magazine.

MOHAMMAD RASHID HUSSAIN received the Ph.D. degree in information technology from Babasaheb Bhimrao Ambedkar Bihar University, India. He is currently an Assistant Professor with the College of Computer Science, King Khalid University, Abha, Saudi Arabia. His research interests include educational development and review, educational data mining, cloud intelligence, and mobile cloud computing and optimization.
GHULAM MUJTABA received the master's degree in computer science from FAST National University, Karachi, Pakistan, and the Ph.D. degree from the Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. He received the gold medal for the master's degree. He has been an Assistant Professor with Sukkur IBA University, Sukkur, Pakistan, since 2006, and has extensive teaching and research experience. Before joining Sukkur IBA University, he worked at a well-known software house in Karachi for four years. He has published several articles in academic journals indexed in well-reputed databases such as ISI and Scopus. His research interests include machine learning, online social networking, text mining, deep learning, and information retrieval.

HASAN ALI KHATTAK received the B.Sc. degree in computer science from the University of Peshawar, Peshawar, Pakistan, in 2006, the master's degree in information engineering from the Politecnico di Torino, Torino, Italy, in 2011, and the Ph.D. degree in electrical and computer engineering from the Politecnico di Bari, Bari, Italy, in 2015. He has been serving as an Assistant Professor of computer science since 2016. He is involved in a number of funded research projects on the Internet of Things, the semantic web, and fog computing, exploring ontologies and web technologies using the Contiki OS, NS-2/3, and OMNeT++ frameworks. His current research interests include distributed systems, the web of things, vehicular ad hoc networks, and data and social engineering for smart cities.