Predicting Cyberbullying on Social Media in the Big Data Era Using Machine Learning Algorithms: Review of Literature and Open Challenges
Corresponding authors: Mohammed Ali Al-Garadi ([email protected]), Ihsan Ali ([email protected]), and
Ghulam Mujtaba ([email protected])
This work was supported in part by the Deanship of Scientific Research, King Khalid University, through Research Group Project under
Grant R.G.P. 1/166/40, and in part by the University of Malaya Postgraduate Research under Grant PG035-2016A.
ABSTRACT Prior to the innovation of information communication technologies (ICT), social interactions evolved within small cultural boundaries such as geospatial locations. The recent development of communication technologies has considerably transcended the temporal and spatial limitations of traditional communications. These social technologies have created a revolution in user-generated information, online human networks, and rich human behavior-related data. However, the misuse of social technologies such as social media (SM) platforms has introduced a new form of aggression and violence that occurs exclusively online. This paper highlights new means of demonstrating aggressive behavior on SM websites and outlines the motivations for constructing prediction models to fight aggressive behavior in SM. We comprehensively review cyberbullying prediction models and identify the main issues related to the construction of such models in SM. The paper provides insights into the overall process of cyberbullying detection and, most importantly, overviews the methodology. Although the data collection and feature engineering processes are elaborated, most of the emphasis is placed on feature selection algorithms and on the use of various machine learning algorithms to predict cyberbullying behavior. Finally, open issues and challenges are highlighted, which present new research directions for researchers to explore.
INDEX TERMS Big data, cyberbullying, cybercrime, human aggressive behavior, machine learning, online
social network, social media, text classification.
and techniques from multidisciplinary and interdisciplinary fields. The accessibility of large-scale data produces new research questions, novel computational methods, interdisciplinary approaches, and outstanding opportunities to discover several vital inquiries quantitatively. However, using traditional (statistical) methods in this context is challenging in terms of scale and accuracy. These methods are commonly based on organized data on human behavior and small-scale human networks (traditional social networks), and applying them to large online social networks (OSNs) causes several issues of scale and extent. On the one hand, the explosive growth of OSNs enhances and disseminates aggressive forms of behavior by providing platforms and networks to commit and propagate such behavior. On the other hand, OSNs offer important data for exploring human behavior and interaction at a large scale, and these data can be used by researchers to develop effective methods of detecting and restraining misbehavior and/or aggressive behavior. OSNs provide criminals with tools to perform aggressive actions and networks to commit misconduct. Therefore, methods that address both aspects (content and network) should be optimized to detect and restrain aggressive behavior in complex systems.

The remainder of this paper is organized as follows. Subsection I.A presents an overview of aggressive behavior in SM and highlights new means by which users utilize SM websites to commit aggressive behavior. Subsection I.B summarizes the motivations for constructing prediction models to combat aggressive behavior in SM. Subsection I.C highlights the importance of constructing cyberbullying prediction models. Subsection I.D presents the methodology followed in this paper. Section 2 presents a comprehensive review of cyberbullying prediction models for SM websites from data collection to evaluation. Section 3 discusses the main issues related to the construction of cyberbullying prediction models. Research challenges, which present new research directions, are discussed in Section 4, and the paper is concluded in Section 5.

A. RISE OF AGGRESSIVE BEHAVIOR ON SM

Prior to the innovation of communication technologies, social interaction evolved within small cultural boundaries, such as locations and families [5]. The recent development of communication technologies has exceptionally transcended the temporal and spatial limitations of traditional communication. In the last few years, online communication has shifted toward user-driven technologies, such as SM websites, blogs, online virtual communities, and online sharing platforms. New forms of aggression and violence have emerged exclusively online [6]. The dramatic increase in negative human behavior on SM, with high increments in aggressive behavior, presents a new challenge [6], [7]. The advent of Web 2.0 technologies, including SM websites that are often accessed through mobile devices, has completely transformed functionality on the side of users [8]. SM characteristics, such as accessibility, flexibility, being free, and having well-connected social networks, provide users with liberty and flexibility to post and write on their platforms. Therefore, users can easily demonstrate aggressive behavior [9], [10]. SM websites have become dynamic social communication websites for millions of users worldwide. Data in the form of ideas, opinions, preferences, views, and discussions are spread among users rapidly through online communication. The online interactions of SM users generate a huge volume of data that can be utilized to study human behavioral patterns [11]. SM websites also provide an exceptional opportunity to analyze patterns of social interactions among populations at a scale much larger than before.

Aside from renovating the means through which people are influenced, SM websites provide a place for a severe form of misbehavior among users. Online complex networks, such as SM websites, changed substantially in the last decade, and this change was stimulated by the popularity of online communication through SM websites. Online communication has become an entertainment tool, rather than serving only to communicate and interact with known and unknown users. Although SM websites provide many benefits to users, cybercriminals can use these websites to commit different types of misbehavior and/or aggressive behavior. The common forms of misbehavior and/or aggressive behavior on OSN sites include cyberbullying [3], [15], phishing [12], spam distribution [13], and malware spreading [14].

Users utilize SM websites to demonstrate different types of aggressive behavior. The main involvement of SM websites in aggressive behavior can be summarized in two points [9], [15].

1) OSN communication is a revolutionary trend that exploits Web 2.0. Web 2.0 has new features that allow users to create profiles and pages, which, in turn, make users active. Unlike Web 1.0, which limited users to being passive readers of content, Web 2.0 has expanded capabilities that allow users to be active as they post and write their thoughts. SM websites have four particular features, namely, collaboration, participation, empowerment, and timeliness [16]. These characteristics enable criminals to use SM websites as a platform to commit aggressive behavior without confronting victims [9], [15]. Examples of aggressive behavior are committing cyberbullying [17]–[19] and financial fraud [20], using malicious applications [21], and implementing social engineering and phishing [12].

2) SM websites are structures that enable information exchange and dissemination. They allow users to effortlessly share information, such as messages, links, photos, and videos [22]. However, because SM websites connect billions of users, they have become delivery mechanisms for different forms of aggressive behavior at an extraordinary scale. SM websites help cybercriminals reach many users [23].
B. MOTIVATIONS FOR PREDICTING AGGRESSIVE BEHAVIOR ON SM WEBSITES

Many studies have been conducted on the contribution of machine learning algorithms to OSN content analysis in the last few years. Machine learning research has become crucial in numerous areas and has successfully produced many models, tools, and algorithms for handling large amounts of data to solve real-world problems [24], [25]. Machine learning algorithms have been used extensively to analyze SM website content for spam [26]–[28], phishing [29], and cyberbullying prediction [19], [30]. Aggressive behavior includes spam propagation [13], [31]–[34], phishing [12], malware spread [14], and cyberbullying [15]. Textual cyberbullying has become the dominant aggressive behavior on SM websites because these websites give users full freedom to post on their platforms [17], [35]–[39].

SM websites contain large amounts of text and/or non-text content and other information related to aggressive behavior. In this work, a content analysis of SM websites is performed to predict aggressive behavior. Such an analysis is limited to textual OSN content for predicting cyberbullying behavior. Given that cyberbullying can be easily committed, it is considered a dangerous and fast-spreading aggressive behavior. Bullies only require willingness and a laptop or cell phone with an Internet connection to perform misbehavior without confronting victims [40]. The popularity and proliferation of SM websites have increased online bullying activities. Cyberbullying on SM websites is rampant due to the structural characteristics of these websites. Cyberbullying in traditional platforms, such as emails or phone text messages, is performed on a limited number of people. SM websites allow users to create profiles for establishing friendships and communicating with other users regardless of geographic location, thus expanding cyberbullying beyond physical location. Anonymous users may also exist on SM websites, and this has been confirmed to be a primary cause of increased aggressive user behavior [41]. Developing an effective prediction model for predicting cyberbullying is therefore of practical significance. With all these considerations, this work performs a content-based analysis for predicting textual cyberbullying on SM websites.

The motivation of this review is explained in the following section.

C. WHY CONSTRUCTING CYBERBULLYING PREDICTION MODELS IS IMPORTANT

The motivations for carrying out this review for predicting cyberbullying on SM websites are discussed as follows. Cyberbullying is a major problem [42] and has been documented as a serious national health problem [43] due to the recent growth of online communication and SM websites. Research has shown that cyberbullying exerts negative effects on the psychological and physical health and academic performance of people [44]. Studies have also shown that cyberbullying victims incur a high risk of suicidal ideation [45], [46]. Other studies [45], [46] reported an association between cyberbullying victimization and suicidal ideation risk. Consequently, developing a cyberbullying prediction model that detects aggressive behavior related to the security of human beings is more important than developing a prediction model for aggressive behavior related to the security of machines.

Cyberbullying can be committed anywhere and anytime. Escaping from cyberbullying is difficult because it can reach victims anywhere and anytime. It can be committed by posting comments and statuses for a large potential audience, and the victims cannot stop the spread of such activities [47]. Although SM websites have become an integral part of users' lives, a study found that SM websites are the most common platforms for cyberbullying victimization [48]. A well-known characteristic of SM websites, such as Twitter, is that they allow users to publicly express and spread their posts to a large audience while remaining anonymous [9]. The effects of public cyberbullying are worse than those of private ones, and anonymous scenarios of cyberbullying are worse than non-anonymous cases [49], [50]. Consequently, the severity of cyberbullying has increased on SM websites, which support public and anonymous scenarios of cyberbullying. These characteristics make SM websites, such as Twitter, a dangerous platform for committing cyberbullying [43].

Recent research has indicated that most experts favor the automatic monitoring of cyberbullying [51]. A study that examined 14 groups of adolescents confirmed the urgent need for automatic monitoring and prediction models for cyberbullying [52] because traditional strategies for coping with cyberbullying in the era of big data and networks do not work well. Moreover, analyzing large amounts of complex data requires machine learning-based automatic monitoring.

1) CYBERBULLYING ON SM WEBSITES

Most researchers define cyberbullying as using electronic communication technologies to bully people [53]. Cyberbullying may exist in different types or forms, such as writing aggressive posts, harassing or bullying a victim, making hateful posts, or insulting the victim [54], [55]. Given that cyberbullying can be easily committed, it is considered a dangerous and fast-spreading aggressive behavior. Bullies only require willingness and a laptop or cell phone connected to the Internet to perform misbehavior without confronting the victims [40]. The popularity and proliferation of SM websites have increased online bullying activities. Cyberbullying on SM websites is performed on a large number of users due to the structural characteristics of these websites [48].

Cyberbullying in traditional platforms, such as emails or phone text messages, is committed on a limited number of people. SM websites allow users to create profiles for establishing friendships and interacting with other online users regardless of geographic location, thus expanding cyberbullying beyond physical location. Moreover, anonymous users may exist on SM websites, and this has been confirmed to be a primary cause of increased aggressive user behavior [41].
The nature of SM websites allows cyberbullying to occur secretly, spread rapidly, and continue easily [54]. Consequently, developing an effective prediction model for predicting cyberbullying is of practical significance. SM websites contain large amounts of text and/or non-text content and information related to aggressive behavior.

D. METHODOLOGY

This section presents the methodology used in this work for the literature search. Two phases were employed to retrieve published papers on cyberbullying prediction models. The first phase included searching reputable academic databases and search engines. The search engines and academic databases used for the retrieval of relevant papers were as follows: Scopus, Clarivate Analytics' Web of Science, DBLP Computer Science Bibliography, ACM Digital Library, ScienceDirect, SpringerLink, and IEEE Xplore. The major keywords used for the literature search were coined in relation to social media as follows: cyberbullying, aggressive behavior, big data, and cyberbullying models. The second phase involved searching for literature through Qatar University's digital library. The articles retrieved from the search were scrutinized to ensure that they met the inclusion criteria. According to the inclusion criteria, for an article to be selected for the survey, it must report an empirical study describing the prediction of cyberbullying on SM sites; otherwise, the article was excluded. Many articles were rejected based on their titles. The abstract and conclusion sections were examined to ensure that articles satisfied the screening criteria, and those that did not were excluded from the survey.

II. PREDICTING CYBERBULLYING ON SOCIAL MEDIA IN THE BIG DATA ERA USING MACHINE LEARNING ALGORITHMS

Our world is currently in the big data era because 2.5 quintillion bytes of data are generated daily [56]. Organizations continuously generate large-scale data. These large-scale datasets are generated from different sources, including the World Wide Web, social networks, and sensor networks [57]. Big data have nine characteristics, namely, volume, variety, variability and complexity, velocity, veracity, value, validity, verdict, and visibility [58]. For example, Flickr generates almost 3.6 TB of data, Google is believed to process almost 20,000 TB of data per day, and the Internet gathers an estimated 1.8 PB of data daily [59].

SM is an online platform that provides users an opportunity to create an online community, share information, and exchange content. SM users and the interactions among organizations, people, and products are responsible for the massive amount of data generated on SM platforms. SM platforms, such as Facebook, YouTube, blogs, Instagram, Wikipedia, and Twitter, are of different types. The data generated by SM outlets can be structured or unstructured in form. SM analytics is the analysis of structured and unstructured data generated by SM outlets. SM analytics can be in any of the following forms: link prediction, community, content, social influence, structured, and unstructured. SM is now in the big data era. For example, Facebook stores 260 billion photographs in over 20 PB of storage space, and up to one million pictures are processed per second. YouTube receives 100 hours of uploaded video every minute [60].

The most common means of constructing cyberbullying prediction models is to use a text classification approach that involves the construction of machine learning classifiers from labeled text instances [19], [38], [61]–[63]. Another means is to use a lexicon-based model that involves computing the orientation of a document from the semantic orientation of the words or phrases in the document [64]. Generally, the lexicon in lexicon-based models can be constructed manually (similar to the approaches used in [65]) or automatically by using seed words to expand the list of words [66]. However, cyberbullying prediction using the lexicon-based approach is rare in the literature. The primary reason is that the texts on SM websites are written in an unstructured manner, thus making it difficult for the lexicon-based approach to detect cyberbullying based only on lexicons [67]–[69]. However, lexicons are used to extract features, which are often utilized as inputs to machine learning algorithms. For example, lexicon-based approaches, such as using a profanity dictionary to count the profane words in a post, are adopted to provide profanity features to machine learning models [70]. The key to effective cyberbullying prediction is to have a set of features that are well extracted and engineered [71]. Features and their combinations are crucial in the construction of effective cyberbullying prediction models [70], [71]. Most studies on cyberbullying prediction [19], [38], [62], [72], [73] used machine learning algorithms to construct cyberbullying prediction models. Machine learning-based models exhibit decent performance in cyberbullying prediction [74]. Consequently, this work reviews the construction of cyberbullying prediction models based on machine learning.

The machine learning field focuses on the development and application of computer algorithms that improve with experience [75], [76]. The objective of machine learning is to identify and define the patterns and correlations between data. The importance of analyzing big data lies in discovering hidden knowledge through deep learning from raw data [1]. Machine learning can be described as the adoption of computational models to improve machine performance by predicting and describing meaningful patterns in training data and to acquire knowledge from experience [77]. When this concept is applied to OSN content, the potential of machine learning lies in exploiting historical data to detect, predict, and understand large amounts of OSN data. For example, in supervised machine learning for classification applications, classification is learned with the help of suitable examples from a training dataset. In the testing stage, new data are fed into the model, and instances are classified to a specified class learned during the training stage. Then, classification performance is evaluated.
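The training and testing stages just described can be sketched in a few lines of Python. The following is a minimal toy illustration, not the pipeline of any reviewed study: the posts, the "bullying"/"normal" labels, and the choice of a bag-of-words multinomial Naive Bayes classifier are all illustrative assumptions.

```python
# Minimal sketch of the supervised train/test workflow: a bag-of-words
# multinomial Naive Bayes classifier built from labeled posts, then
# applied to unseen posts. All posts and labels are toy examples.
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(posts, labels):
    """Learn class priors and per-class word counts (training stage)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)   # class -> word -> count
    vocab = set()
    for post, label in zip(posts, labels):
        for w in tokenize(post):
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(model, post):
    """Classify a new post by the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)                           # class prior
        denom = sum(word_counts[c].values()) + len(vocab)    # Laplace smoothing
        for w in tokenize(post):
            lp += math.log((word_counts[c][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Training stage: learn from labeled examples.
train_posts = ["you are so stupid and ugly", "nobody likes you loser",
               "great photo from the trip", "happy birthday my friend"]
train_labels = ["bullying", "bullying", "normal", "normal"]
model = train(train_posts, train_labels)

# Testing stage: classify unseen posts, then evaluate the predictions.
print(predict(model, "you stupid loser"))   # -> bullying
print(predict(model, "great trip photo"))   # -> normal
```

Real studies reviewed in this paper use far richer feature sets (user, network, and media features) and established toolkits, but the two-stage structure (fit on labeled data, then classify and evaluate on held-out data) is the same.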
This section reviews the most common processes in the construction of cyberbullying prediction models for SM websites based on machine learning. The review covers data collection, feature engineering, feature selection, and machine learning algorithms.

A. DATA COLLECTION

Data are important components of all machine learning-based prediction models. However, data (even "Big Data") are useless on their own until knowledge or implications are extracted from them. Data extracted from SM websites are used to select training and testing datasets. Supervised prediction models aim to provide computer techniques that enhance prediction performance in defined tasks on the basis of observed instances (labeled data) [78]. Machine learning models for a certain task primarily aim to generalize; a successful model should not be limited to the examples in a training dataset only [79] but must extend to unlabeled real data. Data quantity alone is inconsequential; what is crucial is whether the extracted data represent activities on SM websites well [80]–[82]. The main data collection strategies in previous cyberbullying prediction studies on SM websites can be categorized into data extracted from SM websites by using either keywords, that is, words, phrases, or hashtags (e.g., [19], [43], [83]–[85]), or user profiles (e.g., [38], [62], [70], [86]). The issues in these data collection strategies and their effects on the performance of machine learning algorithms are highlighted in the Data Collection section (related issues).

B. FEATURE ENGINEERING

A feature is a measurable property of a task that is being observed [87]. The main purpose of engineering feature vectors is to provide machine learning algorithms with a set of learning vectors through which these algorithms learn how to discriminate between different classes [76]. Feature engineering is a key factor behind the success or failure of most machine learning models [79]. The success or failure of prediction may depend on several elements, the most significant of which is the set of features used to train the model [78]. Most of the effort in constructing cyberbullying prediction models using learning algorithms is devoted to this task [61], [62], [72]. In this context, the design of the input space (i.e., the features and their combinations that are provided as input to the classifier) is vital.

Proposing a set of discriminative features, which are used as inputs to the machine learning classifier, is the main step toward constructing an effective classifier in many applications [76]. Feature sets can be created based on human-engineered observations, which rely on how features correlate with the occurrences of classes [76]. For example, recent cyberbullying studies [88]–[94] established the correlation between different variables, such as age, gender, and user personality, and cyberbullying occurrence. These observations can be engineered into a practical form (features) to allow the classifier to discriminate between cyberbullying and non-cyberbullying and can thus be used to develop effective cyberbullying prediction models. Proposing features is an important step toward improving the discrimination power of prediction models [76], [79]. Similarly, proposing a set of significant features of cyberbullying engagement on SM websites is important in developing effective prediction models based on machine learning algorithms [68], [95].

State-of-the-art research has developed features to improve the performance of cyberbullying prediction. For example, a lexical syntactic feature has been proposed to deal with the prediction of offensive language; this method is better than traditional learning-based approaches in terms of precision [18]. Dadvar et al. examined gender information from profile information and developed a gender-based approach for cyberbullying prediction by using datasets from Myspace as a basis. The gender feature was selected to improve the discrimination capability of the classifier. Age and gender were included as features in other studies [17], [61], but these features are limited to the information provided by users in their online profiles.

Several studies focused on cyberbullying prediction based on profane words as a feature [35], [68], [70], [95], [96]. Similarly, a lexicon of profane words was constructed to indicate bullying, and these words were used as features for input to machine learning algorithms [97], [98]. Using profane words as features demonstrates a significant improvement in model performance. For example, the number of "bad" words and the density of "bad" words were proposed as features for input to machine learning in a previous work [70]. The study concluded that the percentage of "bad" words in a text is indicative of cyberbullying. Another study [85] expanded a list of pre-defined profane words and allocated different weights to create bullying features. These features were concatenated with bag-of-words and latent semantic features and used as feature input for a machine learning algorithm.

Reference [19] proposed features, such as pronouns and skip grams, as additions to traditional models, such as bag of words (n-gram, n = 1). The authors claimed that adding these features improved the overall classification accuracy. Another study [62] analyzed textual cyberbullying associated with comments on images on Instagram and developed a set of features from text comprising traditional bag-of-words features, comment counts for an image, and post counts within less than one hour of posting the image. Features mined from user and media information, including the number of followers and likes, shared media, and features from image content, such as image types, were added [62]. The combination of all features improved the overall classification performance [62].

The context-based approach is better than the list-based approach in developing the feature vector [37]. However, the diversity and complexity of cyberbullying do not always support this conclusion. Several studies [68], [72], [96], [99] discussed how sentiment analysis can improve the discrimination power of a classifier to distinguish between
cyberbullying and normal posts. These studies assumed that sentiment features are a good signal for cyberbullying occurrence. In another study that aimed to establish ways of reducing cyberbullying activities by predicting troll profiles, the researchers proposed a model to identify and associate troll profiles on Twitter; they assumed that predicting troll profiles is an important step toward predicting and stopping cyberbullying occurrence on SM websites [38]. This study proposed features based on tweeted text, posting time, language, and location to improve the identification of the authorship of posts and determine whether a profile is a troll or not. Reference [99] merged features from the structure of SM websites (e.g., degree, closeness, betweenness, and eigenvector centralities as well as the clustering coefficient) with features from users (e.g., age and gender) and content (e.g., length and sentiment of a post). Combining these features improves the final machine learning accuracy [99]. Table 1 shows a comparison of the different features used in the cyberbullying prediction literature.

The constructed features affect prediction performance. If they contain a large set of features that individually associate well with a class, then the learning process will be effective. This condition explains why most of the discussed studies aimed to produce many features. The input features should reflect the behavior related to the occurrence of textual cyberbullying. However, the set of features should be analyzed using feature selection algorithms, which are adopted to decide which features are most probably relevant or irrelevant to the classes.

C. FEATURE SELECTION ALGORITHMS

Feature selection algorithms were rarely adopted in state-of-the-art research on cyberbullying prediction on SM websites via machine learning (all extracted features are used to train the classifiers). Most of the examined studies (e.g., [18], [61], [68], [70]–[72], [85], [95], [96], [99]) did not use feature selection to decide which features are important in training machine learning algorithms. Two studies [19], [62] used chi-square and PCA to select significant features from the extracted features. These feature selection algorithms are briefly discussed in the following subsections.

1) INFORMATION GAIN

Information gain is the estimated decrease in entropy produced by separating examples based on specified features. Entropy is a well-known concept in information theory; it describes the (im)purity of an arbitrary collection of examples [100].
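The entropy and information-gain computations can be sketched in pure Python. This is a toy illustration: the binary "contains a profane word" feature and the labels below are hypothetical values, not data from any reviewed study.

```python
# Entropy of a label set and the information gain of a single feature,
# following the standard definitions used for feature ranking.
import math
from collections import Counter

def entropy(labels):
    """I(Tr) = -sum_n P_n * log2(P_n) over the classes present in labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = I(Tr) - sum_v (|Tr_v| / |Tr|) * I(Tr_v) over feature values v."""
    total = len(labels)
    expected = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        expected += (len(subset) / total) * entropy(subset)
    return entropy(labels) - expected

# Toy data: a hypothetical binary profanity indicator that splits the two
# classes perfectly, so its information gain equals the full class
# entropy of 1 bit.
labels = ["bully", "bully", "normal", "normal"]
has_profanity = [True, True, False, False]
print(round(information_gain(has_profanity, labels), 3))  # -> 1.0
```

A feature whose values carry no information about the class (e.g., the same value for every example) would score 0.0, which is the basis for ranking and discarding features.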
Information gain is used to calculate the strength or importance of features in a classification model according to the class attribute. Information gain [101] evaluates how well a specified feature divides the training dataset with respect to the class labels, as explained in the following equations. Given a training dataset Tr, the entropy of Tr is defined as

I(Tr) = -\sum_{n} P_n \log_2 P_n, \quad (1)

where P_n is the probability that an example in Tr belongs to class n. For attribute Att, the expected entropy is calculated as

I(Att) = \sum \frac{Tr_{Att}}{Tr} \times I(Tr_{Att}). \quad (2)

The information gain of attribute Att is then

IG(Att) = I(Tr) - I(Att). \quad (3)

2) PEARSON CORRELATION

Correlation-based feature selection is commonly used for reducing feature dimensionality and evaluating the discrimination power of a feature in classification models. It is also a straightforward model for selecting significant features. Pearson correlation measures the relevance of a feature by computing the Pearson correlation between it and a class. The Pearson correlation coefficient measures the linear correlation between two attributes [102]. The resulting value lies between -1 and +1, with -1 implying absolute negative correlation (as one attribute increases, the other decreases), +1 denoting absolute positive correlation (as one attribute increases, the other also increases), and 0 denoting the absence of any linear correlation between the two attributes. For two attributes or features X and Y, the Pearson correlation coefficient measures the correlation [103] as follows:

r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) S_x S_y}, \quad (4)

where \bar{x} and \bar{y} are the sample means of X and Y, respectively; S_x and S_y are the sample standard deviations of X and Y, respectively; and n is the size of the sample used to compute the correlation coefficient [103].

3) CHI-SQUARE TEST

Another common feature selection model is the chi-square test. This test is used in statistics, among other purposes, to test the independence of two occurrences. In feature selection, chi-square is used to test whether the occurrences of a feature and a class are independent. Thus, the following quantity is computed for each feature, and features are ranked by their score:

\chi^2(f, c_i) = \frac{N \left( P(f, c_i) P(\bar{f}, \bar{c}_i) - P(f, \bar{c}_i) P(\bar{f}, c_i) \right)^2}{P(f) P(\bar{f}) P(c_i) P(\bar{c}_i)}. \quad (5)

The chi-square test [104] assesses the independence between feature f and class c_i, in which N is the total number of documents.

D. MACHINE LEARNING ALGORITHMS

Many types of machine learning algorithms exist, but nearly all studies on cyberbullying prediction on SM websites used the most established and widely used type, that is, supervised machine learning algorithms [67], [99]. The accomplishment of machine learning algorithms is determined by the degree to which the model accurately converts various types of prior observation or knowledge about the task. Much of the practical application of machine learning considers the details
are easy to understand and interpret; hence, the decision tree algorithm can be used to analyze data and build a graphic model for classification. The most commonly improved version of decision tree algorithms used for cyberbullying prediction is C4.5 [38], [70], [95]. C4.5 can be explained as follows. Given a set of N examples, C4.5 first produces an initial tree through a divide-and-conquer algorithm as follows [120]: if all examples in N belong to the same class or N is small, the tree is a leaf labeled with the most frequent class in N. Otherwise, a test is selected based on, for example, the commonly used information gain criterion, on a single attribute with two or more outcomes. This test becomes the root of the tree and partitions N into subsets N1, N2, N3, ... according to the outcome for each example; the same procedure is then applied recursively to each subset [120].

5) K-NEAREST NEIGHBOR
K-nearest neighbor (KNN) is a nonparametric technique that identifies the K nearest neighbors of a query point X0 and uses a majority vote to determine the class label of X0. The KNN classifier often uses the Euclidean distance as the distance metric [121]. To demonstrate KNN classification, consider classifying new input posts (from a testing set) by using a number of known, manually labeled posts. The main task of KNN is to classify an unknown example based on a nominated number of its nearest neighbors, that is, to decide whether the class of the unknown example is positive or negative. KNN assigns the class of an unknown example by a majority vote among its nearest neighbors. For example, with one nearest neighbor (k = 1), KNN classifies the unknown example as positive (because the closest point is positive). With two nearest neighbors, KNN is unable to classify the unknown example because the second closest point is negative (positive and negative votes are equal). With four nearest neighbors, KNN classifies the unknown example as positive (because three of the closest points are positive and only one vote is negative). The KNN algorithm is one of the simplest classification algorithms, but despite its simplicity, it can provide competitive results [122]. KNN was used in the construction of cyberbullying prediction models in [38].

6) LOGISTIC REGRESSION CLASSIFICATION
Logistic regression is one of the common techniques imported by machine learning from the statistics field. Logistic regression is an algorithm that builds a separating hyperplane between two datasets by means of the logistic function [123]. The logistic regression algorithm takes inputs (features) and generates a forecast according to the probability of the input belonging to a class. For example, if the probability is >0.5, the classification of the instance will be a positive class; otherwise, the prediction is for the other class (negative class) [124]. Logistic regression was used in the construction of cyberbullying prediction models in [19] and [73].

E. EVALUATION
The primary objective of constructing prediction models based on machine learning is to generalize beyond the training dataset [79] so that, when a machine learning model is applied to new examples, it performs well. Accordingly, the data are divided into two parts. The first part is the training data used to train the machine learning algorithms. The second part is the testing data used to test them. However, setting aside a separate testing set is not always practical [79], especially in applications in which deriving training and testing data is difficult. For example, in cyberbullying prediction, most state-of-the-art studies manually labeled data; hence, creating labeled data is expensive. These issues can be reduced by cross validation, that is, randomly dividing the training data into, for example, 10 subsets; this process is called 10-fold cross validation. Cross validation involves the following steps: keep one fold separate (the model does not see it) and train the model on the remaining folds; test each learned classifier on the fold it did not see; and average the results to see how well the particular parameter setting performs [79], [125].

F. EVALUATION METRICS
Researchers measure the effectiveness of a proposed model to determine how successfully the model can distinguish cyberbullying from non-cyberbullying by using various evaluation measures. Reviewing common evaluation metrics in the research community is important to understand the performance of competing models. The most commonly used metrics in evaluating cyberbullying classifiers for SM websites are as follows:

1) ACCURACY
Accuracy was used to evaluate cyberbullying prediction models in [62], [70], [73], and [95], and it is calculated as follows:

Accuracy = (tp + tn) / (tp + fp + tn + fn). (6)

2) PRECISION, RECALL, AND F-MEASURE
These were used to evaluate cyberbullying prediction models in [18], [61], [72], and [73]. They are calculated as follows:

Precision = tp / (tp + fp), (7)
Recall = tp / (tp + fn), (8)
F-Measure = (2 × precision × recall) / (precision + recall), (9)

where tp means true positive, tn is true negative, fp denotes false positive, and fn is false negative.
3) AREA UNDER THE CURVE (AUC)
AUC offers a discriminatory rate of the classifier at various operating points [3], [19], [38]. The main benefit of using AUC as an evaluation metric is that AUC gives a more robust measurement than the accuracy metric in class-imbalance situations [19], [38].

III. ISSUES RELATED TO CONSTRUCTING CYBERBULLYING PREDICTION MODELS
In this section, the issues identified from the reviewed studies are discussed. The main issues related to cyberbullying definition, data collection, feature engineering, and evaluation metric selection are identified and discussed in the following subsections.

A. ISSUES RELATED TO CYBERBULLYING DEFINITION
Traditional bullying is generally defined as ''intentional behavior to harm another, repeatedly, where it is difficult for the victim to defend himself or herself'' [126]. By extending the definition of traditional bullying, cyberbullying has been defined [90] as ''an aggressive behavior that is achieved using electronic platforms by a group or an individual repeatedly and over time against a victim who cannot easily defend him or herself.'' Applying such a definition makes it difficult to manually label data (the instances from which machine learning algorithms learn) as cyberbullying or not. Two main issues make the above definition difficult to apply in online environments [47], [127]. The first issue is how to measure ''repeatedly and over time aggressive behavior'' on SM, and the second is how to measure power imbalance, that is, ''a victim who cannot easily defend himself or herself,'' on SM. These issues have been discussed by researchers to simplify the concept of cyberbullying in the online context. First, the concept of a repetitive act in cyberbullying on SM is not as straightforward as in traditional bullying [47]. For example, SM websites can provide cyberbullies a medium to propagate cyberbullying posts to a large population. Consequently, a single act by one committer may become repetitive over time [47]. Second, power imbalance is presented in different forms in online communication. Researchers [127] have suggested that content in online environments is difficult to eliminate or avoid, thus making a victim powerless.

These definitional aspects are under intense debate, but to simplify the definition of cyberbullying and make it applicable to a wide range of applications, the researchers in [53] and [72] defined cyberbullying as ''the use of electronic communication technologies to bully others.'' Proposing a simplified and clear definition of cyberbullying is a crucial step toward building machine learning models that can satisfy the definition criteria of cyberbullying engagement.

B. DATA COLLECTION
Many cyberbullying prediction studies extracted their datasets by using specific keywords or profile IDs. Nevertheless, by simply tracking posts that contain particular keywords, these studies may have introduced potential sampling bias [82], [128], limited the prediction to posts that contain the predefined keywords, and overlooked many other posts relevant to cyberbullying. Such data collection methods limit the prediction model of cyberbullying to specified keywords. The identification of keywords for extracting posts is also subject to the author's understanding of cyberbullying. An effective method should use a complete range of posts indicating cyberbullying to train the machine learning classifier and ensure the generalization capability of the cyberbullying prediction model [43]. An important objective of machine learning is to generalize and not be limited to the examples in a training dataset [79]. Researchers should investigate whether the sampled data are extracted from data that effectively represent all possible activities on SM websites [128]. Extracting well-representative data from SM is the first step toward building effective machine learning prediction models. However, SM websites' public application program interfaces (APIs) only allow the extraction of a small sample of all relevant data and thus pose a potential for sampling bias [80]–[82]. For example, a previous study [128] discussed whether data extracted from Twitter's streaming API are a sufficient representation of the activities in the Twitter network as a whole; the author compared keyword (words, phrases, or hashtags), user ID, and geo-coded sampling. Twitter's streaming API returns a dataset with some bias when keyword or user ID sampling is used. By contrast, using geo-tagged filtering provides good data representation [128]. With these points in mind, researchers should minimize bias as much as possible when they extract data, to guarantee that the examples selected for the training data generalize and provide an effective model when applied to testing data. Bias in data collection can impose bias on a training dataset selected based on specific keywords or users, and such bias consequently introduces overfitting issues that affect the capability of a machine learning model to make reliable predictions on untrained data.

C. FEATURE ENGINEERING
Features are vital components in improving the effectiveness of machine learning prediction models [79]. Most of the discussed studies attempted to provide effective machine learning solutions to cyberbullying on SM websites by providing significant features (Table 1). However, these studies overlooked other important features. For example, online cyberbullies may dynamically change the way they use words and acronyms. SM websites help create cyberbullying acronyms that have not been commonly used in committing traditional bullying or are beyond SM norms [129]. Recent survey response studies (questionnaire-based studies) have reported positive correlations between different variables, such as personality [93], [94] and the sociability of a user in an online environment [130], and cyberbullying occurrences. The observations of these studies are important in understanding such behavior in online environments. However, these
observations are yet to be used as features with machine learning algorithms to provide significant models. These observations can be useful when transformed into a practical form (features) that can be employed to develop effective machine learning prediction models for cyberbullying on SM websites. The abundant information provided by SM websites should be utilized to convert observations into a set of features. For example, two studies [17], [61] attempted to improve machine learning classifier performance by including features, such as age and gender, that show improvement in classifier performance; these features, however, are extracted from direct user details mentioned in the online profiles of users, and most studies found that only a few users provide complete details in their online profiles [131], [132]. These studies suggested the useful practice of utilizing words expressed in the content (posts) to identify user age and gender [131], [132]. Moreover, cyberbullying is related to the aggressive behavior of a user. A study demonstrated that aggression considerably predicts cyberbullying [92]. Similarly, cyberbullying behavior has a strong correlation with neuroticism [93], [94]. Therefore, predicting whether a user has used words related to neuroticism may provide a useful feature to predict cyberbullying engagement.

A significant correlation has also been found between the sociability of a user and cyberbullying engagement in online environments [130]. Users who are highly active in online environments are likely to engage in cyberbullying [133]. According to these observations, SM websites possess features that can be used as signals to measure the sociability of a user, such as the number of friends, number of posts, URLs in posts, hashtags in posts, and number of users engaged in conversations (mentioned). The combination of these features with traditionally used ones, such as profanity features, can provide comprehensive discriminative features. The reviewed studies (Table 1) focused on using either a traditional feature model (e.g., bag-of-words) or information (e.g., age or gender) limited to user profile information (information written by users in their profiles). Given that such information is limited, comprehensive features should be proposed to improve classifier performance.

Moreover, maintaining a precise and accurate process in constructing machine learning models from start (data collection) to end (evaluation metric selection) is important in ensuring that the proposed features hold significance in improving classifier performance. The following subsections analyze other issues related to constructing effective machine learning models for cyberbullying prediction on SM websites.

D. MACHINE LEARNING ALGORITHM SELECTION
A machine learning algorithm is selected to be trained on the proposed features. However, deciding which classifier performs best for a specific dataset is difficult. More than one machine learning algorithm should be tested to determine the best machine learning algorithm for a specific dataset. Three points may be used as a guide to narrow the selection of machine learning algorithms to be tested. First, the specific literature on machine learning for cyberbullying detection is important in selecting a specified classifier. The pre-eminence of a classifier may be circumscribed to a given domain [134]. Therefore, general previous research and findings on machine learning can be used as a guide to select a machine learning algorithm. Second, a literature review of text mining [135], [136] can be used as a guide. Third, a performance comparison on comprehensive datasets [137] can be used as a basis to select machine learning algorithms. However, although these three points can be used as a guide to narrow the selection of machine learning algorithms, researchers need to test many machine learning algorithms to identify the optimal classifier for an accurate predictive model.

E. IMBALANCED CLASS DISTRIBUTION
In many cases of real data, datasets naturally have imbalanced classes, in which the normal class has a large number of instances and the abnormal class has a small number of instances. Abnormal class instances are rare and difficult to collect from real-world applications. Examples of imbalanced data applications are fraud detection, intrusion detection, and medical diagnosis. Similarly, the number of cyberbullying posts is expected to be much smaller than the number of non-cyberbullying posts, and this assumption generates an imbalanced class distribution in the dataset, in which the non-cyberbullying instances comprise many more posts than the cyberbullying ones. Such cases can prevent the model from correctly classifying the examples. Many methods have been proposed to solve this issue, including SMOTE [138] and weight adjustment (the cost-sensitive technique) [139].

The SMOTE technique [138] is applied to avoid the overfitting that occurs when exact replicas of minority-class instances are added to the main dataset. A subset of data is taken from the minority class as an example, and new synthetic similar instances are generated. These synthetic instances are then added to the original dataset, and the resulting dataset is used to train the machine learning methods. The cost-sensitive technique is utilized to control the class imbalance [139]. It is based on creating a cost matrix, which defines the costs incurred by false positives and false negatives.

F. EVALUATION METRIC SELECTION
Accuracy, precision, recall, and AUC are commonly used as evaluation metrics [19], [38]. Evaluation metric selection is important and should be based on the nature of the manually labeled data. Selecting an inappropriate evaluation metric may result in better performance according to the selected evaluation metric. Then, the researcher may find the results to be significantly improved, although an investigation of how the machine learning model is evaluated may produce contradicting results and may not truly reflect the improvement of performance. For example, cyberbullying posts are commonly considered abnormal cases, whereas
non-cyberbullying posts are considered normal cases. The ratio between cyberbullying and non-cyberbullying is normally large. Generally, non-cyberbullying posts comprise a large portion. For example, suppose 1000 posts are manually labeled as cyberbullying and non-cyberbullying. The non-cyberbullying posts are 900, and the remaining 100 posts are cyberbullying. If a machine learning classifier classifies all 1000 posts as non-cyberbullying and is unable to classify any posts (0) as cyberbullying, then this classifier is considered impractical. By contrast, if researchers use accuracy as the main evaluation metric, then the accuracy of this classifier, calculated as mentioned in the accuracy equation, will yield a high accuracy percentage.

In the example, the classifier fails to classify any cyberbullying posts but obtains a high accuracy percentage. Knowing the nature of manually labeled data is important in selecting an evaluation metric. In cases where data are imbalanced, researchers may need to select AUC as the main evaluation metric. In class-imbalance situations, AUC is more robust than other performance metrics [140]. Cyberbullying and non-cyberbullying data are commonly imbalanced datasets (non-cyberbullying posts outnumber the cyberbullying ones) that closely represent the real-life data on which machine learning algorithms need to train. Accordingly, the learning performance of these algorithms is independent of data skewness [73]. Special care should be taken in selecting the main evaluation metric to avoid uncertain results and appropriately evaluate the performance of machine learning algorithms.

IV. ISSUES AND CHALLENGES
This section presents open issues and challenges to guide future researchers in leveraging machine learning algorithms and models for detecting cyberbullying through social media.

The collected data should clearly represent features that occur in current and future data to retain the context of the model. Given that big data are not generic and are dynamic in nature, the context of these data is difficult to understand in terms of scale and even more difficult to maintain when data are reduced to fit into a machine learning model. Handling the context of big data is challenging and has been presented as an important future direction [141].

Furthermore, human behavior is dynamic. Knowing when online users change the way they commit cyberbullying is an important component in updating the prediction model with such changes. Therefore, dynamically updating the prediction model is necessary to meet human behavioral changes [1].

B. CULTURE EFFECT
What was considered cyberbullying yesterday might not be considered cyberbullying today, owing to the introduction of OSNs. OSNs have a globalized culture. However, machine learning always learns from the examples provided. Consequently, designing different examples that represent different cultures remains to be defined, and robust work from different disciplines is required. For this purpose, cross-disciplinary coordination is highly desirable.

C. LANGUAGE DYNAMICS
Language changes quickly, particularly among the young generation. New slang is regularly integrated into the language culture. Therefore, researchers are invited to propose dynamic algorithms that detect new slang and abbreviations related to cyberbullying behavior on SM websites and to keep updating the training processes of machine learning algorithms with newly introduced words.
of supervised learning [142]. This gap in the literature may be caused by the fact that nearly all current studies rely on manually labeled data as the input to supervised algorithms for classifying classes. Thus, finding patterns between two classes by using unsupervised grouping remains difficult. Intensive research is required to develop unsupervised algorithms that can detect effective patterns from data. Traditional machine learning algorithms lack the capability to handle cyberbullying big data.

Deep learning has recently attracted the attention of many researchers in different fields. Natural language understanding is a new area in which deep learning is poised to make a large effect over the next few years [142].

The traditional machine learning algorithms pointed out in this survey lack the capability to process big data in a standalone format. Big data have rendered traditional machine learning algorithms impotent. Cyberbullying big data generated from SM require advanced technology for processing the generated data to gain insights and help in making intelligent decisions.

Big data are generated at a very high velocity and with great variety, volume, value, veracity, and complexity. Researchers need to leverage various deep learning techniques for processing social media big data for cyberbullying behaviors. The deep learning techniques and architectures with a potential to explore the cyberbullying big data generated from SM include the generative adversarial network, deep belief network, convolutional neural network, stacked autoencoder, deep echo state network, and deep recurrent neural network. These deep learning architectures remain unexplored in cyberbullying detection in SM.

V. CONCLUSIONS AND FUTURE DIRECTIONS
This study reviewed existing literature on detecting aggressive behavior on SM websites by using machine learning approaches. We specifically reviewed four aspects of detecting cyberbullying messages by using machine learning approaches, namely, data collection, feature engineering, construction of cyberbullying detection models, and evaluation of the constructed cyberbullying detection models. Several types of discriminative features that were used to detect cyberbullying in online social networking sites were also summarized. In addition, the most effective supervised machine learning classifiers for classifying cyberbullying messages in online social networking sites were identified. One of the main contributions of the current paper is the definition of evaluation metrics that identify the significant parameters so that various machine learning algorithms can be evaluated against each other. Most importantly, we summarized and identified the important factors for detecting cyberbullying through machine learning techniques, especially supervised learning. For this purpose, we have used accuracy, precision, recall, and F-measure, along with the area under the curve, for evaluating models of cyberbullying behavior. Finally, the main issues and open research challenges were described and discussed. Considerable research effort is required to construct highly effective and accurate cyberbullying detection models. We believe that the current study will provide crucial details on and new directions in the field of detecting aggressive human behavior, including cyberbullying detection in online social networking sites.

REFERENCES
[1] V. Subrahmanian and S. Kumar, ''Predicting human behavior: The next frontiers,'' Science, vol. 355, no. 6324, p. 489, 2017.
[2] H. Lauw, J. C. Shafer, R. Agrawal, and A. Ntoulas, ''Homophily in the digital world: A LiveJournal case study,'' IEEE Internet Comput., vol. 14, no. 2, pp. 15–23, Mar./Apr. 2010.
[3] M. A. Al-Garadi, K. D. Varathan, and S. D. Ravana, ''Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network,'' Comput. Hum. Behav., vol. 63, pp. 433–443, Oct. 2016.
[4] L. Phillips, C. Dowling, K. Shaffer, N. Hodas, and S. Volkova, ''Using social media to predict the future: A systematic literature review,'' 2017, arXiv:1706.06134. [Online]. Available: https://arxiv.org/abs/1706.06134
[5] H. Quan, J. Wu, and Y. Shi, ''Online social networks & social network services: A technical survey,'' in Pervasive Communication Handbook. Boca Raton, FL, USA: CRC Press, 2011, p. 4.
[6] J. K. Peterson and J. Densley, ''Is social media a gang? Toward a selection, facilitation, or enhancement explanation of cyber violence,'' Aggression Violent Behav., 2016.
[7] BBC. (2012). Huge Rise in Social Media. [Online]. Available: http://www.bbc.com/news/uk-20851797
[8] P. A. Watters and N. Phair, ''Detecting illicit drugs on social media using automated social media intelligence analysis (ASMIA),'' in Cyberspace Safety and Security. Berlin, Germany: Springer, 2012, pp. 66–76.
[9] M. Fire, R. Goldschmidt, and Y. Elovici, ''Online social networks: Threats and solutions,'' IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 2019–2036, 4th Quart., 2014.
[10] N. M. Shekokar and K. B. Kansara, ''Security against sybil attack in social network,'' in Proc. Int. Conf. Inf. Commun. Embedded Syst. (ICICES), 2016, pp. 1–5.
[11] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, A. Flammini, and F. Menczer, ''Detecting and tracking political abuse in social media,'' in Proc. 5th Int. AAAI Conf. Weblogs Social Media, 2011, pp. 297–304.
[12] A. Aggarwal, A. Rajadesingan, and P. Kumaraguru, ''PhishAri: Automatic realtime phishing detection on Twitter,'' in Proc. eCrime Res. Summit (eCrime), Oct. 2012, pp. 1–12.
[13] S. Yardi et al., ''Detecting spam in a Twitter network,'' First Monday, Jan. 2009. [Online]. Available: https://firstmonday.org/article/view/2793/2431
[14] C. Yang, R. Harkreader, J. Zhang, S. Shin, and G. Gu, ''Analyzing spammers' social networks for fun and profit: A case study of cyber criminal ecosystem on twitter,'' in Proc. 21st Int. Conf. World Wide Web, 2012, pp. 71–80.
[15] G. R. S. Weir, F. Toolan, and D. Smeed, ''The threats of social networking: Old wine in new bottles?'' Inf. Secur. Tech. Rep., vol. 16, no. 2, pp. 38–43, 2011.
[16] M. J. Magro, ''A review of social media use in e-government,'' Administ. Sci., vol. 2, no. 2, pp. 148–161, 2012.
[17] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong, ''Improving cyberbullying detection with user context,'' in Advances in Information Retrieval. Berlin, Germany: Springer, 2013, pp. 693–696.
[18] Y. Chen, Y. Zhou, S. Zhu, and H. Xu, ''Detecting offensive language in social media to protect adolescent online safety,'' in Proc. Int. Conf. Privacy, Secur., Risk Trust (PASSAT), Sep. 2012, pp. 71–80.
[19] V. S. Chavan and S. S. Shylaja, ''Machine learning approach for detection of cyber-aggressive comments by peers on social media network,'' in Proc. Int. Conf. Adv. Comput., Commun. Inform. (ICACCI), Aug. 2015, pp. 2354–2358.
[20] W. Dong, S. S. Liao, Y. Xu, and X. Feng, ''Leading effect of social media for financial fraud disclosure: A text mining based analytics,'' in Proc. AMCIS, San Diego, CA, USA, 2016.
[21] M. S. Rahman, T.-K. Huang, H. V. Madhyastha, and M. Faloutsos, ''FRAppE: Detecting malicious Facebook applications,'' in Proc. 8th Int. Conf. Emerg. Netw. Exp. Technol., 2012, pp. 313–324.
[22] S. Abu-Nimeh, T. Chen, and O. Alzubi, ''Malicious and spam posts in online social networks,'' Computer, vol. 44, no. 9, pp. 23–28, Sep. 2011.
[23] B. Doerr, M. Fouz, and T. Friedrich, ''Why rumors spread so quickly in social networks,'' Commun. ACM, vol. 55, no. 6, pp. 70–75, Jun. 2012.
[24] J. W. Patchin and S. Hinduja, Words Wound: Delete Cyberbullying and Make Kindness Go Viral. Golden Valley, MN, USA: Free Spirit Publishing, 2013.
[25] J. Cheng, C. Danescu-Niculescu-Mizil, and J. Leskovec, ''Antisocial behavior in online discussion communities,'' in Proc. 9th Int. AAAI Conf. Web Social Media, Apr. 2015.
[26] S. Liu, J. Zhang, and Y. Xiang, ''Statistical detection of online drifting Twitter spam: Invited paper,'' in Proc. 11th ACM Asia Conf. Comput. Commun. Secur., 2016, pp. 1–10.
[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, ''Twitter spammer detection using data stream clustering,'' Inf. Sci., vol. 260, pp. 64–73, Mar. 2014.
[28] M. Jiang, S. Kumar, V. S. Subrahmanian, and C. Faloutsos, ''KDD 2017 tutorial: Data-driven approaches towards malicious behavior modeling,'' Dimensions, vol. 19, p. 42, 2017.
[29] S. Y. Jeong, Y. S. Koh, and G. Dobbie, ''Phishing detection on Twitter streams,'' in Proc. Pacific–Asia Conf. Knowl. Discovery Data Mining. Cham, Switzerland: Springer, 2016, pp. 141–153.
[30] I. Frommholz, H. M. Al-Khateeb, M. Potthast, Z. Ghasem, M. Shukla, and E. Short, ''On textual analysis and machine learning for cyberstalking detection,'' Datenbank-Spektrum, vol. 16, no. 2, pp. 127–135, 2016.
[31] M. McCord and M. Chuah, ''Spam detection on Twitter using traditional classifiers,'' in Autonomic and Trusted Computing. Berlin, Germany: Springer, 2011, pp. 175–186.
[32] X. Chen, R. Chandramouli, and K. P. Subbalakshmi, ''Scam detection in Twitter,'' in Data Mining for Service. Berlin, Germany: Springer, 2014, pp. 133–150.
[33] A. H. Wang, ''Detecting spam bots in online social networking sites: A machine learning approach,'' in Data and Applications Security and Privacy XXIV. Berlin, Germany: Springer, 2010, pp. 335–342.
[34] X. Zheng, Z. Zeng, Z. Chen, Y. Yu, and C. Rong, ''Detecting spammers on social networks,'' Neurocomputing, vol. 159, pp. 27–34, Jul. 2015.
[35] K. Dinakar, B. Jones, C. Havasi, H. Lieberman, and R. Picard, ''Common sense reasoning for detection, prevention, and mitigation of cyberbullying,'' ACM Trans. Interact. Intell. Syst., vol. 2, no. 3, p. 18, 2012.
[47] R. Slonje, P. K. Smith, and A. Frisén, ''The nature of cyberbullying, and strategies for prevention,'' Comput. Hum. Behav., vol. 29, no. 1, pp. 26–32, 2013.
[48] E. Whittaker and R. M. Kowalski, ''Cyberbullying via social media,'' J. School Violence, vol. 14, no. 1, pp. 11–29, 2015.
[49] F. Sticca and S. Perren, ''Is cyberbullying worse than traditional bullying? Examining the differential roles of medium, publicity, and anonymity for the perceived severity of bullying,'' J. Youth Adolescence, vol. 42, no. 5, pp. 739–750, 2013.
[50] S. Wen, J. Jiang, Y. Xiang, S. Yu, W. Zhou, and W. Jia, ''To shut them up or to clarify: Restraining the spread of rumors in online social networks,'' IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 12, pp. 3306–3316, Dec. 2014.
[51] K. Van Royen, K. Poels, W. Daelemans, and H. Vandebosch, ''Automatic monitoring of cyberbullying on social networking sites: From technological feasibility to desirability,'' Telematics Inform., vol. 32, no. 1, pp. 89–97, 2015.
[52] K. Van Royen, K. Poels, and H. Vandebosch, ''Harmonizing freedom and protection: Adolescents' voices on automatic monitoring of social networking sites,'' Children Youth Services Rev., vol. 64, pp. 35–41, May 2016.
[53] R. M. Kowalski, ''Bullying in the digital age: A critical review and meta-analysis of cyberbullying research among youth,'' Psychol. Bull., vol. 140, no. 4, pp. 1073–1137, 2014.
[54] Q. Li, ''New bottle but old wine: A research of cyberbullying in schools,'' Comput. Hum. Behav., vol. 23, no. 4, pp. 1777–1791, 2007.
[55] R. S. Tokunaga, ''Following you home from school: A critical review and synthesis of research on cyberbullying victimization,'' Comput. Hum. Behav., vol. 26, no. 3, pp. 277–287, May 2010.
[56] X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, ''Data mining with big data,'' IEEE Trans. Knowl. Data Eng., vol. 26, no. 1, pp. 97–107, Jan. 2014.
[57] Y. Liu, J. Yang, Y. Huang, L. Xu, S. Li, and M. Qi, ''MapReduce based parallel neural networks in enabling large scale machine learning,'' Comput. Intell. Neurosci., vol. 2015, p. 1, Jan. 2015.
[58] C. Wu, R. Buyya, and K. Ramamohanarao, ''Big data analytics = machine learning + cloud computing,'' 2016, arXiv:1601.03115. [Online]. Available: https://arxiv.org/abs/1601.03115
[59] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, ''A survey on deep learning for big data,'' Inf. Fusion, vol. 42, pp. 146–157, Jul. 2018.
[36] M. Dadvar and F. De Jong, ‘‘Cyberbullying detection: A step toward a [60] A. Gandomi and M. Haider, ‘‘Beyond the hype: Big data concepts,
safer Internet yard,’’ in Proc. 21st Int. Conf. Companion World Wide Web, methods, and analytics,’’ Int. J. Inf. Manage., vol. 35, no. 2, pp. 137–144,
2012, pp. 121–126. 2015.
[37] S. O. Sood, J. Antin, and E. Churchill, ‘‘Using crowdsourcing to improve [61] M. Dadvar, F. D. Jong, R. Ordelman, and D. Trieschnigg, ‘‘Improved
profanity detection,’’ in Proc. AAAI Spring Symp., Wisdom Crowd, 2012, cyberbullying detection using gender information,’’ in Proc. 25th Dutch-
pp. 69–74. Belgian Inf. Retr. Workshop, 2012, pp. 1–3.
[38] P. Galán-García, J. G. de la Puerta, C. L. Gómez, I. Santos, and [62] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv,
P. G. Bringas, ‘‘Supervised machine learning for the detection of troll and S. Mishra, ‘‘Detection of cyberbullying incidents on the insta-
profiles in Twitter social network: Application to a real case of cyberbully- gram social network,’’ 2015, arXiv:1503.03909. [Online]. Available:
ing,’’ in Proc. Int. Joint Conf. SOCO-CISIS-ICEUTE. Cham, Switzerland: https://arxiv.org/abs/1503.03909
Springer, 2014, pp. 419–428. [63] G. Forman, ‘‘An extensive empirical study of feature selection metrics for
[39] Q. Huang, V. K. Singh, and P. K. Atrey, ‘‘Cyber bullying detection using text classification,’’ J. Mach. Learn. Res., pp. 1289–1305, Mar. 2003.
social and textual analysis,’’in Proc. 3rd Int. Workshop Socially-Aware [64] P. D. Turney, ‘‘Thumbs up or thumbs down?: Semantic orientation applied
Multimedia, 2014, pp. 3–6. to unsupervised classification of reviews,’’ in Proc. ACL, 2002, pp. 417–
[40] R. M. Kowalski, Cyberbullying: Bullying in the Digital Age. Hoboken, 424.
NJ, USA: Wiley, 2012. [65] R. M. Tong, ‘‘An operational system for detecting and tracking opinions
[41] T. Nakano, T. Suda, Y. Okaie, and M. J. Moore, ‘‘Analysis of cyber in on-line discussion,’’ in Proc. Notes ACM SIGIR Workshop Oper. Text
aggression and cyber-bullying in social networking,’’ in Proc. IEEE 10th Classification, 2001.
Int. Conf. Semantic Comput. (ICSC), Feb. 2016, pp. 337–341. [66] V. Hatzivassiloglou and K. R. McKeown, ‘‘Predicting the semantic ori-
[42] G. S. O’Keeffe and K. Clarke-Pearson, ‘‘The impact of social media entation of adjectives,’’ in Proc. 8th Conf. Eur. Chapter Assoc. Comput.
on children, adolescents, and families,’’ Pediatrics, vol. 127, no. 4, Linguistics, 1997, pp. 174–181.
pp. 800–804, 2011. [67] S. Nadali, M. A. A. Murad, N. M. Sharef, A. Mustapha, and S. Shojaee,
[43] J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore, ‘‘Learning from bullying ‘‘A review of cyberbullying detection: An overview,’’ in Proc. 13th Int.
traces in social media,’’ in Proc. Conf. North Amer. Chapter Assoc. Conf. Intell. Syst. Design Appl. (ISDA), Dec. 2013, pp. 325–330.
Comput. Linguistics, Hum. Lang. Technol., 2012, pp. 656–666. [68] A. Kontostathis, K. Reynolds, A. Garron, and L. Edwards, ‘‘Detecting
[44] R. M. Kowalski and S. P. Limber, ‘‘Psychological, physical, and aca- cyberbullying: Query terms and techniques,’’ in Proc. 5th Annu. ACM
demic correlates of cyberbullying and traditional bullying,’’ J. Adolescent Web Sci. Conf., 2013, pp. 195–204.
Health, vol. 53, no. 1, pp. S13–S20, 2013. [69] H. Chen, S. Mckeever, and S. J. Delany, ‘‘Harnessing the power of text
[45] H. Sampasa-Kanyinga, P. Roumeliotis, and H. Xu, ‘‘Associations between mining for the detection of abusive content in social media,’’ in Advances
cyberbullying and school bullying victimization and suicidal ideation, in Computational Intelligence Systems. Cham, Switzerland: Springer,
plans and attempts among Canadian schoolchildren,’’ PLoS ONE, vol. 9, 2017, pp. 187–205.
no. 7, 2014, Art. no. e102145. [70] K. Reynolds, A. Kontostathis, and L. Edwards, ‘‘Using machine learning
[46] S. Hinduja and J. W. Patchin, ‘‘Bullying, cyberbullying, and suicide,’’ to detect cyberbullying,’’ in Proc. 10th Int. Conf. Mach. Learn. Appl.
Arch. Suicide Res., vol. 14, no. 3, pp. 206–221, 2010. Workshops (ICMLA), Dec. 2011, pp. 241–244.
[71] V. Nahar, X. Li, and C. Pang, ‘‘An effective approach for cyberbullying [98] E. Raisi and B. Huang, ‘‘Cyberbullying identification using participant-
detection,’’ Commun. Inf. Sci. Manage. Eng., vol. 3, no. 5, p. 238, 2013. vocabulary consistency,’’ 2016, arXiv:1606.08084. [Online]. Available:
[72] C. Van Hee, E. Lefever, B. Verhoeven, J. Mennes, B. Desmet, G. De Pauw, https://arxiv.org/abs/1606.08084
and V. Hoste, ‘‘Detection and fine-grained classification of cyberbullying [99] A. Squicciarini, S. Rajtmajer, Y. Liu, and C. Griffin, ‘‘Identification and
events,’’ in Proc. Int. Conf. Recent Adv. Natural Lang. Process. (RANLP), characterization of cyberbullying dynamics in an online social network,’’
2015, pp. 672–680. in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, 2015,
[73] A. Mangaonkar, A. Hayrapetian, and R. Raje, ‘‘Collaborative detection pp. 280–285.
of cyberbullying behavior in Twitter data,’’ in Proc. IEEE Int. Conf. [100] R. M. Gray, ‘‘Entropy and information,’’ in Entropy and Information
Electro/Inf. Technol. (EIT), Dekalb, IL, USA, May 2015, pp. 611–616. Theory. New York, NY, USA: Springer, 1990, pp. 21–55.
[74] H. Sanchez and S. Kumar, ‘‘Twitter bullying detection,’’ [101] I. Qabajeh and F. Thabtah, ‘‘An experimental study for assessing email
Tech. Rep. UCSC ISM245, 2011. classification attributes using feature selection methods,’’ in Proc. 3rd Int.
[75] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, ‘‘An introduc- Conf. Adv. Comput. Sci. Appl. Technol. (ACSAT), Dec. 2014, pp. 125–132.
[102] J. Benesty, J. Chen, Y. Huang, and I. Cohen, ‘‘Pearson correlation coef-
tion to MCMC for machine learning,’’ Mach. Learn., vol. 50, nos. 1–2,
ficient,’’ in Noise Reduction in Speech Processing. Berlin, Germany:
pp. 5–43, 2003.
Springer, 2009, pp. 1–4.
[76] M. W. Libbrecht and W. S. Noble, ‘‘Machine learning applications
[103] M. Hall, ‘‘Correlation-based feature selection for discrete and numeric
in genetics and genomics,’’ Nature Rev. Genet., vol. 16, no. 6,
class machine learning,’’ in Proc. 17th Int. Conf. Mach. Learn., 2000,
pp. 321–332, 2015.
pp. 359–366.
[77] P. Langley and H. A. Simon, ‘‘Applications of machine learning and rule [104] Z. Zheng, X. Wu, and R. Srihari, ‘‘Feature selection for text categorization
induction,’’ Commun. ACM, vol. 38, no. 11, pp. 54–64, 1995. on imbalanced data,’’ ACM SIGKDD Explor. Newslett., vol. 6, no. 1,
[78] Z. Ghahramani, ‘‘Probabilistic machine learning and artificial intelli- pp. 80–89, 2004.
gence,’’ Nature, vol. 521, no. 7553, pp. 452–459, May 2015. [105] D. H. Wolper and W. G. Macready, ‘‘No free lunch theorems for optimiza-
[79] P. Domingos, ‘‘A few useful things to know about machine learning,’’ tion,’’ IEEE Trans. Evol. Comput., vol. 1, no. 1, pp. 67–82, Apr. 1997.
Commun. ACM, vol. 55, no. 10, pp. 78–87, 2012. [106] A. L. Buczak and E. Guven, ‘‘A survey of data mining and machine
[80] Y. Liu, C. Kliman-Silver, and A. Mislove, ‘‘The tweets they are a- learning methods for cyber security intrusion detection,’’ IEEE Commun.
changin’: Evolution of twitter users and behavior,’’ in Proc. Int. AAAI Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.
Conf. Weblogs Social Media (ICWSM), 2014, pp. 305–314. [107] T. Joachims, ‘‘Text categorization with support vector machines: Learn-
[81] S. González-Bailón, N. Wang, A. Rivero, J. Borge-Holthoefer, and ing with many relevant features,’’ in Proc. Eur. Conf. Mach. Learn. Berlin,
Y. Moreno, ‘‘Assessing the bias in samples of large online networks,’’ Germany: Springer, 1998.
Social Netw., vol. 38, pp. 16–27, Jul. 2014. [108] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, ‘‘A practical guide to support
[82] T. Cheng and T. Wicks, ‘‘Event detection using Twitter: A spatio-temporal vector classification,’’ Dept. Comput. Sci., Nat. Taiwan Univ., Tech. Rep.,
approach,’’ PLoS ONE, vol. 9, no. 6, p. e97807, 2014. 2003. [Online]. Available: http://www.csie. ntu.edu.tw/*cjlin/papers/
[83] A. Bellmore, A. J. Calvin, J.-M. Xu, and X. Zhu, ‘‘The five W’s of guide/guide.pdf
‘bullying’ on Twitter: Who, what, why, where, and when,’’ Comput. Hum. [109] S. Tong and D. Koller, ‘‘Support vector machine active learning
Behav., vol. 44, pp. 305–314, Mar. 2015. with applications to text classification,’’ J. Mach. Learn. Res., vol. 2,
[84] H. Margono, X. Yi, and G. K. Raikundalia, ‘‘Mining Indonesian cyber pp. 45–66, Nov. 2001.
bullying patterns in social networks,’’ in Proc. 37th Australas. Comput. [110] V. Vapnik, The Nature of Statistical Learning Theory. New York, NY,
Sci. Conf., vol. 147, Australian Computer Society, 2014, pp. 115–124. USA: Springer, 2013.
[111] B. E. Boser, I. M. Guyon, and V. N. Vapnik, ‘‘A training algorithm for
[85] R. Zhao, A. Zhou, and K. Mao, ‘‘Automatic detection of cyberbullying
optimal margin classifiers,’’ in Proc. 5th Annu. Workshop Comput. Learn.
on social networks based on bullying features,’’ in Proc. 17th Int. Conf.
Theory, 1992, pp. 144–152.
Distrib. Comput. Netw., 2016, Art. no. 43. [112] A. McCallum and K. Nigam, ‘‘A comparison of event models for naive
[86] Á. García-Recuero, ‘‘Discouraging abusive behavior in privacy- Bayes text classification,’’ in Proc. AAAI Workshop Learn. Text Catego-
preserving online social networking applications,’’ in Proc. 25th Int. rization, 1998, pp. 1–8.
Conf. Companion World Wide Web, International World Wide Web [113] H. Zhang, ‘‘The optimality of naive Bayes,’’ Tech. Rep., 2004.
Conferences Steering Committee, 2016, pp. 305–309. [114] H. Zhang, ‘‘The optimality of naive Bayes,’’ in Proc. IAAA, vol. 1, no. 2,
[87] Y. Anzai, Pattern Recognition and Machine Learning. Amsterdam, 2004, p. 3.
The Netherlands: Elsevier, 2012. [115] N. Bora, V. Zaytsev, Y.-H. Chang, and R. Maheswaran ‘‘Gang networks,
[88] E. Calvete, I. Orue, A. Estévez, L. Villardón, and P. Padilla, ‘‘Cyberbul- neighborhoods and holidays: Spatiotemporal patterns in social media,’’
lying in adolescents: Modalities and aggressors’ profile,’’ Comput. Hum. in Proc. Int. Conf. Social Comput. (SocialCom), Sep. 2013, pp. 93–101.
Behav., vol. 26, no. 5, pp. 1128–1135, 2010. [116] A. H. Wang, ‘‘Don’t follow me: Spam detection in Twitter,’’ in Proc. Int.
[89] H. Vandebosch and K. Van Cleemput, ‘‘Cyberbullying among young- Conf. Secur. Cryptogr. (SECRYPT), Jul. 2010, pp. 1–10.
sters: Profiles of bullies and victims,’’ New Media Soc., vol. 11, no. 8, [117] D. M. Freeman, ‘‘Using naive bayes to detect spammy names in social
pp. 1349–1371, 2009. networks,’’ in Proc. ACM Workshop Artif. Intell. Secur., 2013, pp. 3–12.
[90] R. Slonje and P. K. Smith, ‘‘Cyberbullying: Another main type of bully- [118] L. Breiman, ‘‘Random forests,’’ Mach. Learn., vol. 45, no. 1, pp. 5–32,
ing?’’ Scand. J. Psychol., vol. 49, no. 2, pp. 147–154, 2008. 2001.
[91] K. R. Williams and N. G. Guerra, ‘‘Prevalence and predictors of Internet [119] D. R. Cutler, D. R. Cutler, T. C. Edwards, Jr., K. H. Beard, A. Cutler,
bullying,’’ J. Adolescent Health, vol. 41, no. 6, pp. S14–S21, 2007. K. T. Hess, J. Gibson, and J. J. Lawler, ‘‘Random forests for classification
[92] O. T. Arıcak, ‘‘Psychiatric symptomatology as a predictor of cyberbully- in ecology,’’ Ecology, vol. 88, no. 11, pp. 2783–2792, 2007.
[120] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda,
ing among University Students,’’ Eurasian J. Educ. Res., vol. 34, no. 1,
G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach,
p. 169, 2009.
D. J. Hand, and D. Steinberg, ‘‘Top 10 algorithms in data mining,’’ Knowl.
[93] I. Connolly and M. O’Moore, ‘‘Personality and family relations of
Inf. Syst., vol. 14, no. 1, pp. 1–37, 2008.
children who bully,’’ Personality Individual Differences, vol. 35, no. 3, [121] P. Soucy and G. W. Mineau, ‘‘A simple KNN algorithm for text catego-
pp. 559–567, 2003. rization,’’ in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov./Dec. 2001,
[94] L. Corcoran, I. Connolly, and M. O’Moore, ‘‘Cyberbullying in Irish pp. 647–648.
schools: An investigation of personality and self-concept,’’ Irish J. Psy- [122] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, ‘‘Efficient
chol., vol. 33, no. 4, pp. 153–165, 2012. kNN classification algorithm for big data,’’ Neurocomputing, vol. 195,
[95] K. Dinakar, R. Reichart, and H. Lieberman, ‘‘Modeling the detection pp. 143–148, Jun. 2016.
of textual cyberbullying,’’ in Proc. 5th Int. AAAI Conf. Weblogs Social [123] S. Dreiseitl, L. Ohno-Machado, H. Kittler, S. Vinterbo, H. Billhardt,
Media, 2011, pp. 11–17. and M. Binder, ‘‘A comparison of machine learning methods for the
[96] D. Yin, Z. Xue, L. Hong, B. D. Davison, A. Kontostathis, and L. Edwards, diagnosis of pigmented skin lesions,’’ J. Biomed. Informat., vol. 34, no. 1,
‘‘Detection of harassment on Web 2.0,’’ in Proc. Content Anal. Web, 2009, pp. 28–36, 2001.
pp. 1–7. [124] D. W. Hosmer, Jr., S. Lemeshow, and R. X. Sturdivant, Applied Logistic
[97] M. Ptaszynski, P. Dybala, T. Matsuba, F. Masui, R. Rzepka, and K. Araki, Regression, vol. 398. Hoboken, NJ, USA: Wiley, 2013.
‘‘Machine learning and affect analysis against cyber-bullying,’’ in Proc. [125] R. Kohavi, ‘‘A study of cross-validation and bootstrap for accuracy esti-
36th AISB, 2010, pp. 7–16. mation and model selection,’’ in Proc. IJCAI, 1995, pp. 1137–1145.
NAWSHER KHAN received the Ph.D. degree from University Malaysia Pahang (UMP), Malaysia, in 2013. He was a Postdoctoral Research Fellow with the University of Malaya (UM), Malaysia, in 2014. In 2005, he was appointed to the National Database and Registration Authority (NADRA) under the Interior Ministry of Pakistan, and in 2008, he worked at the National Highways Authority (NHA). He served as an Assistant Professor at Abdul Wali Khan University Mardan, Pakistan, for three years, from 2014 to 2017. He is currently an Associate Professor and the Director of the Research Center, College of Computer Science, King Khalid University, Abha, Saudi Arabia. He has published more than 50 articles in international journals and conference proceedings. His research interests include big data, cloud computing, data management, distributed systems, scheduling, replication, and sensor networks.

GHULAM MURTAZA is currently pursuing the Ph.D. degree with the Faculty of Computer Science and Information Technology, University of Malaya, Malaysia. He is also an Assistant Professor with Sukkur IBA University, Sukkur, Pakistan, and is currently on study leave to pursue his Ph.D. He has published several articles in journals indexed in well-reputed databases. His research interests include machine learning, deep learning, digital image processing, big data, and information retrieval.

HENRY FRIDAY NWEKE received the B.Sc. degree in computer science from Ebonyi State University, Nigeria, and the M.Sc. degree in computer science from the University of Bedfordshire, U.K. He is currently pursuing the Ph.D. degree with the Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. His research interests include machine learning, deep learning, biomedical sensor analytics, human activity recognition, multi-sensor fusion, cloud computing, wireless sensor technologies, and emerging technologies.
MOHAMMED ALI AL-GARADI received the Ph.D. degree from the Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. He has published several articles in academic journals indexed in well-reputed databases such as ISI and Scopus. His research interests include cybersecurity, online social networking, machine learning, text mining, deep learning, and the IoT.

IHSAN ALI received the M.Sc. degree from Hazara University Mansehra, Pakistan, in 2005, and the M.S. degree in computer system engineering from the GIK Institute, in 2008. He is currently pursuing the Ph.D. degree with the Faculty of Computer Science and Information Technology, University of Malaya. He has more than five years of teaching and research experience in several countries, including Saudi Arabia, the USA, Pakistan, and Malaysia. He served as a Technical Program Committee Member for IWCMC 2017, AINIS 2017, and Future 5V 2017, and as an organizer of the special session on fog computing at Future 5V 2017. He has published more than 30 papers in international journals and conferences. His research interests include wireless sensor networks, underwater sensor networks, sensor clouds, fog computing, and the IoT. He is an active member of the IEEE, the ACM, the International Association of Engineers (IAENG), and the Institute of Research Engineers and Doctors (the IRED). He is also a reviewer for Computers & Electrical Engineering, KSII Transactions on Internet and Information Systems, Mobile Networks and Applications, the International Journal of Distributed Sensor Networks, the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, Computer Networks, IEEE ACCESS, FGCS, and the IEEE Communications Magazine.

MOHAMMAD RASHID HUSSAIN received the Ph.D. degree in information technology from Babasaheb Bhimrao Ambedkar Bihar University, India. He is currently an Assistant Professor with the College of Computer Science, King Khalid University, Abha, Saudi Arabia. His research interests include educational development and review, educational data mining, cloud intelligence, and mobile cloud computing and optimization.
GHULAM MUJTABA received the master's degree in computer science from FAST National University, Karachi, Pakistan, and the Ph.D. degree from the Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. He received the gold medal for the master's degree. He has been an Assistant Professor with Sukkur IBA University, Sukkur, Pakistan, since 2006, and has extensive teaching and research experience. Before joining Sukkur IBA University, he worked at a well-known software house in Karachi for four years. He has published several articles in academic journals indexed in well-reputed databases such as ISI and Scopus. His research interests include machine learning, online social networking, text mining, deep learning, and information retrieval.

HASAN ALI KHATTAK received the B.Sc. degree in computer science from the University of Peshawar, Peshawar, Pakistan, in 2006, the master's degree in information engineering from the Politecnico di Torino, Torino, Italy, in 2011, and the Ph.D. degree in electrical and computer engineering from the Politecnico di Bari, Bari, Italy, in 2015. He has been serving as an Assistant Professor of computer science since 2016. He is involved in a number of funded research projects on the Internet of Things, the semantic web, and fog computing, exploring ontologies and web technologies using the Contiki OS, NS-2/3, and OMNeT++ frameworks. His current research interests include distributed systems, the web of things, vehicular ad hoc networks, and data and social engineering for smart cities.