
SN Computer Science (2022) 3: 401

https://doi.org/10.1007/s42979-022-01308-5

ORIGINAL RESEARCH

An Application to Detect Cyberbullying Using Machine Learning and Deep Learning Techniques
Mitushi Raj1 · Samridhi Singh1 · Kanishka Solanki1 · Ramani Selvanambi1

Received: 25 April 2022 / Accepted: 1 July 2022 / Published online: 26 July 2022
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2022

Abstract
Nowadays, many people immerse themselves in the world of social media. With the current pandemic scenario, this engagement has only increased, as people often rely on social media platforms to express their emotions, find comfort, find like-minded individuals, and form communities. This extensive use of social media comes with many downsides, and one of them is cyberbullying. Cyberbullying is a form of online harassment that is both unsettling and troubling. It can take many forms, but the most common is textual. Cyberbullying is common on social media, and victims often end up in a state of mental breakdown instead of taking action against the bully. On the majority of social networks, automated detection of these situations necessitates the use of intelligent systems. We propose a cyberbullying detection system to address this issue. In this work, we present a deep learning framework that evaluates real-time Twitter tweets or social media posts and correctly identifies any cyberbullying content in them. Recent studies have shown that deep neural network-based approaches are more effective than conventional techniques at detecting cyberbullying texts. Additionally, our application can recognize cyberbullying posts written in English, Hindi, and Hinglish (multilingual data).

Keywords Cyberbullying · Stack word embeddings · Deep learning model · Multilingual · Real-time tweets

This article is part of the topical collection "Predictive Artificial Intelligence for Cyber Security and Privacy" guest edited by Hardik A. Gohel, S. Margret Anouncia and Anthoniraj Amalanathan.

* Corresponding author: Ramani Selvanambi

1 School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India

Introduction

Many pieces of research work done in this area using various machine learning and deep learning techniques have yielded significant results in detecting and preventing cyberbullying. However, most works have used mostly English data for training and testing purposes, while a few have included native languages like Bangla, Arabic, and Urdu. There is little to no work aimed at the situation of increased cyberbullying in a country like India, where most Hindi-speaking people write English text comprising Hindi words in Latin script, while many others use Hindi text written in Devanagari script. We plan to combat this problem by incorporating such data into our proposed learning algorithm so that cyberbullying can be detected in real-time tweets.

Data have been collected from three sources and then combined. One source contains English texts, another contains Hindi texts, and the last contains a combination of Hindi and English texts. As these three datasets were acquired from different sources, compiling them in their original form is not possible because of the differences in classification labels. To proceed with these datasets, we first have to adopt one single classification scheme; for this purpose a 0-1 classifier is used, which tells us whether a text contains cyberbullying content or not, thus making it a black-and-white area to train our model and eliminating any gray possibilities. Data cleaning is essential before classification: symbols, URLs, emails, stopwords, whitespace, numbers, punctuation, and single tokens are removed (using the spaCy tokenizer), and stemming and lemmatization are applied.
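As a concrete illustration of this label-unification step, the sketch below maps three differently labelled datasets onto a common 0/1 scheme and merges them. The file names and original label columns are hypothetical placeholders rather than the actual sources used in this work.

```python
import pandas as pd

# Hypothetical file names and label columns; the real datasets differ.
english = pd.read_csv("english_tweets.csv")    # e.g. labels "none", "racism", "sexism"
hindi = pd.read_csv("hindi_tweets.csv")        # e.g. labels 0 = neutral, 1 = offensive
hinglish = pd.read_csv("hinglish_tweets.csv")  # e.g. labels "NAG", "CAG", "OAG"

# Map every source-specific label onto a single 0/1 scheme:
# 1 = contains cyberbullying content, 0 = does not.
english["label"] = (english["class"] != "none").astype(int)
hindi["label"] = hindi["offensive"].astype(int)
hinglish["label"] = hinglish["aggression"].isin(["CAG", "OAG"]).astype(int)

# Keep only the text and the unified label, then combine the three sources.
frames = [df[["text", "label"]] for df in (english, hindi, hinglish)]
combined = pd.concat(frames, ignore_index=True)
combined.to_csv("combined_dataset.csv", index=False)
```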


Word embedding is a natural language processing methodology for mapping words or phrases from a lexicon to a corresponding vector of real numbers, which is then used to find word predictions and semantics. These word embeddings are beneficial when training on contextual data because they capture implicit associations between words. The main reason for choosing word embeddings is that they do not require costly annotation and can be produced from vast, unannotated corpora that are freely available. The resulting pre-trained embeddings can then be used in tasks that have only small amounts of labeled data.

Stacked embeddings are one of the most important topics in the collection of word embeddings. A stack embedding method is utilized to combine distinct embeddings; for instance, a stacking strategy can combine conventional and contextual string embeddings. Stacked embeddings allow embeddings to be mixed and matched, and such combinations have been established to produce the best results [1].
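As a minimal illustration of the stacking idea (not the exact pipeline used in this work), the sketch below concatenates a GloVe vector and a FastText vector for each word into one longer vector. The file names and dimensions are placeholders for whichever pre-trained files are actually used.

```python
import numpy as np

def load_vectors(path):
    """Parse a plain-text embedding file: one word followed by its float values per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == 2:          # skip the count/dimension header of .vec files
                continue
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

glove = load_vectors("glove.6B.100d.txt")   # placeholder: 100-dimensional GloVe vectors
fasttext = load_vectors("cc.en.300.vec")    # placeholder: 300-dimensional FastText vectors

def stacked_vector(word, dim_glove=100, dim_fasttext=300):
    """Concatenate the two embeddings; fall back to zeros for out-of-vocabulary words."""
    g = glove.get(word, np.zeros(dim_glove, dtype="float32"))
    f = fasttext.get(word, np.zeros(dim_fasttext, dtype="float32"))
    return np.concatenate([g, f])           # one 400-dimensional stacked vector

print(stacked_vector("bully").shape)        # (400,)
```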
Deep learning (DL) is a subset of machine learning (ML) that can be applied to a wide range of applications. Deep learning models are preferred in our work over traditional machine learning models because they have been shown to be more successful than machine learning or statistical methods and have a higher number of neuron layers. Deep learning algorithms have proven highly effective at text classification, with state-of-the-art outcomes on a variety of classic academic benchmark problems. Deep learning networks also have the advantage of improving their models as the size of the data grows. Because hybrid techniques have been demonstrated to be promising models for minimizing sentiment errors on more detailed training data, we explored using a hybrid model instead of a single-layer model [2].

On all types of datasets, hybrid models outperformed single models in terms of cyberbullying detection, especially when deep learning methods were combined. Our research looks at how our proposed hybrid model reacts to various forms obtained from multiple languages. The combination of multiple models was studied and validated in this study. We looked at the relationship between models and their increased abilities to extract traits, store past data and nodes, and classify text [3].

On multilingual Twitter datasets, we use the CNN-BiLSTM model with stacked word embeddings. The combined models boosted the accuracy of cyberbullying detection, according to the findings of our research. After training the model, we plan to create a Twitter-like app using Python and the web tech stack [4], in which real-time data will be fed into the system and the input text will be predicted as either cyberbullying or non-cyberbullying.

This paper is organized as follows. The following section provides the aim and motivation of this research work. The subsequent section provides the highlights of the contribution made in this work. "Literature Review" provides an incisive review of related work on existing cyberbullying detection techniques based on machine learning and deep learning, and the subsequent section provides a detailed explanation of the proposed model. "Results and Discussion" describes the results obtained, and the last section is concluding in nature.

Aim and Motivation

During the time of COVID-19, people's activity on social media has increased tremendously. Social media platforms provide us with excellent means of communication, but they also make young people more vulnerable to online threats and harassment. Because of the many active users on social media networks, cyberbullying has become a global problem. The pattern suggests that cyberbullying on social media is on the rise, and according to recent studies, it is becoming more prevalent among teenagers. The ability to recognize potentially dangerous communications is critical to successful prevention, and the information overload on the Internet needs intelligent systems that can automatically identify potential threats. This research work aims to develop a model that can accurately detect cyberbullying in real-time tweets.

Cyberbullying has been one of the main prevailing issues since technology and internet services were made accessible to the general public. After the COVID-19 pandemic hit, affecting the daily life of most people and forcing them to isolate themselves from social groups and communities, this usage of social media has only been increasing. With the increased use of digital platforms for educational reasons by most youths, an increase in cyberbullying incidents seems unavoidable, as pupils who are bullied are more inclined to cyberbully. Bullying and discrimination based on race, religion, sex, caste, and creed can have an adverse effect on the victim's mental health, eventually leading to anxiety, depression, and even an increase in suicide cases.

Cyberbullying has increased by 70% in just a few months, according to a group that records online bullying and harassment cases. Human monitoring becomes fruitless in the face of such huge amounts of data on the internet, as there is no scalable and effective means to trace out cyberbullies and tackle the problem. Thus, there is a need to tackle this problem at an automated level that is fast, efficient, and accurate, for the sake of social well-being.


Highlights

The objective of this work is to build a CNN-BiLSTM deep learning detection model that can detect cyberbullying content in real-time data from tweets posted by users in three different languages. Subsequently, we have also developed a website, much like a social media platform, to portray the applicability of this model.

• A multilingual dataset is built: apart from English (Latin script) and Hindi (Devanagari script), Hinglish (Latin script, where Hindi words are written using English alphabets) is also included, which makes up most of the tweets from users who tweet in Hindi.
• A CNN-BiLSTM model is proposed for cyberbullying detection because an ensemble deep learning model with multiple layers outperforms single-layer neural network models. To optimize the model even further, two word embeddings (GloVe + FastText) are stacked to enhance the model's performance.
• The proposed model works on real-time data, and the web portal that is created is like a clone of social media websites such as Twitter, where one can post tweets and the change is reflected in the feed. The site assesses whether a posted tweet contains cyberbullying content by running it through our model.

Literature Review

Most research papers have extracted data from a single source and done a comparative study of various machine learning or deep learning techniques in combination with different word vectors or feature extraction techniques, and drawn out the best combination. Only a handful of studies were found where the work focused on optimizing the detection model by either building ensemble ML models or layering up different feature preprocessing techniques. Even in those studies, the focus was on testing the model on the dataset, and no real-time detection was involved. Most works done in this area have included mostly English data, while a few included native languages like Bangla, Arabic, and Urdu.

The OCDD (Optimized Twitter Cyberbullying Detection based on Deep Learning) technique was used by [1] as an innovative solution to feature extraction difficulties. OCDD depicts a tweet as a series of word vectors rather than extracting features from tweets and feeding them to a classifier. Deep learning is employed in the classification phase, together with a metaheuristic optimization technique for parameter adjustment. Using deep neural networks and word embeddings, a methodology was proposed by [2] for detecting cyberbullying messages in text data. The classifier's performance is improved by stacking BERT and GloVe embeddings together. As a result, the model outperforms most classic machine learning approaches, including Support Vector Machine and Logistic Regression. A single and double ensemble-based voting model was created by [3] that can divide items into two categories: offensive and non-offensive. On a dataset retrieved from Twitter, several machine learning classifiers, three ensemble models, two feature extraction algorithms, and numerous n-gram evaluations were chosen. Logistic Regression and Bagging Ensemble Model classifiers were shown to be the most effective at detecting cyberbullying in the study; however, their proposed SLE and DLE voting classifiers outperformed them.

Substantial preprocessing was performed by [4] on Roman Urdu micro text, including the creation of a Roman Urdu slang-phrase dictionary and the mapping of slang following tokenization. The unstructured data was then processed further to deal with encoded text formats and metadata/non-linguistic elements. Extensive tests using RNN-LSTM, RNN-BiLSTM, and CNN models were undertaken after the preprocessing stage. To give the comparison analysis, the performance and accuracy of the models were assessed using several metrics. On Roman Urdu text, RNN-LSTM and RNN-BiLSTM performed best. For cyberbullying detection, a BiGRU-CNN sentiment classification model was presented by [5], which consists of a BiGRU layer, an attention mechanism layer, a CNN layer, a fully connected layer, and a classification layer. The attention mechanism layer has a firmer grasp of representative words and can better allocate weight to them. To train and test the model, the Kaggle text dataset is used, as well as an emoji dataset gathered from social media. The model's classification accuracy is higher than that of the traditional model, according to the findings.

A pretrained BERT model was used by [6], built on a novel deep learning network with the transformer technique, to detect cyberbullying on social media platforms. For classification, the model employs a single linear layer of a neural network, which can be substituted by deep learning network models like CNN and RNN. The model has undergone extensive training on two social media datasets, one of which is public. The first dataset is small (Formspring), whereas the second is larger (Wikipedia). The model produces better and more consistent results for the latter without the requirement for oversampling.

To detect cyberbullying in Bangla and Romanized Bangla literature, as well as to give a comparison of the two systems, machine learning and deep learning algorithms were used by [7].


The detection was carried out using 5000 Bangla and 7000 Romanized Bangla texts, as well as a combined dataset. Before training, the datasets were preprocessed with NLP techniques and features were retrieved. Multinomial Naive Bayes, SVM, Logistic Regression, and XGBoost were among the machine learning models utilized; the deep learning algorithms included CNN, LSTM, BLSTM, and GRU. According to the findings, CNN outperformed all other algorithms for the dataset containing Bangla texts, with an accuracy of 84%. In the other two datasets, the Multinomial Naive Bayes machine learning technique fared best, with 84% accuracy on the Romanized Bangla dataset and 80% accuracy on the combined dataset.

Using data from Twitter, Wikipedia, and Formspring, a working implementation of an application that detects cyberbullying across multiple social media platforms was proposed by [8]. They used LSTM layers to detect cyberbullying, and the models were trained with the backpropagation method, using the cross-entropy loss function in combination with the Adam optimizer. These results were better than the traditional approaches.

Four machine learning models, LR, Gaussian Naive Bayes, RNN, and BERT, were used by [9]. For final classification, these are combined with a neural network. The suitability of the suggested model and other accessory classification models was assessed using numerous assessment measures to determine how well the model can perform. Accuracy, Precision, and Recall were the most commonly utilized measures of efficacy. The model works well regardless of the social media sentence input, according to the results of training and testing. Though the models' classification accuracy is good, they do have significant limits that might be overcome by including a variety of additional techniques. The effectiveness and efficacy of deep learning systems in detecting cyberbullying were discussed by [10]. They worked on four deep learning models: Bidirectional Long Short-Term Memory (BLSTM), Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN). In comparison to the RNN, LSTM, and GRU models, the BLSTM model achieved good accuracy and F1-measure scores.

A cyberbullying detection framework was developed by [11] using reinforcement learning and combining various natural language processing techniques. The developed framework leverages human-like behavioral patterns, uses delayed rewards, and outperforms other models on a highly dynamic and populated dataset, achieving 89.5% accuracy. Different machine learning models (Logistic Regression, Linear SVC, Multinomial Naive Bayes, and Bernoulli Naive Bayes) and deep learning techniques (CNN-based models) were applied by [12], incorporating various n-gram ranges to detect cyberbullying on Twitter. Vector approaches included distributed bag of words and distributed memory, further extending to distributed memory concatenated and distributed memory mean.

A neural network architecture with a self-attention model, trained on a balanced dataset combining three different datasets from different sources, was adopted by [13]. The self-attention model follows an overall standard encoder-decoder architecture that replaces recurrent layers with multi-headed self-attention; tested on parameters like precision, recall, and F1 score, it achieved state-of-the-art accuracy and even outperformed the BLSTM model with attention. An ensemble model was developed by [14], involving feature analysis techniques for Naive Bayes-SVM and XGBoost models and word embeddings for deep learning approaches like CNN, Bi-GRU, and attention networks, using a majority voting-based ensemble where the prediction from each individual model counts as a vote for a class label, on a fairly balanced dataset.

A classification model based on attention techniques was proposed by [15] to analyze Arabic comments, including different Arabic dialects. Inspired by human-like learning, the proposed attention model dynamically pays attention to the parts of the input that aid in achieving results and ignores everything irrelevant. The model is built with an embedding layer and two LSTM layers, with a dense layer and an output layer, to compare the impact with and without the recurrent networks.

Many hybrid approaches were put to the test by [16] on a variety of datasets from various disciplines for sentiment analysis. Eight textual tweet and review datasets from various fields are used to develop and test hybrid deep sentiment analysis learning models that integrate Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN), and Support Vector Machine (SVM). Each technique was evaluated based on its dependability and calculation time. The CNN-BiLSTM model outperformed all the baseline deep learning models. A cyberbullying detection model was proposed by [17] to improve manual monitoring for cyberbullying on social media. In that paper, OCR was used to analyze image characters to determine the impact of image-based cyberbullying on an individual basis, which was further tested on a dummy system. To create predictive models for cyberbullying detection, supervised learning-based techniques commonly use classifiers such as SVM and Naive Bayes.

A Deep Convolutional Neural Network (DCNN) was created by [18] to build an automated system, as it is nearly impossible to manually filter information from such a large amount of incoming traffic in the form of tweets. With the use of convolution, the proposed DCNN model uses the Twitter text with GloVe embedding vectors to capture the semantics of the tweets and outperformed existing models.


A strategy for predicting hateful speech on social media networks using a hybrid of natural language processing and machine learning techniques was described by [19]. A powerful natural language processing optimization ensemble deep learning strategy (KNLPEDNN) is used to analyze the collected data. The methodology employs an active learning environment to detect hate speech on social media platforms by classifying the text into neutral, offensive, and hate language.

A comparative study of pre-existing deep learning methods was presented by [20] to detect hate speech and offensive language in textual data. These methods include CNN, RNN, LSTM and BERT models. The authors also investigated the effect of the class weighting technique on the performance of the deep learning methods. It was found that the pre-trained BERT model outperformed the other models in the case of both unweighted and weighted classification, likely because of the property of BERT to measure the relation between sentences by treating them as whole units.

A model for detecting tweets that contain racist text was proposed by [21] by performing sentiment analysis of tweets. A stacked ensemble deep learning model is assembled by combining GRU, CNN and RNN, called Gated Convolutional Recurrent Neural Networks (GCR-NN). The performance of the model is optimized by setting different structures in terms of the number of layers, loss function, optimizer, number of neurons, etc. The proposed model showed substantially better performance than machine learning models. Embedding word representations and deep-learning approaches were employed by [22] to recognize and classify toxic speech. A Twitter corpus was used to conduct binary and multi-class classification, and two main approaches were investigated: extracting word embeddings and utilizing a DNN classifier, and fine-tuning the pre-trained BERT classifier. BERT fine-tuning was found to be substantially more effective.

Numerous approaches with varying data ratios were compared at about the same time by [23]. When the data is small, machine learning produces good results; when more data was used for the trials, better outcomes were obtained by employing deep learning. Compared to the other approaches they examined, BiRNN produces the best results. They used ML and DL models to detect hate speech using RNN. They have two datasets, one with more data than the other, and they strived to extract hate speech from tweets to discover the best way of improving accuracy across several methods. The results show that ML works well with small data while DL works well with large datasets. Even though their strategy outperforms previous models, they must consider the sort of dataset that will be used in the future.

The efficiency of different neural network models such as CNN, BiLSTM, and BiLSTM with attention mechanisms, combined with CNN-LSTM models, was evaluated by [24]. They trained these networks on a labeled dataset of YouTube comments. They employed Arabic word representations to depict the comments after running the dataset through a variety of pre-processing procedures. They also used machine learning optimization techniques to fine-tune the neural network models' parameters, and used 5-fold cross-validation to train and evaluate each network. The CNN-LSTM had the highest recall of 83.46%, followed by CNN, then BiLSTM with attention, and finally BiLSTM.

The problem of detecting hateful comments was solved by [25] using text mining and deep learning models built with LSTM to detect, classify and filter out cyberbullying speech. The input layer takes input in the form of sequences, which are basically numbers that represent text. The embedding layer takes each word from the input layer and produces appropriate word vectors. This is fed to the LSTM model, which is further connected to the dense layer. Each neuron in each layer is strongly linked to the layers above and below it. The final model has an accuracy of 94.94% and is able to detect whether a text is cyberbullying or not.

Proposed Methodology

The proposed model is a prototype for a cyberbullying detection system which can be used by social media platforms for automated checking and control of cyberbullying. The data for training is cleansed and preprocessed before being fed into stacked word embeddings. Then the CNN-BiLSTM deep learning model is trained to perform better than regular deep learning models trained standalone. The model is saved for use in the website. The website is similar to any social media platform where the user has access to many features, and the admin has privileges to view content status. Even though this work is a prototype, it is still a step towards getting a better result.

• Dataset Analysis - The acquired labeled data in three languages, i.e., Hindi, English and Hinglish, from numerous open-source dataset sources goes through the text preprocessing stage, which involves Data Cleaning, Data Integration, Data Transformation, Data Reduction and Data Discretization.
• Data Cleaning - Any irrelevant attributes, empty cells and NaN values are removed. The data is also formatted so that the data type across the dataset is uniform.
• Data Transformation - As the three datasets are acquired from different sources, compiling them in their original form is not possible because of the differences in classification labels. To proceed with these datasets, it is important to get rid of the different label sets and use one single classification scheme, a 0-1 classifier, which tells us whether the text contains cyberbullying content or not, thus making it a black-and-white area to train our model and eliminating any gray possibilities.


• Data Integration - All the datasets are integrated into one CSV file that is used for further text preprocessing.
• Data Discretization - In this stage the data is tokenized, i.e., each sentence is split into words for easy evaluation of the data.
• Data Reduction - In this text preprocessing stage, certain things such as URLs, special characters, '@' mentions and stopwords are removed from the tweets, and all the text is converted to lower case. Further, stemming is performed, which transforms a word to its root form, as well as lemmatization, which reduces a word to a form existing in the language. This stage helps in reducing the data to its simplest possible form (a minimal cleaning sketch follows this list).
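The sketch below illustrates what such a cleaning pass could look like for the English portion of the data, using simple regular expressions plus the spaCy tokenizer mentioned earlier; the model name is an assumption, and Hindi or Hinglish text would need its own tokenizer and normalization rules.

```python
import re
import spacy

# English-only pipeline for illustration; Hindi/Hinglish text needs separate handling.
nlp = spacy.load("en_core_web_sm")

def clean_tweet(text):
    """Strip URLs, mentions, numbers and symbols, then lemmatize and drop stopwords."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)               # @ mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # numbers, punctuation, symbols
    doc = nlp(text)
    tokens = [tok.lemma_ for tok in doc
              if not tok.is_stop and not tok.is_space and len(tok.text) > 1]
    return " ".join(tokens)

print(clean_tweet("@user This is SO annoying!!! https://t.co/xyz 123"))
# roughly -> "annoying" (the exact output depends on the spaCy model used)
```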
After the preprocessing of the data is completed, we move towards the model building and training stage. As shown in Fig. 1, for building our CNN-BiLSTM model a word embedding approach is used, as it solves various issues that simple one-hot vector encodings have. Most crucially, word embeddings boost generalization and performance. We stack two word embeddings, GloVe and FastText, as a combination of embeddings has been established to produce the best results. After the stacking of the word embeddings, the CNN-BiLSTM model is built, since a hybrid technique has shown the potential to reduce sentiment errors on increasingly complex data. An ensemble ML model is also built, in which a feature extraction technique and unigram feature engineering are used. The proposed CNN-BiLSTM model is compared with this ensemble ML model to draw out a comparison on accuracy.
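A minimal sketch of how the stacked GloVe + FastText vectors could feed a Keras embedding layer is given below. It reuses the hypothetical stacked_vector helper and combined dataframe from the earlier sketches, and the 100 + 300 dimensions are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding

MAX_LEN = 25                               # matches the maxlen = 25 used in the variants below

texts = combined["text"].tolist()          # cleaned tweets from the earlier sketches
labels = combined["label"].to_numpy()

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)

vocab_size = len(tokenizer.word_index) + 1
emb_dim = 400                              # GloVe (100) + FastText (300), stacked

# One matrix whose rows are the stacked vectors produced by the earlier sketch.
embedding_matrix = np.zeros((vocab_size, emb_dim), dtype="float32")
for word, idx in tokenizer.word_index.items():
    embedding_matrix[idx] = stacked_vector(word)    # zeros for out-of-vocabulary words

embedding_layer = Embedding(input_dim=vocab_size,
                            output_dim=emb_dim,
                            weights=[embedding_matrix],
                            input_length=MAX_LEN,
                            trainable=False)        # trainable=True in variants 4 and 5
```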
CNN-BiLSTM Architecture

A single machine learning or deep learning model can predict the outcome rather accurately when applied to specific domains, but each has its own set of advantages and downsides. LSTM usually produces superior results, but it takes longer to process than CNN, while CNN has fewer hyperparameters and requires less supervision. Meanwhile, the LSTM is more accurate for long sentences but takes longer to analyze. Because RNN has a major gradient loss issue when processing sequences, the perception of nodes at the front decreases as nodes get further back. To tackle the problem of gradient vanishing, BiLSTM is used. It also solves the problem of fixed sequence-to-sequence prediction: RNN has a limitation where input and output must have the same size, so it fails in cases like machine translation, where input and output have different sizes, or text summarization, where input and output have different lengths, which is not the case with BiLSTM. The concept of combining two (or more) methods is offered as a way of implementing the benefits of both while also addressing some of the drawbacks of the existing techniques.

A CNN-BiLSTM is a concatenated bidirectional LSTM and CNN framework. It trains both character-level and word-level characteristics in the initial formulation for classification and prediction. The character-level properties are induced using the CNN layer: to derive a new feature vector from per-character feature vectors such as character embeddings and (preferably) character type, the model includes a convolution and a max pooling layer for each word.

Combining different variations yields multiple hybrid approaches that we have tested:

1. GloVe + FastText ⟶ CNN ⟶ BiGRU ⟶ Adam (dense, conv1d = relu; out = sigmoid), maxlen = 25
2. GloVe + FastText ⟶ CNN ⟶ BiLSTM ⟶ Adam (dense, conv1d = relu; out = sigmoid), maxlen = 25
3. GloVe + FastText ⟶ BiLSTM ⟶ BiGRU ⟶ Adam (dense, conv1d = relu; out = sigmoid), maxlen = 25
4. GloVe + FastText ⟶ CNN ⟶ BiGRU ⟶ Adam (dense, conv1d = relu; out = sigmoid), maxlen = 25, trainable = True
5. GloVe + FastText ⟶ CNN ⟶ BiLSTM ⟶ Adam (dense, conv1d = relu; out = sigmoid), maxlen = 25, trainable = True
6. GloVe + FastText ⟶ BiLSTM ⟶ BiGRU ⟶ Adam (dense, conv1d = relu; out = sigmoid), maxlen = 25 ⟶ SpatialDropout1D, GlobalMaxPooling1D, GlobalAveragePooling1D

The CNN-BiLSTM model that is to be used has the following features:

• Stacked Word Embedding: A distributed representation of words where different words that have a similar meaning (based on their usage) also have a similar representation. Two such word embeddings are GloVe and FastText, and stacking these two embeddings provides better results.
• Convolutional Model: A feature extraction model that learns to extract salient features from documents represented using a word embedding.
• Fully Connected Model: The interpretation of the extracted features in terms of a predictive output.

Therefore, the model is comprised of the following elements, as shown in Fig. 2:

Input layer — The length of the input sequences is defined by the input layer.
Embedding layer — 100-dimensional real-valued representations, with the embedding layer sized to the vocabulary.


Fig. 1  Proposed model


Fig. 2  CNN-BiLSTM architecture

Conv1D layer — Uses 32 filters and a kernel size corresponding to the number of words to read simultaneously.
MaxPooling1D layer — Merges (pools) the output of the convolutional layer.
Flatten layer — For concatenation, and to convert the three-dimensional output to two dimensions.
Transfer function — Rectified Linear (ReLU).

The hyperparameters used are as follows (a sketch of the assembled network follows this list):

Kernel size — 3
Number of filters — 100
Dropout rate — 0.5
Weight regularization (L2) — 3
Batch size — 128
Update rule — Adam
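Putting the pieces above together, the following is one plausible Keras realization of the described CNN-BiLSTM stack, not the authors' exact code: the filter count, kernel size, dropout rate, optimizer, loss and batch size follow the values listed above, while the BiLSTM and dense widths are assumptions, the L2 weight regularization is omitted, and embedding_layer, sequences and labels come from the earlier sketches.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, Bidirectional,
                                     LSTM, Dense, Dropout)

model = Sequential([
    embedding_layer,                                          # stacked GloVe + FastText vectors
    Conv1D(filters=100, kernel_size=3, activation="relu"),    # local n-gram features
    MaxPooling1D(pool_size=2),
    Bidirectional(LSTM(64)),                                  # assumed width; long-range context
    Dropout(0.5),
    Dense(64, activation="relu"),                             # assumed width
    Dense(1, activation="sigmoid"),                           # cyberbullying vs. non-cyberbullying
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training call matching the reported setup (10 epochs, batch size 128):
# model.fit(sequences, labels, epochs=10, batch_size=128, validation_split=0.1)
```

Because the BiLSTM consumes the pooled sequence directly, the Flatten step from the layer list is not needed in this particular reading of the architecture.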


The Adam optimizer is computationally more efficient, requires little memory, is invariant to diagonal rescaling of gradients, and is well suited for problems with a lot of data and parameters. We select the best parameters using grid search and 10-fold cross-validation (a minimal sketch of this search follows the list below). The Convolutional Neural Network (CNN) part of the model is then built to classify encoded documents as either cyberbullying or non-cyberbullying, and can be defined as follows, as shown in Fig. 2:

• One Conv1D layer with 100 filters, kernel size 3, and ReLU activation;
• One MaxPooling1D layer with pool size = 2;
• One Dropout layer after flattening;
• Optimizer: Adam;
• Loss function: binary cross-entropy (suited to a binary classification problem);
• Dropout layers are used to address overfitting and bring generalization into the model; in hidden layers it is best to keep the dropout value near 0.5.
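A hand-rolled version of that parameter search might look like the sketch below; build_model is an assumed helper that returns a freshly compiled Keras model for a given activation/optimizer pair, and sequences and labels are the arrays from the earlier sketches.

```python
import itertools
import numpy as np
from sklearn.model_selection import StratifiedKFold

param_grid = {"activation": ["relu", "sigmoid"], "optimizer": ["adam", "rmsprop"]}

def cross_val_accuracy(activation, optimizer, X, y, folds=10):
    """Average validation accuracy over a stratified 10-fold split."""
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=42)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        model = build_model(activation=activation, optimizer=optimizer)  # assumed helper
        model.fit(X[train_idx], y[train_idx], epochs=10, batch_size=128, verbose=0)
        _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
        scores.append(acc)
    return np.mean(scores)

best = max(
    (dict(zip(param_grid, combo)) for combo in itertools.product(*param_grid.values())),
    key=lambda p: cross_val_accuracy(p["activation"], p["optimizer"], sequences, labels),
)
print("best configuration:", best)
```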

WebAPP Architecture

As shown in Fig. 3, the web-based system is a social media prototype where users can use the developed project like a social media platform: they can post tweets, see real-time updates of the feed, and chat with friends. The admin feature of the app allows the admin to trace cyberbullying comments and block the offending users for a certain amount of time (a serving sketch follows the feature list below).

Features:
User Register: The system allows new users to register themselves on the app.
User account: The system allows users to create their accounts by providing their emails and setting a password. The user can also set a username for their account as well as view their profile.
Admin account: The system allows the admin to have a separate login with which they can perform admin-related activities.
Posting Feature: The system allows users to post their tweets and tag their friends.
View Feed: The system allows users to view their feed and gives the admin privileges to see all kinds of tweets.
Blocking Feature: The system has special admin features that allow the admin to suspend user accounts if their comments are found to be cyberbullying.
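As an illustration of how the saved model could sit behind such a site, the sketch below exposes a hypothetical posting endpoint with Flask; the route, file name and decision threshold are assumptions, and clean_tweet, tokenizer and MAX_LEN are reused from the earlier sketches.

```python
from flask import Flask, request, jsonify
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

app = Flask(__name__)
model = tf.keras.models.load_model("cnn_bilstm_cyberbully.h5")    # assumed file name

@app.route("/tweet", methods=["POST"])
def post_tweet():
    """Score an incoming tweet and flag it before it reaches the feed."""
    text = request.get_json()["text"]
    cleaned = clean_tweet(text)                                   # same cleaning as training
    seq = pad_sequences(tokenizer.texts_to_sequences([cleaned]), maxlen=MAX_LEN)
    score = float(model.predict(seq, verbose=0)[0][0])
    return jsonify({"text": text,
                    "cyberbullying": bool(score >= 0.5),          # assumed decision threshold
                    "score": score})

if __name__ == "__main__":
    app.run(debug=True)
```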

Fig. 3  WebAPP architecture


Results and Discussion

The performance of different configurations of activations and optimizers for a baseline LSTM model is compared on the basis of accuracy. Accuracy refers to the proportion of correct predictions made by the model:

    Accuracy = Correct Predictions / Total Predictions

Adam and RMSProp are used as optimizers, and ReLU and Sigmoid are used for the activation layers, making a total of four combinations.

Table 1  Comparison of activations and optimizers on baseline models

Sl. no | Model | Hyperparameters                         | Accuracy
1      | LSTM  | Activation: sigmoid; Optimizer: Adam    | 0.87
2      | LSTM  | Activation: relu; Optimizer: Adam       | 0.85
3      | LSTM  | Activation: sigmoid; Optimizer: RMSProp | 0.86
4      | LSTM  | Activation: relu; Optimizer: RMSProp    | 0.85

The activation function chosen has a significant effect on the neural network's capabilities and performance, and different activation functions may be used in different parts of the model. The sigmoid function converts any input into a number between 0 and 1: it gives a result near zero for small values and a value close to one for large values. Sigmoid is equivalent to a two-element Softmax with the second element set to zero; as a result, sigmoid is commonly used in binary classification.

Table 2  Comparison of hybrid models before and after hyper-parameter tuning

Sl. no | Model        | Accuracy before hypertuning | Accuracy after hypertuning
1      | CNN+BiGRU    | 0.8905                      | 0.9369
2      | CNN+BiLSTM   | 0.9135                      | 0.9512
3      | BiLSTM+BiGRU | 0.8533                      | 0.8853

Fig. 4  Activation and optimizer comparison on baseline models
Fig. 5  Hybrid models before hyper-parameter tuning
Fig. 6  Hybrid models after hyper-parameter tuning
Fig. 7  Posting tweets


The Adam optimizer is computationally more efficient, requires little memory, is invariant to diagonal rescaling of gradients, and is well suited to problems with a lot of data and parameters, whereas the RMSProp optimization algorithm keeps its updates under control the entire time because of its decay rate, which makes RMSProp faster than Adam. Adam obtains its speed from momentum, while RMSProp gives it the capability to adjust gradients in various directions; it is powerful because of the mix of the two. Whereas RMSProp uses just the second moment and speeds it up with a decay rate, Adam employs both the first and second moments and is usually the best option (Figs. 4, 5).

The combination of Sigmoid activation with the Adam optimizer provided the best results among the four configurations, as seen in Table 1.

A neural network's design is incomplete without activation functions. The hidden layer's activation function determines how well the model learns the training data, and the kind of predictions the model can produce is determined by the activation function used in the output layer.

Other than Sigmoid, ReLU is also employed as the activation function for the hidden layers of our CNN-BiLSTM model. The main reason for this is that the Sigmoid function and its derivative are not complex and therefore help to reduce the amount of time needed to design models; nevertheless, there is a substantial disadvantage of information loss because of the derivative's small range. As a result, the more layers there are (that is, the deeper the learning algorithm is), the more information is condensed and lost at every layer, and this magnifies at each level, resulting in significant data loss throughout.

ReLU is non-linear and, unlike the sigmoid function, does not suffer from this back-propagation problem. Additionally, for a bigger artificial neural network, building models based on ReLU is much faster than using Sigmoids. This is why ReLU is utilized instead of sigmoid for the hidden layer activation.

After experimentation with various models, such as CNN and RNN models like LSTM and GRU, the results obtained are shown in Table 2. We also tweaked the hyperparameters and did a comparative analysis of what worked well for the dataset. After the comparison, it is evident that the CNN-BiLSTM model has the best performance of all the models tested, as shown in Fig. 6. After putting all the layers together, the model is fitted over our data for 10 epochs and achieves an accuracy of about 98%.

Figure 7 shows that a user can post a tweet and it will automatically be updated in the feed, as shown in Fig. 8. The admin can see the tweet status as shown in Fig. 9 and also has the privilege to block users who post cyberbullying tweets, as can be seen in Fig. 10.

Fig. 8  Updated feed
Fig. 9  Tweet status

Conclusion

The model for automatically detecting cyberbullying text in multilingual data is addressed and proposed in this work. Solving this issue is critical for controlling social media material in multiple languages and protecting users from the negative impacts of toxic comments like verbal assaults and offensive language. The performance of our various neural network models is examined, and the CNN-BiLSTM network has the best accuracy. While the CNN alone can only learn local characteristics from word n-grams, with its LSTM layer the CNN-BiLSTM can also learn global features and long-term dependencies. Future research will look at both picture and video elements to see if cyberbullying can be detected in them automatically.


Fig. 10  Admin feature to block users

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Al-Ajlan, M.A., Ykhlef, M.: Optimized Twitter cyberbullying detection based on deep learning. In: 2018 21st Saudi Computer Society National Computer Conference (NCC), pp. 1–5 (2018). IEEE
2. Mahlangu, T., Tu, C.: Deep learning cyberbullying detection using stacked embeddings approach. In: 2019 6th International Conference on Soft Computing & Machine Intelligence (ISCMI), pp. 45–49 (2019). IEEE
3. Alam, K.S., Bhowmik, S., Prosun, P.R.K.: Cyberbullying detection: an ensemble based machine learning approach. In: 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), pp. 710–715 (2021). IEEE
4. Dewani, A., Memon, M.A., Bhatti, S.: Cyberbullying detection: advanced preprocessing techniques & deep learning architecture for Roman Urdu data. J. Big Data 8(1), 1–20 (2021)
5. Luo, Y., Zhang, X., Hua, J., Shen, W.: Multi-featured cyberbullying detection based on deep learning. In: 2021 16th International Conference on Computer Science & Education (ICCSE), pp. 746–751 (2021). IEEE
6. Yadav, J., Kumar, D., Chauhan, D.: Cyberbullying detection using pre-trained BERT model. In: 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), pp. 1096–1100 (2020). IEEE
7. Ahmed, M.T., Rahman, M., Nur, S., Islam, A., Das, D.: Deployment of machine learning and deep learning algorithms in detecting cyberbullying in Bangla and Romanized Bangla text: a comparative study. In: 2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), pp. 1–10 (2021). IEEE
8. Mahat, M.: Detecting cyberbullying across multiple social media platforms using deep learning. In: 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), pp. 299–301 (2021). IEEE
9. Jain, N., Hegde, A., Jain, A., Joshi, A., Madake, J.: Pseudo-conventional approach for cyberbullying and hate-speech detection. In: 2021 International Conference on Advances in Computing, Communication, and Control (ICAC3), pp. 1–8 (2021). IEEE
10. Iwendi, C., Srivastava, G., Khan, S., Maddikunta, P.K.R.: Cyberbullying detection solutions based on deep learning architectures. Multimedia Systems, 1–14 (2020)
11. Aind, A.T., Ramnaney, A., Sethia, D.: Q-Bully: a reinforcement learning based cyberbullying detection framework. In: 2020 International Conference for Emerging Technology (INCET), pp. 1–6 (2020). IEEE
12. Ketsbaia, L., Issac, B., Chen, X.: Detection of hate tweets using machine learning and deep learning. In: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 751–758 (2020). IEEE
13. Pradhan, A., Yatam, V.M., Bera, P.: Self-attention for cyberbullying detection. In: 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), pp. 1–6 (2020). IEEE
14. Sahana, B., Sandhya, G., Tanuja, R., Ellur, S., Ajina, A.: Towards a safer conversation space: detection of toxic content in social media (student consortium). In: 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM), pp. 297–301 (2020). IEEE


15. Berrimi, M., Moussaoui, A., Oussalah, M., Saidi, M.: Attention-based networks for analyzing inappropriate speech in Arabic text. In: 2020 4th International Symposium on Informatics and Its Applications (ISIA), pp. 1–6 (2020). IEEE
16. Dang, C.N., Moreno-García, M.N., De la Prieta, F.: Hybrid deep learning models for sentiment analysis. Complexity 2021 (2021)
17. Yuvaraj, N., Chang, V., Gobinathan, B., Pinagapani, A., Kannan, S., Dhiman, G., Rajan, A.R.: Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification. Comput. Electr. Eng. 92, 107186 (2021)
18. Roy, P.K., Tripathy, A.K., Das, T.K., Gao, X.-Z.: A framework for hate speech detection using deep convolutional neural network. IEEE Access 8, 204951–204962 (2020)
19. Al-Makhadmeh, Z., Tolba, A.: Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach. Computing 102(2), 501–522 (2020)
20. Yadav, Y., Bajaj, P., Gupta, R.K., Sinha, R.: A comparative study of deep learning methods for hate speech and offensive language detection in textual data. In: 2021 IEEE 18th India Council International Conference (INDICON), pp. 1–6 (2021). IEEE
21. Lee, E., Rustam, F., Washington, P.B., El Barakaz, F., Aljedaani, W., Ashraf, I.: Racism detection by analyzing differential opinions through sentiment analysis of tweets using stacked ensemble GCR-NN model. IEEE Access 10, 9717–9728 (2022)
22. d'Sa, A.G., Illina, I., Fohr, D.: BERT and FastText embeddings for automatic detection of toxic speech. In: 2020 International Multi-Conference on "Organization of Knowledge and Advanced Technologies" (OCTA), pp. 1–5 (2020). IEEE
23. Jiang, L., Suzuki, Y.: Detecting hate speech from tweets for sentiment analysis. In: 2019 6th International Conference on Systems and Informatics (ICSAI), pp. 671–676 (2019). IEEE
24. Mohaouchane, H., Mourhir, A., Nikolov, N.S.: Detecting offensive language on Arabic social media using deep learning. In: 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), pp. 466–471 (2019). IEEE
25. Dubey, K., Nair, R., Khan, M.U., Shaikh, S.: Toxic comment detection using LSTM. In: 2020 Third International Conference on Advances in Electronics, Computers and Communications (ICAECC), pp. 1–8 (2020). IEEE

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

