
Received 26 December 2023, accepted 13 February 2024, date of publication 19 February 2024, date of current version 23 February 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3367281

Enhancing Hate Speech Detection in the Digital Age: A Novel Model Fusion Approach Leveraging a Comprehensive Dataset
WAQAS SHARIF1, SAIMA ABDULLAH1, SAMAN IFTIKHAR2 (Member, IEEE), DANIAH AL-MADANI2, AND SHAHZAD MUMTAZ1,3
1Faculty of Computing, The Islamia University of Bahawalpur, Bahawalpur 6300, Pakistan
2Faculty of Computer Studies, Arab Open University, Riyadh 11681, Saudi Arabia
3School of Natural and Computing Sciences, University of Aberdeen, Aberdeen AB24 3FX, Scotland, U.K.
Corresponding author: Waqas Sharif ([email protected])
This work was supported and funded by Arab Open University (AOU) research fund No. (AOUKSA-524008).

The associate editor coordinating the review of this manuscript and approving it for publication was Maria Chiara Caschera.

ABSTRACT In the era of digital communication, social media platforms have experienced exponential
growth, becoming primary channels for information exchange. However, this surge has also amplified the
rapid spread of hate speech, prompting extensive research efforts for effective mitigation. These efforts
have prominently featured advanced natural language processing techniques, particularly emphasizing deep
learning methods that have shown promising outcomes. This article presents a novel approach to address
this pressing issue, combining a comprehensive dataset of 18 sources. It includes 0.45 million comments
sourced from various digital platforms spanning different time frames. Two models were utilized to
address the diversity in the data and to leverage the distinct strengths found within deep learning frameworks:
a CNN and a BiLSTM with an attention mechanism. These models were tailored to handle specific subsets
of the data, allowing for a more targeted approach. The unique outputs from both models were then fused
into a unified model. This methodology outperformed recent models, showcasing enhanced generalization
capabilities even when tested on the largest and most diverse dataset. Our model achieved an impressive
accuracy of 89%, while maintaining a high precision of 0.88 and recall of 0.91.

INDEX TERMS Hate speech detection, deep learning, natural language processing, CNN, BiLSTM, model
fusion.

I. INTRODUCTION

Hate speech refers to language or expression that attacks an individual or community based on characteristics such as race, caste, ethnicity, religion, gender, sexual orientation, or nationality [1], [2], [3], and it is a growing concern in our increasingly digital world. Social media platforms, such as Twitter and Facebook, have become a breeding ground for its proliferation. These platforms enable individuals to express their opinions and engage in discussions, leveraging their extensive and diverse user base. However, they have also transformed into spaces where hate speech can spread rapidly, causing harm to society. Hate speech on social media platforms can manifest in various forms, such as posts, comments, and messages intended to intimidate, harass, or humiliate others. It is worth noting that these platforms have implemented significant measures to combat hate speech, including enforcing policies and utilizing machine learning algorithms to detect and remove abusive content. Nevertheless, the problem of hate speech on social media remains a significant challenge that requires ongoing attention and action.

Detecting hate speech manually on social media presents an enormous challenge due to the sheer volume of content generated. Hate speech can take subtle and diverse forms, making human detection without advanced algorithms exceptionally difficult.


Relying solely on human moderators for precise and timely identification of hate speech is impractical, necessitating the use of advanced natural language processing (NLP) algorithms and machine learning models. The NLP community has recently made significant progress in developing hate speech identification systems, with machine learning and particularly deep learning techniques demonstrating superior effectiveness [1], [4], [5], [6], [7], [8]. Deep learning is particularly valuable in swiftly identifying hate speech, as it analyzes language and behavioral patterns linked to hate speech. Moreover, deep learning continuously improves over time by integrating new data, offering a sustainable solution.

Utilizing NLP techniques to identify hate speech on social media presents a crucial yet technically complex challenge. This complexity arises from the nuances inherent in language, where hate speech may not always be expressed through explicitly aggressive, offensive, profane, or derogatory terms. Conversely, the absence of such terms does not guarantee the absence of hate speech [9]. The task is further compounded by the diverse language use and contexts across different platforms, making the development of effective detection models a formidable task. The ever-evolving landscape of language and slang on social media adds layers of complexity to hate speech detection. Moreover, social media text often demonstrates high sparsity, featuring numerous elements with limited occurrences, including noisy components lacking useful information. This sparsity can impede the creation of precise models and lead to overfitting. Additionally, those propagating hate speech constantly seek new ways to evade detection, increasing the complexity of automatic detection [10]. Further complicating matters is the limited availability of data on social media due to the enforcement of hate speech codes of conduct [11]. This scarcity poses a significant hurdle for deep learning techniques, which rely on extensive labeled data for accurate model training.

The challenges inherent in detecting hate speech across social media platforms underscore the critical need for a robust and adaptable deep learning model. Traditionally, hate speech detection has relied on limited datasets from specific platforms and time periods. Our approach instead encompasses diverse data sources, enabling our model to learn language nuances and contextual variations adeptly. Leveraging extensive labeled data from multiple sources, this deep learning model confronts the intricacies of hate speech detection, effectively handling subtleties, variations, sparsity, and the adaptability of hate speech propagators.

This study tackles substantial challenges and makes significant contributions. A key contribution is a comprehensive dataset consolidating 18 diverse datasets, representing various social media platforms and different time spans, including platforms with varying word limits. With this model, our aim is to pioneer a more comprehensive and effective solution for combating hate speech, thereby fostering safer and more inclusive online communities. Additionally, our study introduces a deep-learning model designed for high generalizability; this model effectively handles the diverse dataset, resulting in marked improvements in hate speech detection across multiple social media platforms.

II. LITERATURE REVIEW

In recent years, detecting hate speech in online text has become a significant focus of NLP research. Initially, studies relied on conventional machine learning algorithms such as SVM, KNN, Random Forest, and Decision Tree, using various feature types (for example, syntactic, semantic, sentiment, and lexicon features) to identify hate speech [2]. However, the rise of deep neural networks has prompted extensive exploration of their effectiveness for NLP-related problems [12]. Notably, Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) have emerged as prominent options and are frequently assessed for hate speech detection.

Researchers often choose different deep learning models tailored to the text's characteristics. For shorter texts, where capturing detailed context matters less, CNNs have become popular due to their adeptness at grasping local patterns across various text classification tasks [13], [14], [15]. On the other hand, when dealing with longer text sequences that demand a better grasp of semantic features and context, RNNs such as Long Short-Term Memory (LSTM) networks and Bidirectional LSTMs (BiLSTMs) shine [16], [17], [18]. These models efficiently capture contextual information and word dependencies, proving advantageous in tasks like sentiment analysis and document classification.

In the realm of hate speech detection, Warner and Hirschberg [19] conducted a seminal study concentrating on identifying anti-Semitic language as a form of hate speech. Alshalan and Al-Khalifa [20] delved into classifying Arabic hate tweets using CNNs, RNNs, and bidirectional encoder representations from transformers (BERT). Employing word2vec as embedding layers via the Continuous Bag of Words (CBOW) method, their findings revealed that BERT did not perform well for this task, resulting in an approximate 10% drop in performance, while the CNN achieved an F-score of 0.79. Another notable exploration, by Waseem and Hovy [21], targeted hate speech on Twitter, particularly racism and sexism. They investigated features including user demographics, lexical usage, geographic information, and character n-grams. Their study emphasized that using character n-grams with a maximum length of four proved to be the most effective approach. Furthermore, integrating gender as an additional feature led to a slight improvement in the obtained results.

Vashistha and Zubiaga [7] examined six publicly available datasets to identify hate speech in English and Hindi text. They constructed a logistic regression-based model incorporating Term Frequency-Inverse Document Frequency (TF-IDF) and Part-of-Speech (POS) features.


model’s performance was compared with a hierarchical feature extraction methods such as TF/IDF, bag of words, and
neural network, which utilized several CNN filters and the word length. Decision tree classifiers notably emerged as the
BiLSTM model. The base model achieved an accuracy rate most effective, achieving a remarkable 97% accuracy in hate
of 85%, while the neural network attained an accuracy speech detection.
rate of 83%. In Khan et al. [22], a proposed neural Del et al. [29] introduced SocialHaterBert, a model tailored
network architecture called BiCHAT combines BERT-based for hate speech identification in English and Spanish tweets,
embedding, BiLSTM, and deep CNN with a hierarchical showcasing improvements over the earlier HaterBert model.
attention mechanism. The attention layers will apply on word Employing BertForSequenceClassification and ‘BERT’ for
and sentence levels, allowing focus on the most important hate speech classification, the model demonstrated perfor-
words and phrases in the text while ignoring irrelevant mance gains ranging from 3% to 27% compared to HaterBert.
information. The proposed approach was evaluated on several Additionally, the authors proposed a method to construct
popular Twitter hate speech datasets and performed better a hate speech user graph using user profile attributes,
than the base model. potentially enhancing hate speech detection in multilingual
Modha et al. [23] proposed a real-time model to identify social media discussions. Furthermore, Fortuna et al. [30]
and visualize hate comments from Facebook and Twitter. conducted an extensive study using a dataset for hate speech,
This model can be used as a plugin tool in web browsers to toxicity, abusive language, and offensive content classifi-
monitor online hate speech effectively. Initially, the authors cation. They experimented with various models, including
used traditional machine learning algorithms such as SVM BERT, ALBERT, fasttext, and SVM, trained on nine publicly
and logistic regression as a baseline model. Subsequently, available datasets, evaluating both intra-dataset and inter-
they experimented with more advanced models such as dataset model performance to gauge their generalizability
CNN, BiLSTM, and BERT transformers. The experimental across different hate speech categories and datasets.
results showed that the proposed models achieved an Overall, while progress has been made in detecting hate
F1-score of 0.64 on the Facebook dataset and 0.58 on speech, many studies have mainly used small datasets from
the Twitter dataset. Kapil and Ekbal [24] introduced a single platforms like Twitter, Facebook. Relying on these
multi-task learning framework designed to identify multiple limited sources might affect how well these methods work
interconnected categories of hate speech, including offensive in the real world, especially across different languages or
language, racism, and sexism. Multiple neural networks platforms. To make these methods more reliable, future
were developed, encompassing architectures such as CNNs, research should consider using more diverse and larger
LSTM networks, and a combination of CNN and GRU. datasets from various sources.
These networks were trained for both single-task and multi-
task learning scenarios. The initial training of the models III. DATASET DESCRIPTION
was carried out for individual classes, and subsequently, The dataset utilized in this study incorporates 18 distinct
a shared neural network was developed to perform the datasets sourced from various publications spanning recent
combined classification task. Rodriguez-Sanchez et al. [25] years. The curation of this dataset was conducted by a
conducted an experimental study to assess the effectiveness team of researchers, primarily selecting datasets based on
of deep learning, machine learning, and transformer learning their relevance to the study of hate speech prevalent on
approaches in detecting hate speech specifically in Spanish the web [31]. Notably, this combined dataset represents a
language text. The results indicated that the transformer pioneering effort, as no prior research, to the best of our
approach outperformed the other methods, achieving the knowledge, has employed such an extensive compilation for
highest F1-score of 0.75 for hate classification. hate speech classification tasks.
Mossie and Wang [26] introduced a method targeting This comprehensive dataset integrates diverse sources,
the recognition of vulnerable communities through hate encompassing various digital media platforms like Twitter,
speech detection techniques. They utilized word2vec word Facebook and Stormfront. Capturing data from multiple
embedding and n-grams for feature extraction, followed social media platforms and across varying time periods,
by classification using machine learning and deep learning the dataset offers a rich spectrum of content. Rigorous
algorithms. Moreover, they expanded the hate word lexicon preprocessing measures were implemented to maintain
by integrating co-occurring word vectors with the highest coherence and compatibility across this merged collection.
similarity, enabling the identification of the target ethnic Nonetheless, including data from various sources and tempo-
community based on matched hate words. Ameur et al. [27] ral spans inherently poses challenges in any text classification
presented a dataset of 10,828 Arabic tweets addressing hate endeavor.
speech related to COVID-19. They performed fundamental These challenges manifest in the form of linguistic varia-
analyses using pre-trained models, highlighting the efficacy tions, tonal disparities, and contextual nuances, posing obsta-
of these models in detecting hate speech and false informa- cles in creating classification models capable of effectively
tion in the complex Arabic language context. Meanwhile, capturing and generalizing patterns across different sources
Khanday et al. [28] investigated hate speech detection on and time frames. While enriching the dataset, this diversity
Twitter during the COVID-19 pandemic, employing various also introduces complexities that demand sophisticated


TABLE 1. Description of datasets employed in this study.

Table 1 provides detailed descriptions of several representative subsets within the complete dataset, elucidating their significance as integral components of this expansive collection. As observed from Table 1, each curated dataset represents a subset where various types of hate are targeted and may be categorized into multiple labels to distinguish different forms of hate. However, when these diverse datasets are combined into a unified collection, the labels are standardized so that any form of hate is classified as a 'hate comment', while non-hate comments constitute the alternative category within this dataset.

The dataset, initially comprising 451,709 English-language samples, was categorized into hate speech (371,452) and non-hate speech (80,257), reflecting an inherent class imbalance. To address this, the dataset underwent meticulous preprocessing, including tokenization, removal of stop words and symbols, and lemmatization. Following this, augmentation techniques were employed to rectify the class imbalance issue. Through these augmentation efforts, the final dataset expanded to 726,120 samples, achieving an equal class ratio and ensuring a more balanced representation of hate and non-hate speech categories for subsequent analysis.

FIGURE 1. (a) Overall distribution of data (b) SSD vs LSD distribution of data.

FIGURE 2. (a) SSD class distribution (b) LSD class distribution.

During exploratory data analysis, it was noted that the dataset comprised sentences of varying lengths, reflecting the distinct writing styles associated with different web sources. This observation prompted an examination of the distribution of data based on the number of words per sample. As a result, two distinct sub-samples emerged: the Short Sequence Dataset (SSD), encompassing text up to 20 words, and the Long Sequence Dataset (LSD), containing longer text of up to 300 words (depicted in Fig. 1). However, this division resulted in imbalanced subsets (Fig. 2), with the SSD skewed towards hate content and the LSD biased towards non-hate content. To rectify this imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was strategically applied to both sub-samples [48]. By oversampling the minority class within each sub-sample, SMOTE effectively harmonized the class distributions, ensuring a more equitable representation of hate and non-hate content without relying on additional specific transformations. A minimal sketch of this balancing step is shown below.
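The following hedged sketch illustrates re-balancing one sub-sample with SMOTE. It assumes imbalanced-learn's SMOTE applied over a TF-IDF representation; the paper cites SMOTE [48] but does not specify which numeric feature space was oversampled, so the vectorizer here is an assumption.

```python
# Hedged sketch of re-balancing one sub-sample with SMOTE [48].
# Assumption: SMOTE operates on a numeric (here TF-IDF) representation of the
# comments; the paper does not state the exact feature space that was used.
from imblearn.over_sampling import SMOTE
from sklearn.feature_extraction.text import TfidfVectorizer

def balance_subsample(comments, labels):
    # vectorize the text so SMOTE can interpolate between minority-class samples
    features = TfidfVectorizer(max_features=5000).fit_transform(comments)
    features_res, labels_res = SMOTE(random_state=42).fit_resample(features, labels)
    return features_res, labels_res  # classes now at (near) parity
```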
Additionally, a word-ontology approach was utilized to manage the extensive vocabulary generated during the preprocessing stages. The dataset contained 127,546 distinct words after preprocessing, presenting a challenge due to its substantial size and the potential computational complexity of subsequent analyses. In this study, the WordNet ontology [49], [50] technique was employed to hierarchically organize words based on their semantic relationships and contextual meanings. This method categorized words into clusters or groups according to their similarities in meaning or usage, effectively consolidating redundant or closely related terms. Ultimately, the word-ontology technique reduced the vocabulary size by up to 10.88%. One plausible rendering of this consolidation is sketched below.
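The sketch below shows one way such a WordNet-based consolidation could work; the paper does not publish its exact clustering rule, so mapping every word to the canonical lemma of its most frequent synset is an assumption made here for illustration.

```python
# Illustrative WordNet [49] vocabulary consolidation: map each word to the
# canonical lemma of its most frequent synset so that closely related terms
# collapse onto one representative token. The exact grouping rule used in the
# paper is not published; this first-synset heuristic is an assumption.
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def consolidate_vocabulary(vocabulary):
    mapping = {}
    for word in vocabulary:
        synsets = wn.synsets(word)
        # fall back to the surface form for words WordNet does not know
        mapping[word] = synsets[0].lemmas()[0].name().lower() if synsets else word
    return mapping

vocab = ["hate", "hatred", "loathing", "tweet"]
merged = consolidate_vocabulary(vocab)
print(len(set(merged.values())), "clusters for", len(vocab), "words")
```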
IV. MODEL FUSION FRAMEWORK

This section presents a detailed description of the model architecture proposed in this manuscript (see Fig. 3). The model includes word embedding layers, multiple CNN layers, a BiLSTM with attention mechanism layers, network merging layers, and a classification layer. Since our dataset contains sequences of varying lengths, short text sequences are fed to the CNN model and long text sequences to the BiLSTM model, followed by an attention layer. After reviewing the results of the proposed methodology, it was observed that the CNN model is particularly effective at capturing targeted keywords.


To a certain extent, these keywords directly determine the polarity of short text [51]. However, the CNN model may not be as effective with long text sequences, because its convolutional layers operate over fixed-sized windows. As the text sequence grows longer, a fixed-sized window may not capture all relevant information [52], [53]. In such cases, the BiLSTM comes into play, as it is better suited to handling longer sequences: it can learn from the entire sequence and capture long-range dependencies between words. Furthermore, an exploration was undertaken to utilize word ontology for reducing the feature/word count. Detailed descriptions of the model's components are provided in the following sections.

FIGURE 3. Block diagram of proposed study.

A. EMBEDDING LAYER
Firstly, a word embedding layer is used to learn a dense vector representation of words from the preprocessed data. This layer takes the tokenized text as input and maps each word to a fixed-sized dense vector. During training, the vectors are learned, capturing the semantic relationships between words in the vocabulary. The input tokenized text can be represented as a sequence of n words w_1, w_2, w_3, \ldots, w_n, where each word w_i is represented as a one-hot vector of vocabulary size v. The one-hot vector for a word has a value of 1 in the position corresponding to the index of the word in the vocabulary and 0 elsewhere.

The embedding layer has a matrix E of size v \times d, where d is the dimension of the dense word vectors to be learned. Each row of E contains the vector representation of a word in the vocabulary. To obtain the dense vector e_i for each word w_i, the word embedding layer performs a matrix multiplication: e_i = w_i \times E. This results in a sequence of dense vectors e_1, e_2, e_3, \ldots, e_n of size n \times d. The word embedding matrix E is updated during training by minimizing a loss function with respect to the parameters of the model. In this way, the word embedding layer learns to capture the semantic relationships between words in the vocabulary.

The embedding matrix E can then be fed as input to the subsequent layers of the neural network for further processing and classification. This work created two distinct embedding layers: one for generating vectors for short text sequences of length 20, and the other for generating vectors for long text sequences of length 300.

Additionally, a pre-trained embedding layer using Global Vectors for Word Representation (GloVe) [54] with 50 dimensions has been included for comparison. This layer aims to offer word vector representations derived from existing knowledge, presenting an alternative perspective on word relationships within the text data. Both embedding layers, pre-trained and trainable, contribute diverse perspectives in capturing and representing the underlying semantics within the text data, providing nuanced approaches for subsequent analysis and classification. The sketch below illustrates how such layers can be instantiated.
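The following is a hedged Keras illustration of the two embedding setups (the paper does not name its framework explicitly, so the use of Keras, the GloVe file path, and the helper names are assumptions):

```python
# Sketch of the trainable and GloVe-initialized embedding layers (Keras).
# Vocabulary size follows Section III (127,546 words); the file path and
# helper names are assumptions for illustration.
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Embedding, Input

VOCAB_SIZE, EMB_DIM = 127546, 50
SSD_LEN, LSD_LEN = 20, 300

short_input = Input(shape=(SSD_LEN,), name="ssd_tokens")
long_input = Input(shape=(LSD_LEN,), name="lsd_tokens")

# trainable embeddings, one per branch, learned end-to-end with the model
short_embedded = Embedding(VOCAB_SIZE, EMB_DIM, trainable=True)(short_input)
long_embedded = Embedding(VOCAB_SIZE, EMB_DIM, trainable=True)(long_input)

def load_glove_matrix(word_index, path="glove.6B.50d.txt"):
    """Build a (VOCAB_SIZE x EMB_DIM) matrix from 50-d GloVe vectors [54]."""
    matrix = np.zeros((VOCAB_SIZE, EMB_DIM), dtype="float32")
    with open(path, encoding="utf8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            idx = word_index.get(parts[0])
            if idx is not None and idx < VOCAB_SIZE:
                matrix[idx] = np.asarray(parts[1:], dtype="float32")
    return matrix

# pre-trained variant for comparison: initialize with GloVe and freeze
# glove_layer = Embedding(VOCAB_SIZE, EMB_DIM, trainable=False,
#                         embeddings_initializer=Constant(load_glove_matrix(word_index)))
```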

B. CNN ARCHITECTURE
Our CNN model integrates two convolutional layers and pooling layers to extract local features, with the objective of obtaining more informative keywords that enhance the overall performance of the model. The CNN architecture is described below.

1) FIRST CONVOLUTIONAL LAYER
This layer applies 128 filters of size 3 to the input sequence, producing 128 feature maps as output. Each filter slides over the input sequence, computing a dot product between the filter weights and the input at each position. The output of the convolution operation is then passed through a ReLU activation function (Eq. 1).

H_{i,j} = f( \sum_{k=0}^{K-1} W_k X_{i+j-k-1} + B )    (1)

where i and j denote the position of the output feature map, k denotes the filter index, f is the activation function, and K is the kernel size. The filter weights W are learned during training to capture meaningful patterns in the input data. The bias term B is added to each output feature map to introduce a shift in the activation function. The resulting output feature map contains a set of activation values representing the presence of different patterns in the input data.

2) FIRST POOLING LAYER
This layer performs max pooling on the output of the previous convolutional layer, reducing the spatial dimension by a factor of 2. Max pooling computes the maximum value within each pooling window, which in this case has size 2 (Eq. 2).


y_{i,j} = \max( H_{i,2j}, H_{i,2j+1} )    (2)

Here, y_{i,j} is the jth output of the ith feature map after max pooling, and H_{i,2j} and H_{i,2j+1} are the outputs of the previous convolutional layer at positions 2j and 2j+1, respectively.

3) SECOND CONVOLUTIONAL LAYER
This layer applies 64 filters of size 3 to the output of the first pooling layer, producing 64 feature maps as output, analogous to Eq. 1.

4) SECOND POOLING LAYER
This layer performs max pooling over the entire spatial dimension of the output of the previous convolutional layer, resulting in a scalar value for each feature map (Eq. 3).

Z_j = \max( y_{1,j}, y_{2,j}, y_{3,j}, \ldots, y_{n,j} )    (3)

5) CNN OUTPUT LAYER
A fully connected dense layer with the ReLU activation function. It returns the maximum of 0 and the input value, which means that any negative values are set to 0 (Eq. 4).

y = f( W^T X + b )    (4)

where y and X are the output and the input, W is a matrix of weights, b is a vector of biases, and f is the activation function (ReLU), defined as f(x) = \max(0, x). An illustrative sketch of the complete CNN branch is given below.
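Under these definitions, the CNN branch can be sketched as follows. This is a hedged Keras rendering of Eqs. 1-4 with the stated filter counts (128 and 64), kernel size 3, and pool size 2; the dropout rate is an assumption, since the text only mentions that dropout layers were used.

```python
# Sketch of the CNN branch (Eqs. 1-4): two Conv1D + pooling stages, a global
# max pool collapsing each feature map to a scalar (Eq. 3), and a dense ReLU
# output layer (Eq. 4). The dropout rate is not specified in the paper and is
# an assumption here.
from tensorflow.keras.layers import (Conv1D, Dense, Dropout,
                                     GlobalMaxPooling1D, MaxPooling1D)

def cnn_branch(embedded_sequence):
    x = Conv1D(128, kernel_size=3, activation="relu")(embedded_sequence)  # Eq. 1
    x = MaxPooling1D(pool_size=2)(x)                                      # Eq. 2
    x = Conv1D(64, kernel_size=3, activation="relu")(x)
    x = GlobalMaxPooling1D()(x)                                           # Eq. 3
    x = Dropout(0.5)(x)  # assumed rate; the paper only says dropout was used
    return Dense(64, activation="relu")(x)                                # Eq. 4

# usage with the embedding sketch above: cnn_output = cnn_branch(short_embedded)
```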
C. BiLSTM WITH ATTENTION LAYER ARCHITECTURE
The BiLSTM model allows learning representations from both the forward and backward directions. The attention mechanism then weights the learned representations based on their importance in the context of the input sequence. Finally, the weighted representations are fed to the output layer for prediction.

1) BiLSTM LAYER
The BiLSTM model concatenates the output of the forward and backward LSTM cells at each time step, producing a sequence of hidden states h = \{h_1, h_2, \ldots, h_T\}, where T is the length of the input sequence. The forward LSTM computes the hidden state sequence \bar{H}_f for each time step t using the input sequence X_t, the previous cell state \bar{H}_{t-1}, and the hidden state H_t (Eq. 5).

\bar{H}_f(X_t) = \mathrm{LSTM}_f( H_t, \bar{H}_{t-1} )    (5)

Similarly, the backward LSTM computes the hidden state sequence \bar{H}_b for each time step t using the input sequence X_t and the next cell state \bar{H}_{b_{t+1}}, expressed in Eq. 6.

\bar{H}_b(X_t) = \mathrm{LSTM}_b( X_t, \bar{H}_{b_{t+1}} )    (6)

Finally, the concatenated output of the forward and backward LSTM layers is given by Eq. 7, where \bar{H}_f and \bar{H}_b are the hidden state sequences computed by the forward and backward LSTMs, respectively.

h = [ \bar{H}_f ; \bar{H}_b ]    (7)
Similarly, the backward LSTM computes the hidden state
sequence hb for each time step t using the input sequence (Xt ) D. MODEL FUSION LAYER
and the next cell state H̄bt +1 expressed in Eq.6. The merging network layer in our proposed model takes
advantage of both CNNs and BiLSTM networks by com-
H̄b (Xt ) = LSTM b Xt − H̄bt +1

(6)
bining their outputs to create a new, more powerful model.
Finally, the concatenated output of the forward and backward The output of each neural network is a vector representation
LSTM layers is given by Eq. 7, where H̄f and H̄b are of the input text. To combine information from two models,
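The following is a hedged sketch of the BiLSTM-with-attention branch (Eqs. 5-11). The 64-unit LSTM is an assumption, and Eqs. 8-9 are collapsed here into a single dense scoring layer; the paper describes the mechanism mathematically rather than as code.

```python
# Sketch of the BiLSTM-with-attention branch (Eqs. 5-11). Hidden states from
# the forward and backward LSTMs are concatenated per time step (Eq. 7), a
# dense layer scores each state (cf. Eqs. 8-9), the scores are softmax-
# normalized over time (Eq. 10), and the context vector is the attention-
# weighted sum of hidden states (Eq. 11). Unit counts are assumptions.
import tensorflow as tf
from tensorflow.keras.layers import (LSTM, Bidirectional, Dense, Lambda,
                                     Softmax)

def bilstm_attention_branch(embedded_sequence):
    hidden = Bidirectional(LSTM(64, return_sequences=True))(embedded_sequence)
    energy = Dense(1, activation="tanh")(hidden)   # energy scores, cf. Eqs. 8-9
    weights = Softmax(axis=1)(energy)              # attention weights, Eq. 10
    # context vector c: weighted sum over the time dimension (Eq. 11)
    return Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([weights, hidden])

# usage with the embedding sketch above: bilstm_output = bilstm_attention_branch(long_embedded)
```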


D. MODEL FUSION LAYER
The merging network layer in our proposed model takes advantage of both the CNN and BiLSTM networks by combining their outputs to create a new, more powerful model. The output of each neural network is a vector representation of the input text. To combine information from the two models, an element-wise addition operation is performed between their output tensors. Let us assume that the output tensor from the CNN model is denoted as y_CNN and the output tensor from the BiLSTM model is denoted as y_BiLSTM. The element-wise addition can then be represented as M = y_CNN ⊕ y_BiLSTM, where ⊕ denotes the element-wise addition operation. This operation adds the corresponding elements of both model outputs to obtain the corresponding elements of M. The resulting tensor M represents the combined information from both models, which is then fed into a final classification layer. This layer is a dense layer with a sigmoid activation function, which maps the input tensor to the probability of a comment being hate speech or not. A sketch of this fusion step is shown below.
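Building on the branch sketches above, the fusion step reduces to an element-wise addition followed by a sigmoid dense layer. Projecting the BiLSTM context to the CNN branch's width is an assumption here; the paper only requires that the two output tensors have matching shapes for ⊕ to be defined.

```python
# Sketch of the fusion layer: element-wise addition of the two branch outputs
# (M = y_CNN ⊕ y_BiLSTM) followed by a sigmoid classification layer. The
# projection makes the two vectors the same width, which Add() requires.
from tensorflow.keras import Model
from tensorflow.keras.layers import Add, Dense

bilstm_projected = Dense(64, activation="relu")(bilstm_output)  # match widths
merged = Add()([cnn_output, bilstm_projected])        # element-wise ⊕
probability = Dense(1, activation="sigmoid")(merged)  # P(hate | comment)

fusion_model = Model(inputs=[short_input, long_input], outputs=probability)
fusion_model.summary()
```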
V. EXPERIMENTAL CONFIGURATIONS

The experiments conducted in this study were executed on Google Colab, utilizing a standard GPU and the Python programming language. The SSD and LSD datasets were partitioned into three subsets for training, testing, and validation by employing the train_test_split method. The training dataset comprised 80% of the total data, while the remaining 20% was equally divided between testing and validation. Both models were equipped with an embedding layer with an output dimension of 50, and multiple dropout layers were implemented to prevent overfitting. Additionally, early stopping with a patience of 5 was employed to mitigate overfitting risks. The models were trained for 50 epochs using the Adam optimizer [55], featuring a learning rate set at 0.01 and a batch size of 512. Throughout the training process, binary cross-entropy was utilized to compute the validation loss of our models. These parameter settings were chosen based on empirical experimentation, resulting in high accuracy for our classification task. Detailed parameter information is provided in Table 2; the sketch below mirrors these settings.

TABLE 2. Hyper-parameter configurations.
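The stated protocol can be reproduced roughly as follows; variable names are illustrative, and for the fusion model X would be the pair of short- and long-sequence input arrays.

```python
# Sketch of the stated training protocol: 80/10/10 split via train_test_split,
# Adam (learning rate 0.01) [55], batch size 512, binary cross-entropy, up to
# 50 epochs with early stopping (patience 5) on the validation loss. X and y
# stand for the prepared token arrays and binary labels (assumed names).
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.8,
                                                    random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                random_state=42)

fusion_model.compile(optimizer=Adam(learning_rate=0.01),
                     loss="binary_crossentropy", metrics=["accuracy"])
fusion_model.fit(X_train, y_train,
                 validation_data=(X_val, y_val),
                 epochs=50, batch_size=512,
                 callbacks=[EarlyStopping(monitor="val_loss", patience=5)])
```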
To assess the performance of the models, accuracy, precision, recall, and F-score were employed as the evaluation metrics. Accuracy (Eq. 12) is the ratio of the correctly classified samples to the total number of samples. Accuracy alone may not be sufficient to evaluate the model's performance, especially when the focus is on a particular target class. Therefore, precision (Eq. 13), recall (Eq. 14), and F-score (Eq. 15) are important metrics to consider, as they provide insight into how well the model can correctly identify hate content. Precision measures the frequency of correct identification of hate content by the model, while recall measures how well the model can detect hate speech. The F-score combines precision and recall to give a balanced measure of the model's performance. These metrics can be computed directly, as sketched below.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (12)

Precision = TP / (TP + FP)    (13)

Recall = TP / (TP + FN)    (14)

F-score = 2 \times (Precision \times Recall) / (Precision + Recall)    (15)
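Equations 12-15 correspond to the standard confusion-matrix metrics, for example via scikit-learn (prediction threshold 0.5 is an assumption):

```python
# Eqs. 12-15 computed from model predictions with scikit-learn.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_pred = (fusion_model.predict(X_test) >= 0.5).astype(int).ravel()
print("accuracy :", accuracy_score(y_test, y_pred))    # Eq. 12
print("precision:", precision_score(y_test, y_pred))   # Eq. 13
print("recall   :", recall_score(y_test, y_pred))      # Eq. 14
print("f-score  :", f1_score(y_test, y_pred))          # Eq. 15
```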
VI. RESULTS AND DISCUSSIONS

After establishing the experimental configurations for the CNN and BiLSTM models across varying data lengths and settings, the subsequent focus shifted to analyzing the results obtained from these comprehensive evaluations. Each model (CNN and BiLSTM) was tested under four distinct settings, encompassing combinations of pre-trained and trainable embedding layers, with and without the integration of word ontology. Initially, both the CNN and BiLSTM models underwent testing on the complete dataset, followed by subsequent evaluations where the CNN model processed the SSD and the BiLSTM model handled the LSD. This process produced four distinct outcomes for each dataset type, providing a comprehensive understanding of the models' performance variations. Finally, a fusion model emerged, integrating short sequence data into the CNN and employing long sequence data within the BiLSTM model alongside an attention mechanism. The results of the evaluation metrics, including accuracy, precision, and recall, are presented in Table 3, providing insight into the effectiveness of our model in identifying instances of hate speech.

As one can observe from the presented results, there are notable variations in performance among the models and their respective configurations. When the CNN and BiLSTM operated independently on the complete dataset (LSD + SSD), their accuracies ranged moderately between 80-88%. Specifically, in terms of identifying hate speech, the CNN achieved precision rates between 75-87%, while the BiLSTM exhibited precision rates from 77-87% for the hate class. However, a significant shift occurred when these models were separately trained on the SSD and LSD. The precision for hate speech notably improved by around 5-6% for both the CNN and BiLSTM when tailored to their respective sequence lengths. This showcased the effectiveness of a data-driven strategy, highlighting the CNN's suitability for shorter texts and the BiLSTM's effectiveness for longer ones. These findings led to the adoption of a combined approach, leveraging the CNN and BiLSTM in a unified architecture.


TABLE 3. Performance evaluation of different model combinations of the proposed study.

The resultant unified model not only maintained a high accuracy of 88-89% but also showcased an improvement in precision for identifying hate speech by approximately 6-8% compared to the individual performances on the complete data. This underlines the synergy achieved by integrating their strengths, demonstrating a more comprehensive understanding and adeptness in identifying hate speech content. Figure 4 displays the training and validation loss graphs for the selected models and their comparative analysis. In Figure 4(f), the validation loss comparison among the CNN, BiLSTM, and the merged model reveals that the CNN's validation loss is higher than the BiLSTM's. The merged model's validation loss falls between the two, aligning with expectations due to their differing input sequence lengths.

FIGURE 4. Validation and training loss over different epochs.

Moreover, when comparing the different ontology and pre-trained embedding capacity settings, it was observed that using trainable embedding led to a decrease in model accuracy by 6 to 7%. Figure 5 illustrates the comparison of both embedding capacity settings. On the other hand, employing word ontology had a minor effect, decreasing accuracy by only 1 to 2%.

FIGURE 5. Performance comparison of embedding techniques.

The performance of the CNN model was also evaluated with different text lengths, including 10, 20, and 30 words, to determine the optimal length for the specific task.


TABLE 4. Performance comparison with state-of-the-art techniques using a subset of our employed datasets.

The model's performance was similar for all three lengths, as shown in Figure 6. However, splitting the data at 20 words resulted in a balanced distribution of the data into two halves.

FIGURE 6. Comparison of CNN model performance with different word sequences.

Further, to explore the effectiveness of the proposed approach, a comparison was made with existing state-of-the-art methods. Table 4 compares the proposed fusion model with earlier research studies that utilized any subset of the dataset used in the current study. The comparison is made in terms of accuracy and F-score. Researchers sometimes presented their results separately for each dataset instead of combining them; in such cases, the average of their results was compared with our study. Moreover, in the case of multilingual data usage by an author, only the results for the English dataset were considered.

The comparative study presented highlights the superior performance of the proposed merged model over previous studies, even when using the largest dataset. These results suggest that the proposed model serves as a global model that can train on a large, diverse dataset and provide better predictions for the hate class.

In addition to the comparative study, further experiments were conducted to test the performance of the proposed model under different conditions. We investigated the impact of different hyper-parameters, such as batch size and learning rate η, on the accuracy and training time of our model. We experimented with three values for each hyper-parameter (128, 256, and 512 for batch size, and 0.01, 0.001, and 0.0001 for learning rate), resulting in nine combinations in total. These experiments aimed to assess the robustness and versatility of the proposed model in different settings.

FIGURE 7. Impact of changing batch size and learning rate on the accuracy.


Figure 7 shows that the accuracy was almost invariant to changes in these hyper-parameters, indicating that our model was robust and insensitive to them. However, significant variability in training time based on the batch size and learning rate was noted. Specifically, a decrease in batch size or an increase in learning rate resulted in longer training times (refer to Figure 8). This trade-off between time and stability was observed, yet it did not impact the model's performance. Furthermore, the selected embedding technique outperformed the pre-trained embedding.

FIGURE 8. Impact of changing batch size and learning rate on the training time.
VII. CONCLUSION

The unprecedented growth of social media platforms in the digital age has introduced an alarming opportunity for the swift dissemination of hate speech, posing a significant threat to online discourse and community well-being. To address this pressing issue, our research presents a novel approach leveraging a comprehensive dataset comprising over 0.45 million comments from 18 diverse sources, encompassing various digital platforms across different time frames. Following thorough data preprocessing and balancing (by employing data augmentation), a comprehensive analysis revealed the presence of sentences ranging from 3 to 300 words in length. Recognizing the challenge of handling such variable-length text, the dataset was divided into two distinct subsets based on sentence length: short sequence data (SSD) and long sequence data (LSD). Our approach leveraged previous research findings indicating that a CNN performs exceptionally well in classifying short sequence text, capturing local features effectively, while a BiLSTM excels in understanding the context of long sentences. To harness these strengths, CNN models were trained on the SSD and BiLSTM models on the LSD. Acknowledging the potential for very long sequences in the LSD subset, an attention mechanism was introduced to focus on the most relevant areas within sentences, thereby enhancing the BiLSTM's performance. After training each model individually, a model fusion approach was employed to combine their outputs, resulting in a unified model.

Notably, the effectiveness of the proposed ensemble approach is underscored by the results. Employing the CNN on the entire dataset yielded an accuracy of 81% with an F-score of 0.82 for hate class detection. However, when the CNN was exclusively applied to short text samples, the accuracy soared to 88% with an F-score of 0.88. Similarly, the exclusive use of the BiLSTM on the entire dataset resulted in an accuracy of 88% with an F-score of 0.88, while on longer text the accuracy reached an impressive 92% with an F-score of 0.92. These findings vividly illustrate the inadequacy of a single model for handling the diversity of this problem effectively. By combining both models into a unified framework, our approach achieved an outstanding accuracy of 89%, showcasing the potential of model fusion in addressing the hate speech detection challenge in the dynamic digital landscape.

Furthermore, it is important to note that the success of our approach extends beyond the specific task of hate speech detection. The principles of leveraging diverse datasets from various digital platforms and accommodating varying post lengths, combined with model fusion, can be further explored and applied to a wide range of text classification tasks. Whether it is sentiment analysis, topic categorization, or content moderation, the methodology presented in this study offers a promising avenue for enhancing the efficiency and accuracy of text classification across the digital landscape. Our research contributes to creating a safer and more inclusive online environment and paves the way for innovative solutions in addressing text classification challenges that span different digital platforms with varying post structures.

CONFLICTS OF INTEREST

The authors declare that there are no conflicts of interest regarding the publication of this manuscript. Any affiliations, financial involvement, or relationships with organizations or entities that might pose a conflict of interest with the subject matter discussed in this work are hereby disclosed.
REFERENCES

[1] F. M. Plaza-Del-Arco, M. D. Molina-González, L. A. Ureña-López, and M. T. Martín-Valdivia, "A multi-task learning approach to hate speech detection leveraging sentiment analysis," IEEE Access, vol. 9, pp. 112478-112489, 2021.
[2] N. S. Mullah and W. M. N. W. Zainon, "Advances in machine learning algorithms for hate speech detection in social media: A review," IEEE Access, vol. 9, pp. 88364-88376, 2021.
[3] Z. Zhang and L. Luo, "Hate speech detection: A solved problem? The challenging case of long tail on Twitter," Semantic Web, vol. 10, no. 5, pp. 925-945, Sep. 2019.
[4] S. Mishra, S. Prasad, and S. Mishra, "Exploring multi-task multi-lingual learning of transformer models for hate speech and offensive speech identification in social media," Social Netw. Comput. Sci., vol. 2, no. 2, pp. 1-19, Apr. 2021.


[5] M. Mozafari, R. Farahbakhsh, and N. Crespi, "Cross-lingual few-shot hate speech and offensive language detection using meta learning," IEEE Access, vol. 10, pp. 14880-14896, 2022.
[6] K. T. Mursi, M. D. Alahmadi, F. S. Alsubaei, and A. S. Alghamdi, "Detecting Islamic radicalism Arabic tweets using natural language processing," IEEE Access, vol. 10, pp. 72526-72534, 2022.
[7] N. Vashistha and A. Zubiaga, "Online multilingual hate speech detection: Experimenting with Hindi and English social media," Information, vol. 12, no. 1, p. 5, Dec. 2020.
[8] R. Singh, S. Subramani, J. Du, Y. Zhang, H. Wang, K. Ahmed, and Z. Chen, "Deep learning for multi-class antisocial behavior identification from Twitter," IEEE Access, vol. 8, pp. 194027-194044, 2020.
[9] T. Davidson, D. Warmsley, M. Macy, and I. Weber, "Automated hate speech detection and the problem of offensive language," in Proc. Int. AAAI Conf. Web Social Media, 2017, vol. 11, no. 1, pp. 512-515.
[10] S. MacAvaney, H.-R. Yao, E. Yang, K. Russell, N. Goharian, and O. Frieder, "Hate speech detection: Challenges and solutions," PLoS ONE, vol. 14, no. 8, Aug. 2019, Art. no. e0221152.
[11] N. A. Ghani, S. Hamid, I. A. Targio Hashem, and E. Ahmed, "Social media big data analytics: A survey," Comput. Hum. Behav., vol. 101, pp. 417-428, Dec. 2019.
[12] X. Sun, D. Yang, X. Li, T. Zhang, Y. Meng, H. Qiu, G. Wang, E. Hovy, and J. Li, "Interpreting deep learning models in natural language processing: A review," 2021, arXiv:2110.10470.
[13] H. Wang, J. He, X. Zhang, and S. Liu, "A short text classification method based on N-gram and CNN," Chin. J. Electron., vol. 29, no. 2, pp. 248-254, 2020.
[14] Y. Zhou, J. Li, J. Chi, W. Tang, and Y. Zheng, "Set-CNN: A text convolutional neural network based on semantic extension for short text classification," Knowl.-Based Syst., vol. 257, Dec. 2022, Art. no. 109948.
[15] J. Xu, Y. Cai, X. Wu, X. Lei, Q. Huang, H.-F. Leung, and Q. Li, "Incorporating context-relevant concepts into convolutional neural networks for short text classification," Neurocomputing, vol. 386, pp. 42-53, Apr. 2020.
[16] J. Du, C.-M. Vong, and C. L. P. Chen, "Novel efficient RNN and LSTM-like architectures: Recurrent and gated broad learning systems and their applications for text classification," IEEE Trans. Cybern., vol. 51, no. 3, pp. 1586-1597, Mar. 2021.
[17] W. K. Sari, D. P. Rini, and R. F. Malik, "Text classification using long short-term memory with GloVe," Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI), vol. 5, no. 2, pp. 85-100, 2019.
[18] M. Shi, K. Wang, and C. Li, "A C-LSTM with word embedding model for news text classification," in Proc. IEEE/ACIS 18th Int. Conf. Comput. Inf. Sci. (ICIS), Jun. 2019, pp. 253-257.
[19] W. Warner and J. Hirschberg, "Detecting hate speech on the world wide web," in Proc. 2nd Workshop Lang. Social Media, 2012, pp. 19-26.
[20] R. Alshalan and H. Al-Khalifa, "A deep learning approach for automatic hate speech detection in the Saudi Twittersphere," Appl. Sci., vol. 10, no. 23, p. 8614, Dec. 2020.
[21] Z. Waseem and D. Hovy, "Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter," in Proc. NAACL Student Res. Workshop, 2016, pp. 88-93.
[22] S. Khan, M. Fazil, V. K. Sejwal, M. A. Alshara, R. M. Alotaibi, A. Kamal, and A. R. Baig, "BiCHAT: BiLSTM with deep CNN and hierarchical attention for hate speech detection," J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 7, pp. 4335-4344, Jul. 2022.
[23] S. Modha, P. Majumder, T. Mandl, and C. Mandalia, "Detecting and visualizing hate speech in social media: A cyber watchdog for surveillance," Expert Syst. Appl., vol. 161, Dec. 2020, Art. no. 113725.
[24] P. Kapil and A. Ekbal, "A deep neural network based multi-task learning approach to hate speech detection," Knowl.-Based Syst., vol. 210, Dec. 2020, Art. no. 106458.
[25] F. M. Plaza-del-Arco, M. D. Molina-González, L. A. Ureña-López, and M. T. Martín-Valdivia, "Comparing pre-trained language models for Spanish hate speech detection," Expert Syst. Appl., vol. 166, Mar. 2021, Art. no. 114120.
[26] Z. Mossie and J.-H. Wang, "Vulnerable community identification using hate speech detection on social media," Inf. Process. Manage., vol. 57, no. 3, May 2020, Art. no. 102087.
[27] M. S. H. Ameur and H. Aliane, "AraCOVID19-MFH: Arabic COVID-19 multi-label fake news & hate speech detection dataset," Proc. Comput. Sci., vol. 189, pp. 232-241, Jan. 2021.
[28] A. M. U. D. Khanday, S. T. Rabani, Q. R. Khan, and S. H. Malik, "Detecting Twitter hate speech in COVID-19 era using machine learning and ensemble learning techniques," Int. J. Inf. Manage. Data Insights, vol. 2, no. 2, Nov. 2022, Art. no. 100120.
[29] G. D. Valle-Cano, L. Quijano-Sánchez, F. Liberatore, and J. Gómez, "SocialHaterBERT: A dichotomous approach for automatically detecting hate speech on Twitter through textual analysis and user profiles," Expert Syst. Appl., vol. 216, Apr. 2023, Art. no. 119446.
[30] P. Fortuna, J. Soler-Company, and L. Wanner, "How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets?" Inf. Process. Manage., vol. 58, no. 3, May 2021, Art. no. 102524.
[31] D. Mody, Y. Huang, and T. E. Alves de Oliveira, "A curated dataset for hate speech detection on social media text," Data Brief, vol. 46, Feb. 2023, Art. no. 108832.
[32] O. de Gibert, N. Perez, A. García-Pablos, and M. Cuadros, "Hate speech dataset from a white supremacy forum," 2018, arXiv:1809.04444.
[33] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, and R. Kumar, "Predicting the type and target of offensive posts in social media," 2019, arXiv:1902.09666.
[34] N. Ljubešić, D. Fišer, and T. Erjavec, "Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.0," 2021. [Online]. Available: http://hdl.handle.net/11356/1433
[35] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. R. Pardo, P. Rosso, and M. Sanguinetti, "SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter," in Proc. 13th Int. Workshop Semantic Eval., 2019, pp. 54-63.
[36] N. Ousidhoum, Z. Lin, H. Zhang, Y. Song, and D.-Y. Yeung, "Multilingual and multi-aspect hate speech analysis," 2019, arXiv:1908.11049.
[37] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, and A. Patel, "Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages," in Proc. 11th Forum for Inf. Retr. Eval., Dec. 2019, pp. 14-17.
[38] T. Mandl, S. Modha, A. Kumar M, and B. R. Chakravarthi, "Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German," in Proc. Forum for Inf. Retr. Eval., Dec. 2020, pp. 29-32.
[39] A. Gautam, P. Mathur, R. Gosangi, D. Mahata, R. Sawhney, and R. R. Shah, "#MeTooMA: Multi-aspect annotations of tweets related to the MeToo movement," in Proc. Int. AAAI Conf. Web Social Media, vol. 14, 2020, pp. 209-216.
[40] R. Agarwal. Twitter Hate Speech. Accessed: Nov. 20, 2023. [Online]. Available: https://www.kaggle.com/datasets/vkrahul/twitter-hate-speech
[41] M. Alberight. Classified Tweets. Accessed: Nov. 20, 2023. [Online]. Available: https://www.kaggle.com/datasets/munkialbright/classified-tweets
[42] S. Reddy. Malignant Comment Classification. Accessed: Nov. 20, 2023. [Online]. Available: https://www.kaggle.com/datasets/surekharamireddy/malignant-comment-classification
[43] A. Toosi. Twitter Sentiment Analysis. Accessed: Nov. 20, 2023. [Online]. Available: https://www.kaggle.com/datasets/arkhoshghalb/twitter-sentiment-analysis-hatred-speech
[44] Usharengaraju. Dynamically Generated Hate Speech Dataset. Accessed: Nov. 20, 2023. [Online]. Available: https://www.kaggle.com/datasets/usharengaraju/dynamically-generated-hate-speech-dataset
[45] A. Samshyn. Hate Speech and Offensive Language Dataset. Accessed: Nov. 20, 2023. [Online]. Available: https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset
[46] A. Edwards, L. Edwards, and A. Martin, "Cyberbullying perceptions and experiences in diverse youth," in Proc. Conf. Human Factors Cybersecurity (AHFE). New York, NY, USA: Springer, Jul. 2020, pp. 9-16.
[47] Mendeley. Cyberbullying Dataset. Accessed: Nov. 20, 2023. [Online]. Available: https://data.mendeley.com/datasets/jf4pzyvnpj/1
[48] B. Wei, J. Li, A. Gupta, H. Umair, A. Vovor, and N. Durzynski, "Offensive language and hate speech detection with deep learning and transfer learning," 2021, arXiv:2108.03305.
[49] C. Fellbaum, "WordNet," in Theory and Applications of Ontology: Computer Applications. Dordrecht, The Netherlands: Springer, 2010, pp. 231-243.
[50] X. Liu, Q. Tong, X. Liu, and Z. Qin, "Ontology matching: State of the art, future challenges, and thinking based on utilized information," IEEE Access, vol. 9, pp. 91235-91243, 2021.


[51] B. Liang, Q. Liu, J. Xu, Q. Zhou, and P. Zhang, "Aspect-based sentiment analysis based on multi-attention CNN," J. Comput. Res. Develop., vol. 54, no. 8, pp. 1724-1735, 2017.
[52] S. Lai, L. Xu, K. Liu, and J. Zhao, "Recurrent convolutional neural networks for text classification," in Proc. AAAI Conf. Artif. Intell., Feb. 2015, vol. 29, no. 1, pp. 2267-2273.
[53] J. Cai, J. Li, W. Li, and J. Wang, "Deep learning model used in text classification," in Proc. 15th Int. Comput. Conf. Wavelet Act. Media Technol. Inf. Process. (ICCWAMTIP), Dec. 2018, pp. 123-126.
[54] J. Pennington, R. Socher, and C. Manning, "GloVe: Global vectors for word representation," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1532-1543.
[55] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[56] Y. Zhou, Y. Yang, H. Liu, X. Liu, and N. Savage, "Deep learning based fusion approach for hate speech detection," IEEE Access, vol. 8, pp. 128923-128929, 2020.
[57] H. S. Alatawi, A. M. Alhothali, and K. M. Moria, "Detecting white supremacist hate speech using domain specific word embedding with deep learning and BERT," IEEE Access, vol. 9, pp. 106363-106374, 2021.
[58] B. Gambäck and U. K. Sikdar, "Using convolutional neural networks to classify hate-speech," in Proc. 1st Workshop Abusive Lang., 2017, pp. 85-90.
[59] S. Khan, A. Kamal, M. Fazil, M. A. Alshara, V. K. Sejwal, R. M. Alotaibi, A. R. Baig, and S. Alqahtani, "HCovBi-Caps: Hate speech detection using convolutional and bi-directional gated recurrent unit with Capsule network," IEEE Access, vol. 10, pp. 7881-7894, 2022.
[60] A. Founta, C. Djouvas, D. Chatzakou, I. Leontiadis, J. Blackburn, G. Stringhini, A. Vakali, M. Sirivianos, and N. Kourtellis, "Large scale crowdsourcing and characterization of Twitter abusive behavior," in Proc. Int. AAAI Conf. Web Social Media, 2018, vol. 12, no. 1, pp. 491-500.
[61] C. Baydogan and B. Alatas, "Metaheuristic ant lion and moth flame optimization-based novel approach for automatic detection of hate speech in online social networks," IEEE Access, vol. 9, pp. 110047-110062, 2021.
[62] Kaggle. Detecting Insults in Social Commentary. Accessed: Nov. 20, 2023. [Online]. Available: https://www.kaggle.com/competitions/detecting-insults-in-social-commentary/data
[63] M. Gaikwad, S. Ahirrao, K. Kotecha, and A. Abraham, "Multi-ideology multi-class extremism classification using deep learning techniques," IEEE Access, vol. 10, pp. 104829-104843, 2022.
[64] K. A. Qureshi and M. Sabih, "Un-compromised credibility: Social media based multi-class hate speech classification for text," IEEE Access, vol. 9, pp. 109465-109477, 2021.
[65] Z. Waseem, "Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter," in Proc. 1st Workshop NLP Comput. Social Sci., Nov. 2016, pp. 138-142.
[66] M. ElSherief, S. Nilizadeh, D. Nguyen, G. Vigna, and E. Belding, "Peer to peer hate: Hate speech instigators and their targets," in Proc. Int. AAAI Conf. Web Social Media, 2018, vol. 12, no. 1, pp. 52-61.
[67] P. Mathur, R. Sawhney, M. Ayyar, and R. Shah, "Did you offend me? Classification of offensive tweets in Hinglish language," in Proc. 2nd Workshop Abusive Lang. (ALW2), 2018, pp. 138-148.
[68] S. D. Swamy, A. Jamatia, and B. Gambäck, "Studying generalisability across abusive language detection datasets," in Proc. 23rd Conf. Comput. Natural Lang. Learn. (CoNLL), 2019, pp. 940-950.
[69] M. Karan and J. Šnajder, "Cross-domain detection of abusive language online," in Proc. 2nd Workshop Abusive Lang. Online (ALW2), 2018, pp. 132-137.

WAQAS SHARIF received the master's degree in computer science from The Islamia University of Bahawalpur, Punjab, Pakistan, in 2018, where he is currently pursuing the Ph.D. degree in computer science. He is also a Lecturer with the Department of Computer Science, The Islamia University of Bahawalpur. He has over six years of professional experience, with four years dedicated to teaching, alongside two years in software development. His research interests include natural language processing, machine learning, and bioinformatics. His expertise extends to serving as a Reviewer for various journals, including IEEE ACCESS and IJIST.

SAIMA ABDULLAH received the Ph.D. degree from the Department of Computer Science and Electronic Engineering, University of Essex, U.K. Currently, she holds the position of an Assistant Professor with the Department of Computer Science and Information Technology, The Islamia University of Bahawalpur, Pakistan. As a member of the Multimedia Research Group, DCS, she focuses on efficient and secure communication of multimedia data over future generation network technologies. Her primary research interests include wireless networks and communications, future internet technology, and network performance analysis. She has authored ten articles in these areas. She serves as a reviewer for various international journals.

SAMAN IFTIKHAR (Member, IEEE) received the M.S. and Ph.D. degrees in information technology from the National University of Sciences and Technology (NUST), Islamabad, Pakistan, in 2008 and 2014, respectively. She is currently an Assistant Professor with Arab Open University, Saudi Arabia. She has published 12 research articles in various reputed journals to her credit. She has presented nine research papers at prestigious conferences in Pakistan, Dubai, Japan, Malaysia, and America. She has also published one book chapter. Her research interests include networking, information security, cyber security, machine learning, data mining, distributed computing, and the semantic web. She was a member of IEEE WIE, IEEE IAS, the IEEE Computer Society, and the IEEE Communication Society. She was with the IEEE Academic Pakistan Initiative, as a Speaker and a Coordinator.

DANIAH AL-MADANI received the master's degree in computer science from Ryerson University, Toronto, Canada. Currently, she is with Arab Open University, Jeddah, Saudi Arabia, as a Lecturer. She has several publications in reputed journals and conferences. Her research interests include the IoT, data science, and data mining.

SHAHZAD MUMTAZ received the master's degree in computer science from The Islamia University of Bahawalpur, Pakistan, in 2005, and the Ph.D. degree in computer science from Aston University, U.K., in 2015. He was the Assistant Director (computer) with the National Highway Authority, Pakistan, from December 2005 to October 2007. Recently, he has worked in other areas, such as natural language processing/analytics and high-performance computing. His research projects include Probabilistic Modeling of Blood Glucose Through Eye Parameters, An Analysis of the Protein Family of Major Histocompatibility Complex, Predictive Modeling of Accidents and Emergency Arrivals and Admissions, Patient-Specific Recommendation Systems for HIP Joint Patients, and Predictive Modeling of Extreme Content from Twitter in the Context of Afghanistan. His research interests include the areas of machine learning and data mining and their application to health informatics domains.
