0% found this document useful (0 votes)

84 views15 pages

A Deep Learning-Based Framework For Offensive Text Detection in Unstructured Data For Heterogeneous Social Media

This document presents a deep learning framework for detecting offensive text in memes from social media. It introduces a new dataset called KAU-Memes containing 2582 labeled memes for model training and evaluation. Three deep learning models (YOLOv4, YOLOv5, SSD MobileNetV2) are compared on this dataset. Experimental results show that YOLOv5 achieved the best performance with 88.5% mAP, 88.8% F1-score, 90.2% precision, and 87.5% recall for offensive text detection in memes. This framework provides an effective approach for automatically filtering harmful memes from social media platforms.

Uploaded by

Dr. Farman ullah

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

84 views15 pages

A Deep Learning-Based Framework For Offensive Text Detection in Unstructured Data For Heterogeneous Social Media

Uploaded by

Dr. Farman ullah

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

Received 28 September 2023, accepted 24 October 2023, date of publication 6 November 2023,

date of current version 9 November 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3330081

A Deep Learning-Based Framework for Offensive

Text Detection in Unstructured Data for
Heterogeneous Social Media
JAMSHID BACHA1,2 , FARMAN ULLAH3 , JEBRAN KHAN 4, ABDUL WASAY SARDAR2,5 ,
AND SUNGCHANG LEE 2,6 , (Member, IEEE)
1 School of Electrical Engineering and Computer Science, Technische Universität Berlin, 10623 Berlin, Germany
2 School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si 10540, South Korea
3 College of Information Technology, United Arab Emirates University (UAEU), Abu Dhabi, United Arab Emirates
4 Department of Artificial Intelligence, Ajou University, Suwon, Gyeonggi 16499, South Korea
5 Natural Computing Research and Applications Group, Smurfit School of Business, University College of Dublin, Dublin 4, D04 V1W8 Ireland
6 Thingswell Inc., Dongan-gu, Anyang-si, Gyeonggi-do 14056, South Korea

Corresponding authors: Sungchang Lee ([email protected]) and Farman Ullah ([email protected])

This work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government through the
Ministry of Science and ICT under Grant NRF-2022R1F1A1074652.

ABSTRACT Social media such as Facebook, Instagram, and Twitter are powerful and essential platforms
where people express and share their ideas, knowledge, talents, and abilities with others. Users on social
media also share harmful content, such as targeting gender, religion, race, and trolling. These posts may be
in the form of tweets, videos, images, and memes. A meme is one of the mediums on social media which
has an image and embedded text in it. These memes convey various views, including fun or offensiveness,
that may be a personal attack, hate speech, or racial abuse. Such posts need to be filtered out immediately
from social media. This paper presents a framework that detects offensive text in memes and prevents such
nuisance from being posted on social media, using the collected KAU-Memes dataset 2582. The latter
combines the ‘‘2016 U.S. Election’’ dataset with the newly generated memes from a series of offensive and
non-offensive tweets datasets. In fact, this model uses the KAU-Memes dataset containing symbolic images
and the corresponding text to validate the proposed model. We compare the performance of three proposed
deep-learning algorithms to train and detect offensive text in memes. To the best of the authors knowledge
and literature review, this is the first approach based on You Only Look Once (YOLO) for offensive text
detection in memes. This framework uses YOLOv4, YOLOv5, and SSD MobileNetV2 to compare the
model’s performance on the newly labeled KAU-Memes dataset. The results show that the proposed model
achieved 81.74%, 84.1%, mAP, and F1-score, respectively, for SSD-MobileNet V2, and 85.20%, 84.0%,
mAP, and F1-score, respectively for YOLOv4. YOLOv5 had the best performance and achieved the highest
possible mAP, F1-score, precision, and recall which are 88.50%, 88.8%, 90.2%, and 87.5%, respectively, for
YOLOv5.

INDEX TERMS Cyberbully, unstructured data, deep learning, YOLO, social media, offensive,
MobileNet-SSD, image processing.

I. INTRODUCTION offensively. Memes on social media can be in any form,

The meme spreads via different social media platforms posted via images, videos, or tweets, likely to have a
and shows some fun or targets something humorously or significant impact on communication [1], [2]. However, the
main form of memes on social media are images that include
The associate editor coordinating the review of this manuscript and some text and a background image having multi-model nature
approving it for publication was Maria Chiara Caschera . and causing confusion in understanding the contents of the
2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
124484 For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ VOLUME 11, 2023
J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

image [3]. On social media, hate speech is one of the common cyberbullying [13]. Figure 1 shows examples of offensive
content [4]. This is one of the important reasons to understand and non-offensive memes. Where Figure 1 (a) and (b) are the
the meaning and intention of memes and identify whether memes, there is no offensive text that makes the meme to be
they can be offensive or non-offensive. Memes can spread offensive. While on the other hand, Figure 1 (c), (d), and (e)
hatred in society via social media: a legitimate concern are images where some offensive text is used and makes
justifying the need to filter such content automatically and the memes offensive. There are a lot of images that include
immediately. offensive text in images. The text associated with images
A meme can be a racial, religious, personal attack, can make clear that the meme is offensive or non-offensive.
or maybe an attack on the community. personal attack, That’s why this framework focuses on the text and detecting
or maybe an attack on the community. The literature revealed offensive content in unstructured data.
several interesting works on memes: emotion analysis in [5], Therefore to address such problems and overcome the error
sarcastic meme detection in [6], and hateful meme detection rate, this proposed approach is based on YOLO to detect the
in [7]. Where they discussed the multi-model nature of offensive text inside the memes on social media. Accordingly,
memes which makes them very difficult to understand and this paper proposed a new dataset with the addition of an
classify. This is also difficult for a machine learning model to existing dataset on the 2016 U.S. Election and the offensive
classify whether a meme can be offensive or non-offensive. and non-offensive tweets dataset from [14].
The reason is that memes depend on the context and focus on The contributions to this paper are as follows:
the image and text. Without relevant knowledge of the context 1) A new framework based on the computer vision model
in which the meme was created, it is rather risky to speculate is presented in this study for the detection of offensive
on whether the meme is offensive or not. Similarly, it is hard content in unstructured data.
for a standard OCR to extract and detach texts from the meme, 2) This paper studies text detection in unstructured data
because memes can be noisy. Another critical factor is that and formulated two kinds of text detection, i.e.,
since the text in the meme is overlaid on top of the image, offensive and non-offensive.
the text needs to be extracted using OCR, which can result in 3) Generated a new KAU-Memes dataset consists
errors that require additional manual post-editing [8]. of 2582 memes and is labeled for YOLO and
The deeper meaning of memes can be funny for one; but SSD-MobileNet algorithms individually.
can be offensive for another. These memes are usually spread 4) This paper presented a performance comparison of
on social media such as Facebook, Instagram, Twitter, and YOLOv4, YOLOv5, and SSD MobileNet-V2 algo-
Pinterest. However, some people use it to target a person, rithms based on training, detection time, mAP,
a specific religion, or an entire community. These memes F1-score, precision vs recall curve, and confusion
can elicit depressive behaviors and should be filtered out matrix.
from social media. Even some political campaign managers 5) Extensive experiments on 2016.US.Election and the
have already turned to memes on social media in their KAU-Memes dataset prove that algorithms perfor-
quest to directly or indirectly influence election results: mance improves with a high number of memes.
because people can see those memes and accept the idea they The paper is organized as follows: Related work to
promote. Many researchers are trying to solve this problem offensive and hateful memes is discussed in Section II.
by identifying offensive memes, but millions of memes The proposed model is described in Section III. Results
on social media are hard to remove manually. According and discussion of the model are explained in Section IV.
to,1 an average of 95 million images are uploaded daily. Section V discussed the conclusion of the proposed model
On Twitter, for instance, there is nearly 40% post that has and future work plan.
visual contents.2 Also, the tweets with images can get 150%
higher retweets than the tweet which don’t have images.3 II. LITERATURE REVIEW
There are multiple approaches for memes classification, There are different approaches used for offensive, cyber-
like OCR technique [9] extracts the text from the images. bullying, toxic comments, and hateful speech classification
However, using the OCR text extraction can extract all the and detection. Bad behavior became a big issue on social
text from the images, like watermarks, implicit and explicit media platforms [15]. On social media, there are different
entities which can lead us to the incorrect classification of rumors [16], hateful content [17], and cyberbullying [18]
the memes. The meme’s typo graphic text extraction using contents people share. However, memes perform a big role
optical character recognizer OCR is explained in [6] for in such kind of situations on social media. Some approaches
sarcasm detection in memes. have been proposed to overcome these problems of hate
Offensive memes can be dangerous and insult people speech and offensive content. Such as the troll memes
[10]. A meme can be aggressive [11], troll [12], and classification has been developed based on pre-trained
models i.e. EffNet, VGG16, and Resnet [19]. Two models
1 https://www.wired.co.uk/article/instagram-doubles-to-half-billion-users are proposed by [20] among them one works as a text
2 https://unionmetrics.com/blog/2017/11/include-image-video-tweets/ features extraction and the second is on image-based features
3 https://blog.hubspot.com/marketing/visual-content-marketing-strategy extraction while sending the memes to the transformer

VOLUME 11, 2023 124485

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

FIGURE 1. Offensive memes examples on social media.

TABLE 1. Brief summary of offensive memes classification models and performance results. Where A = Accuracy, F = F1-Score, WF = Weighted F1-score,
and R = Recall.

model, however, VGG16 has been used for feature extraction hateful memes, [24] developed an ensemble learning
from memes. A framework by [21] is based on deep approach by including classification results from multiple
learning to automatically detect the harmful speech in memes classifiers.
based on the fusion of visual and linguistic contents of DisMultiHate model is proposed in [25] for the classifi-
the memes. To simultaneously classify memes into five cation of multimodal hateful content. For the improvement
different categories like offensiveness, sarcasm, sentiment, of hateful content classification and explainability, they
motivational, and humor, a multi-task framework via BERT target the entities in memes. A combination of a Feature
and ResNet is proposed by [22]. Concatenation Model (FCM), a Textual Kernels Model
A model based on the visual-linguistic transformer (TKM), and a Spatial Concatenation Model (SCM) can be
is integrated with the pre-trained visual and linguis- used to boost the multimodel memes classification [26].
tic features to detect the abusiveness in memes is A framework named deep learning-based Analogy-aware
explained in [23]. To enhance the performance of Offensive Meme Detection (AOMD) by [27] is proposed

124486 VOLUME 11, 2023

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

FIGURE 2. Proposed framework for offensive and non-offensive text detection in Memes.

which learns the implicit analogy from the memes to detect existing datasets named Kaggle, McIntire, Reuters, and
offensive analogy memes. KnowMeme model, which is BuzzFeed;
based on a knowledge-enriched graph neural network that A dataset of images with their comments is collected from
uses the information facts from human commonsense can Instagram and labeled with the help of Crawdflower workers.
accurately detect offensive memes [28]. Reference [29] Where the criteria for labeling were to i) does the example
proposed that convolutional neural networks (CNN), VGG16, create cyber aggression which means the image intentionally
and bidirectional long short-term memory (BiLSTM) can harms someone, or ii) does it create cyberbullying which
be used for the offensive and non-offensive classification in mean is there any aggressiveness that contains against a
multimodal memes. Reference [30] proposed a joint model to person who can not defend herself or himself [35]. This
classify undesired memes based on counteractive unimodal dataset is also used by [36] for the detection of cyberbullying
features and multimodal features. For making the constituent detection. Another dataset from [37] is collected from
module of the framework they employed multilingual-BERT, Instagram posts and their comments which consist of
multilingual-DistilBERT, XLM-R for textual and VGG19, 3000 examples. They asked two questions i) do the comments
VGG16, and ResNet50 for visual. A textual, visual, and contain any bullying ii) If yes, is the bullying due to the
info-graphic cyberbully is detected based on a deep neural contents of the image to label the dataset?
architecture which includes Capsule network deep neural Some state-of-the-art papers’ summary has explained in
network with dynamic routing for textual bullying content Table 1. This shows us how each of the models performs
detection and CNN for visual bullying content prediction and toward offensive memes classification.
discretizing the info-graphic content by separating image and
text from each other by Google Lens [31]. A deep learning- III. PROPOSED FRAMEWORK FOR OFFENSIVE MEMES
based framework for a bully or non-bully identification based FILTERING ON SOCIAL MEDIA
on residual BiLSTM and RCNN architecture is discussed The proposed model for offensive and non-offensive text
in [32]. Reference [33] explained that hate speech detection detection in memes is depicted in Figure 2. The goal is to train
can be improved by augmenting text with image-embedding the model with the training dataset and then test the model
information. with the test dataset to check the performance comparison
A new approach by [34] named WELFake is suggested. of the YOLOv4, YOLOv5, and SSD MobileNet-V2 models.
They used 20 linguistic features and then combined these This platform can be used as a plug-in for heterogeneous
features with word embeddings and implemented voting social media to filter out offensive memes. As millions of
classification. This model is based on a count vectorizer memes on social media can not be filtered out manually. This
and Tf-idf word embedding and uses a machine learning approach can help us to overcome the spread of offensive
classifier. For unbiased dataset creation, they merged four memes that are already posted and will be posted on social

VOLUME 11, 2023 124487

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

media. After the data preprocessing, YOLOv4, YOLOv5, and Algorithm 1 Algorithm for Detecting Offensive Text in
SSD MobileNet-V2 models are used to detect the offensive Image
and non-offensive text in memes. The model trains over the 1: Images ← ImagesInDatabase
dataset and generates weights and checkpoints. When the 2: for image in Images do
YOLO model is trained over the labeled image dataset it Ofensive ← Checkpoints(image)
generates weights files. These files are usually named yolov- 3: if Ofensive == "offensive" then
final.weights with the extension of weights. The weights file 4: Delete image
can be used as a plug-in for any social media in the future. 5: else
Plug-ins also known as extensions, add-ons, or computer 6: Keep image
software can be added to a host program to add new functions 7: end if
without making any changes in the host program. In our 8: end for=0
case, it can be added to Facebook, Twitter, Instagram, etc.
It enables programmers to update a main program while
keeping the user within the program’s environment. So, the automatic detection from X-ray images is explained in [45].
model will discard memes to upload on social media when YOLO is used for electrical component recognition in real-
there is offensive content. time [46]. Pedestrian detection in real time but at night is
Let’s consider an image that contains blood, a gun, explained in [47]. By using these models the images having
private parts of the body, or something else in the image. offensive text can be detected immediately and accurately
If someone uploads such kinds of images, the Facebook before it goes viral. Even this trained AI model can be
algorithm discards that image or just shows us that ‘‘this installed in a camera and the camera can fit in a two-tire
photo may show violent or graphic content’’ as everyone has vehicle. There are many offensive texts in the streets as can
experienced this while using social media. Let’s consider an be seen on this website [48]. This can be used as a smart city
image that contains offensive text. What if someone uploads and when the camera detects offensive text on the street walls
any of the images from Figure 1 1 as we can see in these there should be some action to clean that wall from those
images there is offensive text targeting politicians. No one offensive words.
has experienced that Facebook or any other social media can
do the same for those images or videos that contain offensive A. DATA GENERATION
text. These models have trained with the training labeled This section explains the KAU-Memes dataset which con-
(bounding boxes) images KAU-Memes dataset. When the tains the images having text embedded in these images.
training process is finished YOLO or SSD models generate The text in the images is offensive and non-offensive
a final weight or a checkpoint. Consider these weights or memes. Before the data generation, the algorithms were
checkpoints as a trained AI model. Now, let’s assume an tested in 2016.U.S.Election 738 memes dataset got higher
image with some text and when we pass it from the trained performance, but the dataset consisted of fewer memes, so the
AI model (weight or checkpoint). The trained AI model gives model performance was poor. To improve the performance,
us a bounding box on the text inside the image and decides this approach generates memes by a third-party website.4
whether the text is offensive or not. As there are thousands However, for meme generation, there is a need for a text
of images in any social media database so this can be used dataset that can be embedded in images. So, this approach
by just executing a for loop over that database and passing used the offensive tweets dataset from [14] embedded it on
images one by one from this trained AI model (weight or famous images, and generated a new KAU-Memes dataset.
checkpoints) and the trained AI model makes a decision in the This dataset consists of 24802 labeled tweets; however, only
image as a bounding box and if the bounding box labeled is a few of them were used to generate 2850 memes. While
offensive then delete that image and if the bounding box is not generating the memes, the text on images was embedded in
offensive then keep the image in the database. Social media different colors, fonts, and angles. The model can filter every
and their databases are filled with such kinds of images. So, kind of offensive meme on social media.
this AI-trained model helps us to delete those images that
have offensive text from the database of any social media, B. TEXT VARIATION IN MEMES
rather than checking images one by one manually because There are hundreds of text fonts, colors, and orientations in
there are millions of images uploaded This is explained in memes on social media. Memes can have any form of text
Algorithm 1 1. This plugin can be installed on any social and background image. This section explains different types
media and every future image should be passed over it if there of variations of text in memes. Figure 3 (a) shows the most
is offensive text discard it and do not allow it to upload on challenging and common variation of text which is found
social media. in the dataset. While generating the data, text in different
YOLOv4, YOLOv5, and SSD MobileNet are famous for orientations was embedded over images to make the model
their robustness, accuracy, and real-time object detection. more robust and accurate. This model also tried to take care
Here these models are used for the first time to detect
offensive text inside the images. YOLO for COVID-19 4 https://imgflip.com/memegenerator

124488 VOLUME 11, 2023

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

of the image background clutter because every time, there can To remove such kind of duplicate images from the dataset,
be a different image in the memes which is shown in Figure 3 the duplication tool5 [49] is used. This is upto date repository
(b). The text position in the image can be seen in Figure 3 based on CNN, Perceptual hashing (PHash), Difference
(c). Sometimes the text can be in the center of the meme, hashing (DHash), Wavelet hashing (WHash), and Average
below the image, or maybe to the left or right side of the hashing (AHash). Also, the memes were removed manually
image. The size of the text also varies in memes. So KAU- from the dataset, which only consisted of text, and there was
Memes dataset also exists in such kind of variation in the text no image in the background.
as shown in Figure 3 (d). Last but not least, the dataset also
consists of different formats and colors of text and also blur D. DATA ANNOTATION AND LABELING
text, which can be seen in Figure 3 (e), (f) respectively. There For the annotation procedure, the dataset in [4], and for new
are yellow, black, white, etc., color formats for offensive and memes which were generated, they are labeled according to
non-offensive text. the tweets dataset of [14]. For the manual data annotation, the
labeling tool Roboflow6 is used. The bounding boxes around
the text in the memes are made in a manner allowing users
to decide whether that text is offensive or non-offensive. This
bounding box helps the model because it localizes the area
for the YOLO and SSD MobileNet. Roboflow tool generates
a text file for each meme with the same file name as the image.
Roboflow generates the coordinate in the form of (x1,y1) and
(x2, y2) with the label 0 if offensive and 1 if non-offensive,
in a text file.

E. THE PRINCIPLE OF YOU ONLY LOOK ONCE (YOLO)

MODEL
You Only Look Once (YOLO) is usually used for object
detection. It detects the object in the image as a regression
problem. Unlike other models, YOLO doesn’t do the sliding
window, and YOLO looks at the entire image when it
is training and testing and implicitly encodes the class
information. Many deep learning algorithms are available;
however, they cannot detect the object in a single run. YOLO
also makes the detection in a single forward propagation
through a neural network which makes it suitable for real-
time applications. YOLO outperforms the top detection
models like DPM and R-CNN [50].
The YOLO detector analyzes the image at once, so the
detection obtained by YOLO is based on all the information
in the image. Using the input image features, the algorithm
splits an image into an SxS grid. The rectangle box is then
produced using the confidence score of the detected object
extracted from each grid in the introduced image, as shown in
Figure 4. Each cell predicts the bounding box and confidence
score. The bounding box contains five prediction parameters,
which are determined by (x, y, w, h) and the confidence value,
where (x, y) coordinates represent the center of the bounding
box, and (h, w) reflects the height and the width of the entire
FIGURE 3. Common variations KAU-Memes data showing different image. The confidence scores represent the measurement of
background clutters, scales, colors, and orientations.
how confident the architecture is that the box contains the
object (text) to be predicted.
C. PREPROCESSING AND CLEANING OF THE DATASET
1) YOU ONLY LOOK ONCE VERSION 4 (YOLOV4)
This section highlights the importance of data preprocessing YOLOv4 architecture has some improvements to the older
and cleaning. Usually, in a huge dataset, it is nearly common versions. The backbone for YOLOv4 is CSPDarkent53.
that the data may have repetition. There are two or more
than two time repetitive images in the 2016.US.Election 5 https://github.com/idealo/imagededup
dataset. As the data repetition can make the model overfit. 6 https://roboflow.com/

VOLUME 11, 2023 124489

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

FIGURE 4. A generalize illustration of YOLO pipeline for offensive and non-offensive text detection in memes.

FIGURE 5. YOLOv4 architecture.

Because of this new network, the model can keep the accuracy 10% and 12%, respectively. They made many changes in
and reduce the computation. Also, the path aggregation the architecture of YOLO models, but the major changes
network (PANet) is used in YOLOv4, which can help the are the adjustment of network structure and an increasing
model to boost the information flow networks [51]. number of applied tricks. YOLOv4 changed the backbone
to CSPDarknet53 from the old Darknet53. Some data
augmentation techniques were also adopted, i.e., Cutout, Grid
2) YOU ONLY LOOK ONCE VERSION 5 (YOLOV5) Mask, Random Erase, Hide and Seek, Class label smoothing,
On the other side, the YOLOv5 is compiled by PyTorch. MixUp, Self-Adversarial Training, Cutmix, and Mosaic data
Due to the application features of PyTorch, the model has augmentation.
high productivity and flexibility. YOLOv5 uses the same After a few months, another company named Ultralytics
CSPDarknet and PANet, as can be seen in Table 2. For the released a new version of YOLO named YOLOv5. Instead
activation function, YOLOv5 uses a sigmoid function rather of publishing research and comparison with other models of
than the Mish function for YOLOv4 [52]. YOLO, the company just released YOLOv5’s source code
YOLO algorithms are robust in real-time object detection on GitHub [54]. However, the main changes in architecture
and represented by Redmon in 2016 [50]. The 4th version between YOLOv4 and v5 and the advancement in YOLOv5
of YOLO was released in 2020 [53], compare to the old are presented in Figure 5 and 6, respectively. In YOLOv5
version of YOLO, the mAP and FPS were improved to leaky Relu is adopted as an activation function (CBL module)

124490 VOLUME 11, 2023

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

FIGURE 6. YOLOv5 architecture.

in the hidden layers, while in YOLOv4, there are two modules friendly than Darknet.
with leaky Relu and mish activation functions (CBL and |C−B ∪ Bgt |
CBM). Secondly, in the backbone, YOLOv5 adopted a new LGIoU = 1−IoU + (1)
|C|
module at the beginning that is named Focus. Focus makes
four slices of an input image and concatenates all of them where Bgt represents the ground truth box, B is the predicted
for convolution operation. For example, an image of 608 × box, C is equal to the smallest box which covers B and Bgt ,
608 × 3 is divided into four small images with 304 × and IoU = (B∩Bgt )/B∪Bgt is the intersection over the union.
304 × 3, concatenated into a 304 × 304 × 12 image. ρ 2 (p, pgt )
Third, for the backbone and neck, YOLOv5 designed two LCIoU = 1−IoU + (2)
c2
CSPNet modules. To maintain processing accuracy and
reduce computation power, CSPNet combines feature maps where pgt and p are the central points of boxes B and Bgt ,
from the start and at the end of a network stage [55]. is represented Euclidean distance, c is the diagonal length of
Compared to the standard convolution module in YOLOv4, the smallest box C, V and are the consistency of the aspect
YOLOv5 adopted the CSPNet module, i.e., CSP2_x in the ratio.
neck, to strengthen the network feature fusion. Besides TABLE 2. Architecture comparison of YOLOv4 and YOLOv5.
the structure adjustment, YOLOv5 adopted an algorithm to
automatically learn bounding box anchors in the input stage,
which could help calculate the anchor box size for other
image sizes and improve the detection quality. Except this,
YOLOv5 uses Generalize Intersection Over Union (GIoU)
as a loss as shown in Equation 1 [56] for the regression
loss function in the bounding box instead of Complete
Intersection Over Union (CIoU) loss in YOLOv4 as shown
in Equation 2. GIoU can solve the imperfect calculation
of non-overlapping bounding boxes that remain in the
previous Intersection Over Union (IoU) loss function. CIoU
incorporates all three geometric factors: including distance,
aspect ratio, and overlapping area. To better determine F. SINGLE SHOT DETECTOR (SSD)
difficult regression cases, CIoU enhances the accuracy and In the field of computer vision, the models become more
speed. YOLOv5 is constructed under a new environment complex and deeper for more accurate results and perfor-
at PyTorch [56], which makes the training procedure more mance. However, the advancement makes the model latency

VOLUME 11, 2023 124491

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

and size bigger, which cannot be used in a system that has the mean average precision is the mean of average precision
computational challenges. SSD-MobileNet can help in such (AP) as shown in Equation 4. Where n is the number of
kinds of challenges. This model is basically designed for classes while the AP is the average precision for that given
those situations that require high speed. The MobileNetV2 class n. mAP returns a score after comparing the ground truth
provides an inverted residual structure for better modularity. bounding box with the detected box. After taking the mean
MobileNet eliminates the non-linearities in tight layers and of AP, we can get the mAP which can be used to calculate
results in higher performance for previous applications. The the accuracy of machine learning algorithms. F1-score [59]
MobileNet-SSD detector inherits the design of VGG16- measures a model’s accuracy over the dataset and can use
SSD, and the front-end MobileNet-v2 network provides six to evaluate binary classification problems. Equation 5 can be
feature maps with different dimensions for the back-end used for the F1-score calculation using precision and recall.
detection network to perform multi-scale object detection. The highest possible value for the F1-score is 1 and the lowest
Since the backbone network model is changed from VGG-16 is 0. The precision is the ratio of true prediction with the total
to MobileNet-v2, the MobileNet-SSD detector can achieve number of predictions, while the recall is the ratio of true
real-time performance and is faster than other existing object prediction to the total number of objects in the image [60],
detection networks. which are shown in Equation 6 and Equation 7 respectively.
1 Xn
G. EVALUATION MATRIX mAP = APk (4)
n k=1
Usually, a basic matrix intersection over union (IoU) is used precision ∗ recall
to evaluate the performance of object detection models, which F1Score = 2 ∗ (5)
precision + recall
can be seen in Figure 7. IoU is the overlap of the detection box TruePositive
(D) and the ground truth box (G), which can be calculated Precision = (6)
TruePositive + FalsePositive
by using Equation 3 [57]. When we obtain the IoU, then TruePositive
we use the confusion matrix, i.e., False Positive (FP), True Recall = (7)
TruePositive + FalseNegative
Positive (TP), False Negative (FN), and True Negative (TN)
for accuracy measurement. For TP, a specific class ground IV. RESULTS AND DISCUSSION
truth must be the class of detection, also the IoU must be A. EXPERIMENTAL SETUPS
greater than 50%. As TP is the correct detection of the class. All the models were trained on Colab Pro+, and the resources
In case the detection owns the same class as the ground truth, used for the training are shown in Table 3. To train the
and the IoU is less than 50%, then it is considere FP. Which models properly, YOLOv4, YOLOv5, and SSD MobileNet
means the detection is not corrected. If the model does not with different parameters were trained to achieve the highest
make detection and there is a ground truth, then it is considere possible mAP. The parameters for each of the models are
FN, which means that the instance is not detected. In many shown in Table 4.
cases, the background does not have any ground truth and also
no detection, so that is classified as TN. TABLE 3. Colab specification used for the training of YOLOv4, YOLOv5
and SSD MobileNet.
Intersection G∩P
IoU = = (3)
Union G∪P

TABLE 4. Training hyperparameters for each of the models to achieve the

highest possible performance.

FIGURE 7. Examples of intersection over union on text in a meme.

For the performance comparison of YOLOv4, YOLOv5, B. PERFORMANCE BASED ON EVALUATION MATRIX
and SSD MobileNet-V2 algorithms mAP, F1-Score, Preci- The ML models achieved the highest possible results for the
sion, and Recall can be used as criteria. Where mAP [58] is public online dataset in [4] consisting of 743 memes, and

124492 VOLUME 11, 2023

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

the models performed poorly because of the small number of

memes. The results for the best possible parameters for each
of the models are shown in Table 5.

TABLE 5. Performance of ML models on 743 memes.

After the models were appropriately trained, YOLOv5

achieved the lowest 17 MB weight, the highest mAP of
88.50%, and consumed 11.60 milliseconds on average for
FIGURE 8. Precision × recall plot on the validation set considering all the
152 offensive and non-offensive text detection in memes. predictions.
Similarly, for YOLOv4, the weight was 244 MB and the
text detection time was 42.68 milliseconds. SSD MobileNet
text detection time is lower than YOLOv4 and higher than good performance than YOLOv4 and SSD MobileNet, even
YOLOv5 with an IoU of 0.5. Also, YOLOv5 consumed for the 80% and 70% training of the dataset. Because of the
the lowest time while training. However, among all models, high mAP and F1-score, the offensive and non-offensive text
YOLOv5 performance was found to be the best based on detection in memes by YOLOv5 are more accurate compared
training and detection time and smallest weight/checkpoint to YOLOv4 and SSD-MobileNet.
size as shown in Table 6. In the case of YOLOv4 and
MobileNet, the training time consumed by YOLOv4 was C. PRECISION VS RECALL
recorded less than MobileNet but the size of checkpoints of Using all the predictions for the detection of offensive text in
YOLOv4 was higher. Also, the detection time on GPU by memes, a curve for precision vs. recall is plotted as shown in
MobileNet was 31.28 milliseconds which is less than the Figure 8. The curve established the settlement between the
YOLOv4 detection time. precision and recall rate. Higher confidence means higher
precision in their predictions but a lower recall. YOLOv4
TABLE 6. Weight and checkpoint size of YOLOv4, YOLOv5 and and YOLOv5 had nearly 90% recall rates. The best model
SSD-MobileNet, training and detection time.
is the one whose Area Under the Curve (AUC) is the highest.
Therefore, it can be seen that YOLOv5 AUC is the highest
compared to YOLOv4 and SSD MobileNet.

D. DETECTION RESULTS
Some offensive and non-offensive text detection were per-
formed to find the model’s performance. In Figure 9 (a), the
YOLOv5 predicts the offensive text with a high confidence
of 0.96, and also, for non-offensive, the confidence value is
almost 0.93. Similarly, Figure 9 (b) shows the prediction of
offensive text and non-offensive using the YOLOv4 model
Using the KAU-Memes dataset, this approach performs with the confidence of 0.86 and 0.80, respectively.
three different experiments. In the first experiment, the data is Other than YOLOv4 and v5, the performance of SSD
split into 90% training and 10% validation sets, in the second MobileNet-V2 is also good for the detection of offensive
experiment 80% and 20%, and in the third experiment 70% memes. The SSD-MobileNet V2 detects the offensive text
and 30% training and validation. with a confidence of 0.83 for the offensive meme and
The results of both models can be seen in Table 7 using the 0.78 confidence for non-offensive, which can be seen in
KAU-Memes dataset for offensive text detection in memes. Figure 9 (c).
It is clear from the table that YOLOv5 shows the highest
mAP of 91.40%, the precision of 86.2%, recall of 91.9%, and E. MODELS PERFORMANCE LOSS
F1-score of 88.4% than YOLOv4 while splitting the dataset To explore the performance of the algorithm in more detail,
in 90% training and 10% validation. Also, the YOLOv5 has it is necessary to find their incorrect detection. Which

VOLUME 11, 2023 124493

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

TABLE 7. Results of YOLOv4, YOLOv5, and SSD MobileNet V2 algorithms for offensive and non-offensive memes text detection with a train-validation
split of 90%-10%, 80%-20%, and 70%-30%.

FIGURE 9. Offensive text detection by YOLOv5, YOLOv4, and SSD-MobileNeta.

124494 VOLUME 11, 2023

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

FIGURE 10. Offensive small text detection by YOLOv5, YOLOv4, and SSD-MobileNet.

FIGURE 11. Confusion matrix from YOLOv5, YOLOv4, and SSD-MobileNet.

can help in future research for improvement. YOLOv5s is matrix for YOLOv5s is shown in Figure 11 (a) where the
the best detection among other models. However, when it offensive text is detected 251 times correctly, but the model
comes to detecting the offensive text in a meme that has confused 35 times with non-offensive text. Similarly, the
small size text, the performance goes down for each of the non-offensive text is detected 202 times correctly, but it is
models. In Figure 10, all the models detect the offensive text confused with offensive text around 25 times. In the YOLOv5
with a different confidence score. Among all the models, confusion matrix, the False Positive (FP) is divided into two
YOLOv5s still performs better for small text detection than parts based on the value of IOU. If IOU = 0, the false
YOLOv4 and SSD-MobileNet. YOLOv5 detects small text positive prediction is far from the ground truth. Also, if IOU is
with a confidence score of 0.88, YOLOv4 achieves 0.81, between 0 and 0.5 then the overlap between the ground truth
and SSD MobileNet achieves a 0.75 confidence score. The and prediction is not enough to decide it as a true positive. For
performance can be improved by adding more images having YOLOv4, the offensive text is detected 221 times and non-
offensive text in small sizes. offensive text 210 times correctly, but 48 times the offensive
text is confused with non-offensive text, and 42 times the
F. ANALYSIS BASED ON CONFUSION MATRIX non-offensive text is confused with offensive text detection
A confusion matrix can be used for the performance of differ- as shown in Figure 11 (b). Similarly, for SSD-MobileNet
ent models. The confusion matrix also provides information V2, the model detected offensive text 213 times but confused
on the type and source of errors. Where the elements on 58 times with non-offensive and non-offensive text confused
the diagonal represent all the correct classes. The confusion with offensive text 56 times, as shown in Figure 11 (c).

VOLUME 11, 2023 124495

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

V. CONCLUSION belongs to some religion, so it will be stronger to detect the

In this approach, a new framework for better detection of target of the offensive meme.
offensive content in unstructured data for heterogeneous
social media is proposed. In terms of accuracy and speed, REFERENCES
the newly tested framework was systematically applied to [1] J. H. French, ‘‘Image-based memes as sentiment predictors,’’ in Proc. Int.
two versions of YOLO and SSD MobileNet with different Conf. Inf. Soc. (i-Soc.), Jul. 2017, pp. 80–85.
[2] M. R. Mirsaleh and M. R. Meybodi, ‘‘A Michigan memetic algorithm
parameters. For the SSD MobileNet using different param- for solving the community detection problem in complex network,’’
eters. Hence, it was observed: (1) for the SSD MobileNet Neurocomputing, vol. 214, pp. 535–545, Nov. 2016.
model, the increased number of training images size could [3] S. He, X. Zheng, J. Wang, Z. Chang, Y. Luo, and D. Zeng, ‘‘Meme
extraction and tracing in crisis events,’’ in Proc. IEEE Conf. Intell. Secur.
not contribute to better performance; (2)as shown in Table 5, Informat. (ISI), Sep. 2016, pp. 61–66.
a big gap in the mAP and F1 scores between SSD MobileNet [4] S. Suryawanshi, B. R. Chakravarthi, M. Arcan, and P. Buitelaar, ‘‘Multi-
and YOLO versions. modal meme dataset (MultiOFF) for identifying offensive content in image
and text,’’ in Proc. 2nd Workshop Trolling, Aggression Cyberbullying,
In addition, YOLOv5s achieved the highest mAP of 2020, pp. 32–41.
88.50%, a faster training time of 32 minutes for 80% and [5] A. Sengupta, S. K. Bhattacharjee, M. S. Akhtar, and T. Chakraborty,
20% of training and validation data, and a faster processing ‘‘Does aggression lead to hate? Detecting and reasoning offensive traits
speed for multiple meme detection of 11.6 milliseconds. The in hinglish code-mixed texts,’’ Neurocomputing, vol. 488, pp. 598–617,
Jun. 2022.
YOLOv5 model still had the best performance in comparison [6] A. Kumar and G. Garg, ‘‘Sarc-M: Sarcasm detection in typo-graphic
with YOLOv4 and MobileNet. memes,’’ in Proc. Int. Conf. Adv. Eng. Sci. Manag. Technol. (ICAESMT),
In this approach mAP, F1-score, precision, and recall were Dehradun, India: Uttaranchal Univ., 2019.
[7] Y. Zhou, Z. Chen, and H. Yang, ‘‘Multimodal learning for hateful
used to evaluate the feasibility of the proposed framework. memes detection,’’ in Proc. IEEE Int. Conf. Multimedia Expo. Workshops
As explained in the results section, trained the models on (ICMEW), Jul. 2021, pp. 1–6.
publicly available datasets to verify all the corresponding [8] R. N. Nandi, F. Alam, and P. Nakov, ‘‘Detecting the role of an entity
in harmful memes: Techniques and their limitations,’’ in Proc. Workshop
outputs for accuracy. In the final analysis, it was determined Combating Online Hostile Posts Regional Lang. During Emergency
that the results were not good enough to use the model for Situations, 2022, pp. 43–54.
future and unknown datasets. [9] N. Islam, Z. Islam, and N. Noor, ‘‘A survey on optical character recognition
system,’’ J. Inf. Commun. Technol., vol. 10, no. 2, pp. 1–4, Dec. 2016.
In parallel, a new KAU-Meme dataset was generated to [10] J. Drakett, B. Rickett, K. Day, and K. Milnes, ‘‘Old jokes, new media—
detect the desired two classes of offensive and non-offensive Online sexism and constructions of gender in Internet memes,’’ Feminism
text in unstructured data. This dataset contained 2582 high, Psychol., vol. 28, no. 1, pp. 109–127, Feb. 2018.
[11] S. T. Aroyehun and A. Gelbukh, ‘‘Aggression detection in social media:
average, and low-quality memes from the combination Using deep neural networks, data augmentation, and pseudo labeling,’’
of 2016 US Election memes and tweets datasets. These in Proc. 1st Workshop Trolling, Aggression Cyberbullying (TRAC), 2018,
selected images contain some of the most popular memes pp. 90–97.
[12] L. G. M. de la Vega and V. Ng, ‘‘Modeling trolling in social media
used on social media and are labeled based on strict criteria.
conversations,’’ in Proc. 7th Int. Conf. Lang. Resour. Eval. (LREC), 2018.
To encourage future novel research, the dataset is available [Online]. Available: https://aclanthology.org/L18-1585/
on GitHub as well as by email to the corresponding author [13] I. Arroyo-Fernndez, D. Forest, J.-M. Torres-Moreno, M. Carrasco-Ruiz,
and primary author of this paper. T. Legeleux, and K. Joannette, ‘‘Cyberbullying Detection Task: The
EBSI-LIA-UNAM System (ELU) at COLING’18 TRAC-1,’’ in Proc. 1st
Compared the performance of all models based on training Workshop Trolling, Aggression Cyberbullying (TRAC), 2018, pp. 140–149.
and detection times, evaluation matrix, detection of text in the [14] T. Davidson, D. Warmsley, M. Macy, and I. Weber, ‘‘Automated hate
memes, precision and recall curve, small text detection, and speech detection and the problem of offensive language,’’ in Proc. Int.
AAAI Conf. Web Social Media, vol. 11, no. 1, 2017, pp. 512–515.
confusion matrices. After evaluating all the results, YOLOv5 [15] D. Wang, B. K. Szymanski, T. Abdelzaher, H. Ji, and L. Kaplan, ‘‘The age
performance was the best based on training, detection time, of social sensing,’’ Computer, vol. 52, no. 1, pp. 36–45, Jan. 2019.
mAP, precision vs. recall curve, the detection confidence [16] D. Choi, S. Chun, H. Oh, J. Han, and T. Kwon, ‘‘Rumor propagation is
amplified by echo chambers in social media,’’ Sci. Rep., vol. 10, no. 1,
score of normal and small text in memes, and confusion pp. 1–10, Jan. 2020.
matrix. [17] M. H. Ribeiro, P. H. Calais, Y. A. Santos, V. A. F. Almeida, and W. Meira,
However, there are some limitations to this approach in that ‘‘‘Like sheep among wolves’: Characterizing hateful users on Twitter,’’ in
Proc. WSDM Workshop Misinformation Misbehavior Mining Web (MIS),
the model performed poorly when the meme contained small 2018, Paper e0203794.
text. Expanding the dataset with more small text memes can [18] C. Van Hee, G. Jacobs, C. Emmery, B. Desmet, E. Lefever, B. Verhoeven,
help to improve the performance. Secondly, the number of G. De Pauw, W. Daelemans, and V. Hoste, ‘‘Automatic detection of
cyberbullying in social media text,’’ PloS One, vol. 13, no. 10, 2018,
classes can improve because this approach is only limited Art. no. e0203794.
to two classes however, there are other classes, such as [19] D. Kiela, H. Firooz, A. Mohan, V. Goswami, A. Singh, C. A. Fitzpatrick,
harassment, propaganda, sexual aggression, violence, and and P. Bull, ‘‘The hateful memes challenge: competition report,’’ in Proc.
NeurIPS, 2021, pp. 344–360.
racism which spread via Facebook, Twitter, WhatsApp, and
[20] M. Das, S. Banerjee, and A. Mukherjee, ‘‘Hate-
Reddit. Thirdly, the model is limited to detecting offensive alert@DravidianLangTech-ACL2022: Ensembling multi-modalities
English text, and future, it can be improved for other for Tamil TrollMeme classification,’’ in Proc. 2nd Workshop Speech Lang.
languages. Fourthly, It is also possible to detect offensive text Technol. Dravidian Lang., 2022, pp. 51–57.
[21] B. O. Sabat, C. C. Ferrer, and X. Giro-I-Nieto, ‘‘Hate speech in pixels:
and also to detect the image inside the meme because it is Detection of offensive memes towards automatic moderation,’’ in Proc.
possible that the photo may be some famous personality or NeurIPS, 2019, pp. 281–290.

124496 VOLUME 11, 2023

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

[22] D. S. Chauhan, S. R. Dhanush, A. Ekbal, and P. Bhattacharyya, ‘‘All-in- [43] R. K. Giri, S. C. Gupta, and U. K. Gupta, ‘‘An approach to detect offence
one: A deep attentive multi-task learning framework for humour, sarcasm, in memes using natural language processing(NLP) and deep learning,’’ in
offensive, motivation, and sentiment on memes,’’ in Proc. 1st Conf. Asia– Proc. Int. Conf. Comput. Commun. Informat. (ICCCI), Jan. 2021, pp. 1–5.
Pacific chapter Assoc. Comput. Linguistics 10th Int. Joint Conf. Natural [44] R. Nayak, B. S. U. Kannantha, K. S, and C. Gururaj, ‘‘Multimodal
Lang. Process., 2020, pp. 281–290. offensive meme classification u sing transformers and BiLSTM,’’ Int. J.
[23] R. Zhu, ‘‘Enhance multimodal transformer with external label and in- Eng. Adv. Technol., vol. 11, no. 3, pp. 96–102, Feb. 2022.
domain pretrain: Hateful meme challenge winning solution,’’ 2020, [45] A. Karacı, ‘‘VGGCOV19-NET: Automatic detection of COVID-19 cases
arXiv:2012.08290. from X-ray images using modified VGG19 CNN architecture and YOLO
[24] D. Kiela, ‘‘The hateful memes challenge: Competition report,’’ in Proc. algorithm,’’ Neural Comput. Appl., vol. 34, no. 10, pp. 8253–8274,
NeurIPS, 2021, pp. 5138–5147. May 2022.
[25] R. K.-W. Lee, R. Cao, Z. Fan, J. Jiang, and W.-H. Chong, ‘‘Disentangling [46] H. Chen, Z. He, B. Shi, and T. Zhong, ‘‘Research on recognition method
hate in online memes,’’ in Proc. 29th ACM Int. Conf. Multimedia, of electrical components based on YOLO V3,’’ IEEE Access, vol. 7,
Oct. 2021, pp. 5138–5147. pp. 157818–157829, 2019.
[47] Y. Xue, Z. Ju, Y. Li, and W. Zhang, ‘‘MAF-YOLO: Multi-modal attention
[26] R. Gomez, J. Gibert, L. Gomez, and D. Karatzas, ‘‘Exploring hate speech
fusion based YOLO for pedestrian detection,’’ Infr. Phys. Technol.,
detection in multimodal publications,’’ in Proc. IEEE Winter Conf. Appl.
vol. 118, Nov. 2021, Art. no. 103906.
Comput. Vis. (WACV), Mar. 2020, pp. 1459–1467.
[48] Age Foto Stock. Accessed: Mar. 2, 2023. [Online]. Available:
[27] L. Shang, Y. Zhang, Y. Zha, Y. Chen, C. Youn, and D. Wang, ‘‘AOMD: An https://www.agefotostock.com/age/en/details-photo/offensive-graffiti-
analogy-aware approach to offensive meme detection on social media,’’ on-shop-shutter-in-rome-italy/Y5G-1951508
Inf. Process. Manage., vol. 58, no. 5, Sep. 2021, Art. no. 102664. [49] T. Jain, C. Lennan, Z. John, and D. Tran, ‘‘Imagededup,’’ 2019. [Online].
[28] L. Shang, C. Youn, Y. Zha, Y. Zhang, and D. Wang, ‘‘KnowMeme: A Available: https://github.com/idealo/imagededup
knowledge-enriched graph neural network solution to offensive meme [50] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, ‘‘You only look once:
detection,’’ in Proc. IEEE 17th Int. Conf. eScience (eScience), Sep. 2021, Unified, real-time object detection,’’ in Proc. IEEE Conf. Comput. Vis.
pp. 186–195. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[29] S. Khedkar, P. Karsi, D. Ahuja, and A. Bahrani, ‘‘Hateful memes, offensive [51] S. Li, X. Gu, X. Xu, D. Xu, T. Zhang, Z. Liu, and Q. Dong, ‘‘Detection
or non-offensive,’’ in Proc. Int. Conf. Innov. Comput. Commun. Singapore: of concealed cracks from ground penetrating radar images based on deep
Springer, 2022, pp. 609–621. learning algorithm,’’ Construct. Building Mater., vol. 273, Mar. 2021,
[30] E. Hossain, O. Sharif, M. M. Hoque, M. A. A. Dewan, N. Siddique, Art. no. 121949.
and M. A. Hossain, ‘‘Identification of multilingual offense and troll from [52] D. Thuan, ‘‘Evolution of YOLO algorithm and YOLOv5: The state-of-
social media memes using weighted ensemble of multimodal features,’’ the-art object detention algorithm,’’ Tech. Rep., 2021. [Online]. Available:
J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 9, pp. 6605–6623, https://doi.org/10.48550/arXiv.2004.10934
Oct. 2022. [53] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, ‘‘YOLOv4: Optimal
[31] A. Kumar and N. Sachdeva, ‘‘Multimodal cyberbullying detection using speed and accuracy of object detection,’’ 2020, arXiv:2004.10934.
capsule network with dynamic routing and deep convolutional neural [54] Ultralytics. GitHub. Accessed: Nov. 22, 2022. [Online]. Available:
network,’’ Multimedia Syst., vol. 28, no. 6, pp. 2043–2052, Dec. 2022. https://github.com/ultralytics/yolov5
[32] S. Paul, S. Saha, and M. Hasanuzzaman, ‘‘Identification of cyberbullying: [55] C.-Y. Wang, H.-Y. Mark Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and
A deep learning based multimodal approach,’’ Multimedia Tools Appl., I.-H. Yeh, ‘‘CSPNet: A new backbone that can enhance learning capability
vol. 81, no. 19, pp. 26989–27008, Aug. 2022. of CNN,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
[33] F. Yang, X. Peng, G. Ghosh, R. Shilon, H. Ma, E. Moore, and G. Workshops (CVPRW), Jun. 2020, pp. 1571–1580.
Predovic, ‘‘Exploring deep multimodal fusion of text and photo for hate [56] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, and T.
speech classification,’’ in Proc. 3rd Workshop Abusive Lang. Online, 2019, Killeen, ‘‘Pytorch: An imperative style, high-performance deep learning
pp. 11–18. library,’’ in Proc. Adv. Neural Inf. Process. Syst., 2019.
[34] P. K. Verma, P. Agrawal, I. Amorim, and R. Prodan, ‘‘WELFake: Word [57] R. Padilla, S. L. Netto, and E. A. B. da Silva, ‘‘A survey on performance
embedding over linguistic features for fake news detection,’’ IEEE Trans. metrics for object-detection algorithms,’’ in Proc. Int. Conf. Syst., Signals
Computat. Social Syst., vol. 8, no. 4, pp. 881–893, Aug. 2021. Image Process. (IWSSIP), Jul. 2020, pp. 237–242.
[35] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv, and S. Mishra, [58] L. Liu and M. T. Zsu, Eds., Encyclopedia of Database Systems, vol. 6.
‘‘Detection of cyberbullying incidents on the Instagram social network,’’ New York, NY, USA: Springer, 2009.
2015, arXiv:1503.03909. [59] J. Davis and M. Goadrich, ‘‘The relationship between precision-recall
[36] V. K. Singh, S. Ghosh, and C. Jose, ‘‘Toward multimodal cyberbullying and ROC curves,’’ in Proc. 23rd Int. Conf. Mach. Learn. (ICML), 2006,
detection,’’ in Proc. CHI Conf. Extended Abstr. Hum. Factors Comput. pp. 233–240.
Syst., May 2017, pp. 2090–2099. [60] I. D. Melamed, R. Green, and J. P. Turian, ‘‘Precision and recall of
[37] H. Zhong, H. Li, A. C. Squicciarini, S. M. Rajtmajer, C. Griffin, machine translation,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput.
D. J. Miller, and C. Caragea, ‘‘Content-driven detection of cyberbullying Linguistics Hum. Lang. Technol. Companion (HLT-NAACL) Short Papers
on the Instagram social network,’’ in Proc. IJCAI, vol. 16, 2016, (NAACL), 2003, pp. 61–63.
pp. 3952–3958.
[38] M. Balaji J and C. Hs, ‘‘TrollMeta@DravidianLangTech-EACL2021:
Meme classification using deep learning,’’ in Proc. 1st Workshop Speech
Lang. Technol. Dravidian Lang., 2021, pp. 277–280.
[39] K. Perifanos and D. Goutsos, ‘‘Multimodal hate speech detection in Greek
social media,’’ Multimodal Technol. Interact., vol. 5, no. 7, p. 34, Jun. 2021.
[40] K. Kumari, J. P. Singh, Y. K. Dwivedi, and N. P. Rana, ‘‘Multi-modal
aggression identification using convolutional neural network and binary
particle swarm optimization,’’ Future Gener. Comput. Syst., vol. 118, JAMSHID BACHA received the B.Sc. degree
pp. 187–197, May 2021. in computer systems engineering from UET
[41] E. Hossain, O. Sharif, and M. M. Hoque, ‘‘NLP- Peshawar, Pakistan, in 2020, and the mas-
CUET@DravidianLangTech-EACL 2021: Investigating Visual and ter’s degree in computer information systems
Textual Features to Identify Trolls from Multimodal Social Media and networks from Korea Aerospace University,
Memes,’’ in Proc. 1st Workshop Speech Lang. Technol. Dravidian Lang., South Korea. He is currently pursuing the Ph.D.
2021, pp. 300–306. degree with Technische Universität Berlin. His
[42] A. K. Mishra and S. Saumya, ‘‘Identifying troll meme in Tamil using current research interests include machine learn-
a hybrid deep learning approach,’’ in Proc. 1st Workshop Speech Lang. ing, deep learning, computer vision, and wireless
Technol. Dravidian Lang., 2021, pp. 243–248. communication.

VOLUME 11, 2023 124497

J. Bacha et al.: Deep Learning-Based Framework for Offensive Text Detection in Unstructured Data

FARMAN ULLAH received the M.S. degree in ABDUL WASAY SARDAR received the B.Sc.
computer engineering from CASE, Islamabad, degree in computer engineering from COMSATS
Pakistan, in 2010, and the Ph.D. degree from Korea University Islamabad, Pakistan, in 2020, and the
Aerospace University, South Korea, in 2016. master’s degree in computer information systems
He worked and collaborated on various projects and networks from Korea Aerospace Univer-
funded by the Ministry of Economy, the Korea sity, Goyang, South Korea. His current research
Research Foundation, and ETRI, South Korea. interests include artificial intelligence, machine
In 2007, he joined AERO, Pakistan, as an Assistant learning, deep learning, and computer vision.
Manager of telemetry. He is currently an Assistant
Professor with the College of IT, United Arab
Emirates University (UAEU), Abu Dhabi, Al Ain, United Arab Emirates.
Before Joining UAEU, he was an Assistant Professor with the Department
of Electrical and Computer Engineering, COMSATS University Islamabad,
Attock Campus, Pakistan, and a Postdoctoral Researcher with the High Pro-
cessing Computing Laboratory, Jeonbuk National University, South Korea.
He has authored/coauthored more than 40 peer-reviewed publications. His
SUNGCHANG LEE (Member, IEEE) received the
current research interests include embedded, wearable, the IoT applications,
B.S. degree from Kyungpook National University,
intelligent resource management for high-performance computing, and
in 1983, the M.S. degree in electrical engineering
artificial intelligence and machine learning.
from the Korea Advanced Institute of Science
and Technology (KAIST), in 1985, and the Ph.D.
degree in electrical engineering from Texas A&M
University, in 1991. From 1985 to 1987, he was
with KAIST, as a Researcher, where he worked on
image processing and pattern recognition projects.
JEBRAN KHAN received the B.Sc. and M.Sc. From 1992 to 1993, he was a Senior Researcher
degrees in computer systems engineering from with the Electronics and Telecommunications Research Institute (ETRI),
the University of Engineering and Technology South Korea, and the Director of the Government Project on Intelligent Smart
at Peshawar, Peshawar, Pakistan, and the Ph.D. Home Security and Automation Service Technology, from 2004 to 2009.
degree in electronics and information engineer- In 2009, he was the Vice President of the Institute of Electronics and
ing from Korea Aerospace University, Goyang, Information Engineers (IEIE), South Korea, and also the Director of the
South Korea. He is currently a Postdoctoral Telecommunications Society, South Korea. Since 1993, he has been a Faculty
Researcher with Ajou University, South Korea. of Korea Aerospace University, Goyang, South Korea, where he is currently a
His current research interests include social net- Professor with the School of Electronics, Telecommunication and Computer
work analysis, modeling, frameworks, and their Engineering.
applications.