
Comparison of Deep Learning and Ensemble Learning in Classification of Toxic Comments



K. Machova* and T. Tomcik*


* Technical University of Košice/Department of Cybernetics and Artificial Intelligence, Košice, Slovakia
[email protected], [email protected]

Abstract—The paper is focused on the recognition of various forms of toxic comments on social networks, particularly on offensive speech, hate speech, and cyberbullying. The spreading of toxic content through social networks is nowadays a very serious problem, which can be harmful to the functioning of a democratic society. The paper presents an experiment with various machine learning methods to discover which of them would be the most suitable for building recognition models. Primarily, we compare deep learning and ensemble learning.

I. INTRODUCTION

As the Internet expands, so does the amount of content on it. In addition to content based on facts, a large amount of content is toxic, which negatively affects web users, particularly teenagers. While most web users respect the norms of social behaviour, some users do not, and their comments reflect their antisocial behaviour. The anonymity provided by social networks, the simplicity of contributing, and the ease with which toxic content spreads represent an extremely topical problem today. Automated detection of the various forms of toxicity can help moderators of web discussions, and also social network users themselves, to regulate and limit such content.

Social media has seen increased use as a source of information and is mainly used to search for information on serious topics. It is also heavily used by those who seek health information. People use social "tools" to gather information, share stories, and discuss issues. Similarly, healthcare organizations see benefits in social media because it gives them access to healthcare information.

Social media also comes to the fore as a source of information in times of disaster and risk, although the accuracy of the information shared through these channels is unclear. Therefore, it is essential to learn more about how people evaluate the information they receive on social media websites, especially in terms of its credibility.

There are many kinds of uncredible information and toxic comments that can be harmful to users, such as fake news, conspiracy theories, trolling, hatred, offensive posts, cyberbullying, and phishing. We have concentrated on the detection of hate speech, offensive speech, and cyberbullying in our research.

Hate speech is defined as public speech that expresses hatred or promotes prejudice and violence against a person or a group based on attributes such as racial diversity, skin colour, religion, gender, or sexual orientation [17]. Companies that have hate speech policies in place include Facebook and YouTube. In 2018, a post containing part of the United States Declaration of Independence, which refers to Native Americans as "ruthless Indian savages", was marked as hate speech by Facebook and removed from the poster's page [11]. In 2019, the video-sharing platform YouTube shut down channels such as that of the American radio host Jesse Lee Peterson, based on his politically hateful speech.

Offensive speech may involve various forms of toxicity, and it can be difficult to distinguish general offensive language from hate speech. Offensive language can contain any kind of profanity or insult, so hate speech can be classified as a subset of offensive language. When detecting offensive posts, the type and purpose of the offensive expression are considered; therefore, the criteria for detection should capture the attributes of offensive expression in general [3].

Cyberbullying represents content posted online by an individual or a group that is aggressive, humiliating, or hurtful towards a victim who does not know the perpetrator or cannot easily defend himself or herself. It may be described on the basis of three criteria: intention, repetition, and superiority. Leaked information is a big problem for the bullied person, since once any defamatory or confidential information is published on the web, it is very difficult to remove [14].

Machine learning and its methods are very popular and useful today. They are used for various forms of classification, regression, and problems associated with text or image detection, and they have a wide range of uses in the detection of antisocial behaviour, cyber security, healthcare, IoT, and various other areas [13]. We focused our research on the automatic detection of three forms of antisocial behaviour, hate speech, offensive speech, and cyberbullying, using machine learning.

The main objectives the study intended to achieve are as follows:

• A comparison of various machine learning methods in the generation of models for the recognition of various forms of antisocial behaviour (hate, offensive, and cyberbullying) on social networks, which differ in form but are similar in their impact on social network users.

• A meta-level comparison evaluating the success of classical learning versus deep learning and ensemble learning in building detection models.

• An answer to the question "How does the volume of data available for training affect the results of machine learning models?"

• The creation of three datasets in the English language: a Hate dataset, an Offensive dataset, and a Cyberbullying dataset suitable for the intended experiments.

A. Related Works
Detection models for toxicity recognition based on machine learning have been researched in recent years. The work [3] uses the TF-IDF weighting scheme, part-of-speech tags, and other linguistic features to represent text inputs for the very successful machine learning method SVM. The best results achieved were Accuracy=0.91 for the recognition of offensive posts, but only 0.61 for hate speech recognition. The results achieved in [10] also showed the SVM model to be the most successful (Accuracy=0.89) in the recognition of degrees of toxicity in conversational content, in comparison with NB-Multinomial, RF, and Bagging. The approach presented in [18] uses the average results of 10 neural networks with different initializations of weights. The ensemble model achieved the best result of F1=0.94, but the means of the ensembles achieved only F1=0.83. The work [5] focuses on neural networks (LSTM and GRU) for the recognition of abusive posts on Twitter; the achieved AUC values ranged from 0.92 to 0.98. In general, the machine learning approach can be used for sentiment analysis, which can be helpful in toxicity recognition, as toxicity in the online space usually represents a negative opinion. The work [8] developed an ensemble learning scheme using DT, SVM, RF, and KNN (K-Nearest Neighbours) for the sentiment analysis of COVID-19 related offensive comments. The results exceeded Accuracy=0.90.

II. USED METHODS

To detect antisocial behaviour in text data, machine learning methods are often used. The main goal of our work was to compare the suitability of various approaches to machine learning for the recognition of toxicity on social networks. Thus, we concentrated on deep learning and ensemble learning in comparison with some classic machine learning methods.

A. Machine Learning Paradigms
Currently, three categories of machine learning from examples are recognized, according to the way in which they use the training data to find the context, inference, or general description of the data in a specific problem. These are the following categories:

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning.

Supervised learning includes prediction and classification tasks and represents the main category of learning. In supervised learning, the objects related to a particular problem are represented by input-output pairs. This means that data belonging to the same concept is already associated with target labels, such as classes that define concept identities [2].

Unsupervised learning is about understanding or finding a concise description of data passively, by mapping or grouping data according to certain organizing principles. This means that the data form only a set of objects for which no label is available to define a particular related concept, as in supervised learning. The goal of unsupervised learning is to create groups (clusters) of similar objects according to a criterion of similarity and then derive a concept that is shared between these objects. Unsupervised learning also includes algorithms that aim to provide a mapping from high-dimensional to low-dimensional spaces while preserving the initial information about the data, which offers more efficient computation [4].

Reinforcement learning involves taking actions to achieve a goal. During reinforcement learning, the agent learns by trial and error to perform actions in the environment in order to obtain a reward, thus providing an effective method for developing goal-directed action strategies. Reinforcement learning was inspired by related psychological theories and is closely related to the basal ganglia in the brain. Reinforcement learning methodologies are concerned with problems where the learning agent essentially does not know what to do. Thus, the agent must discover an appropriate way to maximize the expected profit defined by the rewards it receives in each state [15]. Reinforcement learning differs from supervised learning because no input/output pairs are presented and no actions are explicitly corrected; instead, the agent finds itself in a certain state at a specific time t and, based on this information, chooses an action. As a result, the agent receives for its activity a reinforcing punishment or reward.

We have used supervised learning for the generation of the recognition models because we had labelled datasets available. The scheme of supervised learning is illustrated in Figure 1.

Figure 1. The supervised learning scheme [2].
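To make this scheme concrete, the following is a minimal sketch of supervised learning from labelled input-output pairs, assuming the Scikit-learn library; the toy comments and labels are invented for the example.

    # A minimal sketch of supervised learning on labelled input-output pairs
    # (the toy comments and labels are invented for illustration).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    comments = ["I hate you", "what a lovely picture", "you are stupid", "great work"]
    labels = [1, 0, 1, 0]  # target labels: 1 = toxic, 0 = neutral

    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(comments)  # represent each text as a feature vector
    model = MultinomialNB().fit(features, labels)  # learn the concept from the labelled pairs

    # Predict the label of a previously unseen input
    print(model.predict(vectorizer.transform(["what a stupid idea"])))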

B. Deep Learning
Deep learning is a type of supervised machine learning using artificial neural networks with a large number of hidden layers. For text data processing, recurrent neural networks are the best choice, because they can transfer information from the processing of one input to the processing of the next input and thus model the relationships between words in a text.

The best-known recurrent network is the LSTM (Long Short-Term Memory). It is a specific network among recurrent networks in that it can store information for a longer time, which is why it can process longer sequences of words. LSTM networks are composed of repeating modules (LSTM blocks) in the form of a chain. The basis of the LSTM is a horizontal line through which a vector passes between the individual blocks; using this line, information passes through the entire structure of the LSTM network. There are three gates (input, forget, and output gates) in the individual cells. These gates are used to remove or add information to the state of the block. Information (a vector) passes through these gates, which are composed of neurons with a sigmoidal activation function. Depending on the output value of these neurons, a certain amount of information passes through: 0 means that no information passes through the gate, while 1 means that everything passes through [6].

There are several variations of the LSTM network that use this basis but differ in some parts of the block. One of the most famous is the GRU (Gated Recurrent Unit). This variation combines the input gate and the forget gate into a single gate [12], which means that the GRU is simpler, having only two gates in total.

C. Classic Machine Learning
From the classic supervised methods of machine learning, we have chosen Naïve Bayes as a baseline method, the Support Vector Machine because of its high efficiency in text data processing, and the Decision Tree classifier for comparison with the Random Forest of decision trees as an example of the ensemble approach to learning. The Naïve Bayes classifier (NB) is a probabilistic classifier based on Bayes' theorem and an independence assumption between features. NB is often outperformed by support vector machines [7].

The Support Vector Machine (SVM) separates the sample space into two or more classes with the widest possible margin, which increases the accuracy of classification. The method is originally a linear classifier; for text processing, an SVM using kernel functions is more suitable [7].

The Decision Tree (DT) method generates a tree graph in which each path starting from the root is described by a data-separating sequence until a Boolean result is reached at the end, in a leaf node. DT models are easy to interpret and offer good accuracy on multiple data forms [1].

D. Ensemble Learning
From ensemble learning, we have used two of the more successful methods: random forests and boosting. The first is the Random Forest (RF) of decision trees. The RF method tries to minimize variance by creating multiple decision trees on different parts of the same training data. The individual trees are de-correlated using a random selection of a subset of the attributes. The method reaches the final classification by voting of the individual trees. The RF method is used mainly in cases where a limited amount of data is available, and it significantly reduces memory requirements when generating many trees [9].

AdaBoost is an algorithm based on the boosting learning strategy. Boosting is a method that combines the results of predictive models to improve the accuracy of the final prediction. In this method, the training examples are weighted. A set of models is learned on the same dataset, but each time with different weights of the training examples. The weights are adapted from model to model in order to achieve better results [16].

III. IMPLEMENTATION AND TESTING

A. Methodology
We extracted short texts from social media and from available datasets related to antisocial behaviour and joined them into three sets: the Hate Speech, Offensive, and Cyberbullying datasets. These rough datasets were preprocessed and labelled. All final datasets were used for the training of detection models using classic machine learning methods (NB, SVM, and DT), deep learning (LSTM and GRU), and ensemble methods (RF and AdaBoost). The models were evaluated using Accuracy and F1-rate. Figure 2 illustrates the methodology of our approach to building models for toxicity recognition.

All machine learning models were implemented in the Python programming language (version 3). In particular, the Jupyter Notebook environment was used, employing the Scikit-learn, Numpy, Matplotlib, and Scipy libraries. All machine learning methods were trained on all datasets to enable a detailed comparison of the suitability of the selected methods in the detection of all forms of antisocial behaviour: hatred, offensiveness, and cyberbullying.

Figure 2. The methodology scheme.
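As a concrete illustration of the classic and ensemble branch of this methodology, the following is a minimal sketch assuming Scikit-learn; the CSV file name and column names are hypothetical placeholders, the labels are assumed to be 0/1, and the hyperparameters are illustrative rather than the exact configuration used in our experiments.

    # A sketch of the classic and ensemble branch of the methodology (scikit-learn).
    # "toxicity_dataset.csv" and its columns "text" and "label" are hypothetical.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.metrics import accuracy_score, f1_score

    df = pd.read_csv("toxicity_dataset.csv")  # columns: "text", "label" (0 = neutral, 1 = toxic)
    features = TfidfVectorizer(lowercase=True).fit_transform(df["text"])  # TF-IDF weighting
    x_train, x_test, y_train, y_test = train_test_split(
        features, df["label"], test_size=0.25)  # 3:1 train/test ratio, as in the paper

    models = {
        "NB": MultinomialNB(),
        "SVM": LinearSVC(),
        "DT": DecisionTreeClassifier(),
        "RF": RandomForestClassifier(n_estimators=100),
        "AdaBoost": AdaBoostClassifier(n_estimators=100),
    }
    for name, model in models.items():
        model.fit(x_train, y_train)
        predictions = model.predict(x_test)
        print(name, accuracy_score(y_test, predictions), f1_score(y_test, predictions))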

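The deep learning branch (Section II.B) can be sketched in the same spirit, assuming TensorFlow/Keras; the vocabulary size, sequence length, and layer sizes are illustrative assumptions, not the exact architecture used in our experiments.

    # A sketch of the recurrent (LSTM/GRU) branch, assuming TensorFlow/Keras.
    # Toy texts and all hyperparameters are invented for illustration.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    texts = ["you are awful", "have a nice day", "totally stupid idea", "well done"]
    labels = np.array([1, 0, 1, 0])  # 1 = toxic, 0 = neutral

    tokenizer = Tokenizer(num_words=10000)  # assumed vocabulary cap
    tokenizer.fit_on_texts(texts)
    x = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=100)

    model = Sequential([
        Embedding(input_dim=10000, output_dim=64),  # word-embedding layer
        GRU(64),  # recurrent layer; swap in LSTM(64) for the LSTM variant
        Dense(1, activation="sigmoid"),  # binary toxic/neutral output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x, labels, epochs=3, batch_size=2)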
B. Data Description
The selected machine learning methods were used for model training. The models were tested on three datasets containing data on three various forms of antisocial behaviour. We prepared three datasets in the English language. The first, the Hate Speech Dataset, contained 2000 hate comments and 4000 neutral comments. The second, the Offensive Speech Dataset, contained 23548 offensive comments and 12239 neutral ones. The third, the Cyberbullying Dataset, contained data that captured cyberbullying in conversational content on the social network Twitter and on the Wikipedia and YouTube platforms. The data were labelled manually into cyberbullying/not cyberbullying classes. This dataset was unbalanced, because it contained 803 cyberbullying and 65060 neutral comments, so the number of neutral comments was decreased before training. The datasets were created by finding data, joining them, preprocessing, and labelling. The datasets were pre-processed through the removal of extra spaces, social network hashtags, hyperlinks, and capital letters, and through tokenization (using a tokenizer from the Scikit-learn library) and vectorization (using the TF-IDF weighting scheme and the Word2Vec representation). The ratio of the training and testing sets was 3:1.

C. Measures of Effectiveness of Models
We have used the two best-known measures of a model's effectiveness, Accuracy and F1-rate. Accuracy is defined by the following formula:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP is the number of true positive classifications, TN is the number of true negative classifications, FP is the number of mistaken (false positive) classifications, and FN is the number of false negative classifications. F1-rate is defined in the following way:

    F1 = 2 * P * R / (P + R)

where P = TP / (TP + FP) and R = TP / (TP + FN).

The following insight can be derived from the definitions of these metrics. For simplicity, we consider TP > 0, TN > 0, FP > 0, and FN > 0; then there is no change of sign in the multiplications, and the relationship between Accuracy (Acc) and F1 can be described as in Figure 3.

Figure 3. Relationship between Acc and F1.

From the description of the relationship between Accuracy and F1, the following insight can be derived:

    IF TN ≥ TP THEN Accuracy ≥ F1-rate
    IF TN < TP THEN Accuracy < F1-rate.
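This relationship can be checked numerically; in the sketch below, the confusion-matrix counts are made up purely for illustration (algebraically, Accuracy − F1 has the same sign as (TN − TP)(FP + FN)).

    # Numerical check of the relationship between Accuracy and F1-rate.
    # The confusion-matrix counts below are invented for illustration.
    def accuracy(tp, tn, fp, fn):
        return (tp + tn) / (tp + tn + fp + fn)

    def f1(tp, fp, fn):
        # F1 = 2PR/(P+R) simplifies to 2TP / (2TP + FP + FN)
        return 2 * tp / (2 * tp + fp + fn)

    # TN >= TP, so Accuracy >= F1 is expected:
    print(accuracy(100, 500, 30, 20), f1(100, 30, 20))  # 0.923 vs 0.800
    # TN < TP, so Accuracy < F1 is expected:
    print(accuracy(500, 100, 30, 20), f1(500, 30, 20))  # 0.923 vs 0.952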

IV. TEST RESULTS AND DISCUSSION

The testing results of the classic machine learning models (NB, SVM, and DT), the deep neural networks (LSTM, GRU), and the ensemble learning models (RF and AdaBoost) trained on the Hate Speech Dataset are in Table I (CML represents Classic Machine Learning, DL represents Deep Learning, and EL represents Ensemble Learning).

TABLE I. THE RESULTS OF MODELS TRAINED USING CLASSIC MACHINE LEARNING METHODS (NB, SVM, DT), DEEP LEARNING NN (LSTM, GRU), AND ENSEMBLE LEARNING (RF, ADABOOST) ON THE HATE SPEECH DATASET, REPRESENTED BY ACCURACY AND F1 MEASURES

    Hate Speech Dataset (support: 6000)

           Model       Acc.    F1
    CML    NB          0.649   0.560
           SVM         0.893   0.840
           DT          0.900   0.857
    DL     NN (LSTM)   0.431   0.426
           NN (GRU)    0.363   0.359
    EL     RF          0.902   0.860
           AdaBoost    0.904   0.870

The testing results of the same models trained on the Offensive Speech Dataset are in Table II.

TABLE II. THE RESULTS OF MODELS TRAINED USING CLASSIC MACHINE LEARNING METHODS (NB, SVM, DT), DEEP LEARNING NN (LSTM, GRU), AND ENSEMBLE LEARNING (RF, ADABOOST) ON THE OFFENSIVE SPEECH DATASET, REPRESENTED BY ACCURACY AND F1 MEASURES

    Offensive Speech Dataset (support: 35787)

           Model       Acc.    F1
    CML    NB          0.893   0.921
           SVM         0.926   0.936
           DT          0.924   0.941
    DL     NN (LSTM)   0.956   0.945
           NN (GRU)    0.964   0.949
    EL     RF          0.916   0.937
           AdaBoost    0.908   0.926

The testing results of the same models trained on the Cyberbullying Dataset are in Table III.

TABLE III. THE RESULTS OF MODELS TRAINED USING CLASSIC MACHINE LEARNING METHODS (NB, SVM, DT), DEEP LEARNING NN (LSTM, GRU), AND ENSEMBLE LEARNING (RF, ADABOOST) ON THE CYBERBULLYING DATASET, REPRESENTED BY ACCURACY AND F1 MEASURES

    Cyberbullying Dataset (support: 11863)

           Model       Acc.    F1
    CML    NB          0.923   0.650
           SVM         0.928   0.696
           DT          0.922   0.651
    DL     NN (LSTM)   0.513   0.496
           NN (GRU)    0.616   0.601
    EL     RF          0.932   0.708
           AdaBoost    0.912   0.561

We can see from these three tables (Tables I, II, and III) that for all three datasets, either deep learning or ensemble learning was the best. For comparison, we also present the effectiveness of the models trained by all used machine learning methods on all three datasets, using the Accuracy measure only, in the overview Table IV.

TABLE IV. THE RESULTS OF MODELS TRAINED USING CLASSIC MACHINE LEARNING METHODS (NB, SVM, DT), DEEP LEARNING NN (LSTM, GRU), AND ENSEMBLE LEARNING (RF, ADABOOST) ON THE HATE, OFFENSIVE, AND CYBERBULLYING DATASETS, REPRESENTED BY ACCURACY

    Dataset (support)   Hate (6000)   Offensive (35787)   Cyberbullying (11863)
    CML    NB           0.649         0.893               0.923
           SVM          0.893         0.926               0.928
           DT           0.900         0.924               0.922
    DL     LSTM         0.431         0.956               0.513
           GRU          0.363         0.964               0.616
    EL     RF           0.902         0.916               0.932
           AdaBoost     0.904         0.908               0.912

For the datasets where the detected class was not represented by a large number of training examples, the ensemble learning methods came out the best: AdaBoost for the Hate Speech Dataset (Accuracy=0.904) and RF for the Cyberbullying Dataset (Accuracy=0.932). On the largest, the Offensive Speech Dataset, the neural networks were the best (Accuracy=0.964, using the GRU). In the results obtained by training on the Offensive Speech Dataset, Accuracy was higher than the F1-rate. This means that the neural networks were more successful at detecting neutral than offensive posts, but it does not mean that they were less precise due to a high number of FP classifications. Conversely, with the other machine learning methods, including ensemble learning, the F1-rate was greater than Accuracy.

V. CONCLUSION

The conclusion of our experiments is that neural networks give better results only if sufficiently large datasets are available. When the dataset is not so large, the better way is to use ensemble learning. The best result on the smallest, the Hate Speech Dataset, was achieved by AdaBoost (Accuracy=0.904, F1-rate=0.870). On the other hand, the best result on the largest, the Offensive Speech Dataset, was achieved by neural networks, particularly by the GRU (Accuracy=0.964, F1-rate=0.949).

The impact of this study also lies in its comparison of different machine learning strategies: classic, deep, and ensemble learning. Another useful contribution of the paper for the research community is the finding that deep learning models are successful in offensive speech recognition, but for the recognition of hate speech or cyberbullying, the models obtained by ensemble learning are more effective. The conclusion is that both deep learning and ensemble learning are a better selection than classical machine learning methods.

With further optimization of the hyperparameters, by expanding the dataset, or by using augmentation of the current dataset, we could probably achieve results that are better by a few tenths of a percent. Future research could also focus on using capsule CNNs, or on the use of other models for vector representation such as FastText or GloVe.

In the future, we could focus on the combination of deep learning and ensemble learning by training neural ensemble models, since the ensemble models based on classical machine learning methods did not achieve the results we hoped for.

We could also involve multimodal detection of toxic behaviour in social media by extending text processing to include the processing of speech recordings and images available in online comments, for example emoticons or illustrative images. Another direction for future research could be the use of various ensemble strategies to increase the detection performance of a set of models, for example by using a random forest voting strategy.

ACKNOWLEDGMENT

This work was supported by the Scientific Grant Agency of the Ministry of Education, Science, Research and Sport of the Slovak Republic, and the Slovak Academy of Sciences under grant no. 1/0685/21 and by the Slovak Research and Development Agency under Contract no. APVV–22–0414.

REFERENCES

[1] T.J. Bazhad and M.A. Adnan, "Classification Based on Decision Tree Algorithm for Machine Learning," Journal of Applied Science and Technology Trends, 2(1), 2021, 20-28, ISSN: 2708-0757, doi: 10.38094/jastt20165.
[2] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.
[3] T. Davidson, et al., "Automated hate speech detection and the problem of offensive language," In Proc. of the Eleventh International AAAI Conference on Web and Social Media, ICWSM 2017, Montreal, Quebec, Canada, 512-515, https://doi.org/10.48550/arXiv.1703.04009.
[4] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd edition, Wiley, New York, 2001.
[5] A. Founta, et al., "A unified deep learning architecture for abuse detection," In Proc. of the 10th ACM Conference on Web Science, Boston, Massachusetts, USA, 2019, pp. 105-114, https://doi.org/10.1145/3292522.3326028.
[6] S. Hochreiter and J. Schmidhuber, "Long Short-term Memory," Neural Computation, 9(8), 1997, 1735-1780, DOI: 10.1162/neco.1997.9.8.1735.
[7] G. James, et al., An Introduction to Statistical Learning – with Applications in R, Springer Texts in Statistics, Springer, 2014, pp. 1–445, ISBN 978-1-4614-7137-0, DOI 10.1007/978-1-4614-7138-7.
[8] V. Kandasamy, et al., "Sentimental Analysis of COVID-19 Related Messages in Social Networks by Involving an N-Gram Stacked Autoencoder Integrated in an Ensemble Learning Scheme," Sensors, 21(22), 2021, 7582.
[9] V.Y. Kulkarni and P.K. Sinha, "Pruning of random forest classifiers: a survey and future directions," In Proc. of the International Conference on Data Science Engineering ICDSE 2012, Cochin, Kerala, India, 64–68.
[10] K. Machova, et al., "Machine Learning and Lexicon Approach to Texts Processing in Detection of Degrees of Toxicity in Online Discussions," Sensors (Basel), 22(17), 2022, 6468, https://doi.org/10.3390/s22176468.
[11] T. Mikolov, et al., "Advances in pre-training distributed word representations," In Proc. of the International Conference on Language Resources and Evaluation LREC 2018, Miyazaki, Japan, 2018, 52-55.
[12] C. Olah, "Understanding LSTM Networks," 2015. Available online: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed on 03.02.2023).
[13] I.H. Sarker, "Machine Learning: Algorithms, Real-World Applications and Research Directions," SN Computer Science, vol. 2, March 2021, Article 160.
[14] J.I. Sheeba and S. Pradeep-Devaneyan, "Cyberbully Detection from Twitter using Classifiers," CiiT - International Journal of Artificial Intelligent Systems and Machine Learning, 9(8), 2017, 163-168.
[15] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, 2nd edition, Cambridge, MA: MIT Press, 2018.
[16] C. Tu, B. Xu, and H. Liu, "The Application of the AdaBoost Algorithm in the Text Classification," 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Xi'an, China, 2018, 25-27, doi: 10.1109/IMCEC.2018.8469497.
[17] M. Zampieri, et al., "Predicting the type and target of offensive posts in social media," In Proc. of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, June 2019, 1415–1420.
[18] S. Zimmerman, et al., "Improving Hate Speech Detection with Deep Learning Ensembles," In Proc. of the Language Resources and Evaluation Conference LREC 2018, Miyazaki, Japan, 2018, 2546-2553.
