2022 Using ML and Deep Learning
https://doi.org/10.1007/s42979-022-01308-5
ORIGINAL RESEARCH
Received: 25 April 2022 / Accepted: 1 July 2022 / Published online: 26 July 2022
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2022
Abstract
Nowadays, many people immerse themselves in the world of social media. With the current pandemic scenario, this engagement has only increased, as people often rely on social media platforms to express their emotions, find comfort, find like-minded individuals, and form communities. This extensive use of social media has many downsides, one of which is cyberbullying. Cyberbullying is a form of online harassment that is both unsettling and troubling. It can take many forms, but the most common is textual. Cyberbullying is common on social media, and people often end up in a state of mental breakdown instead of taking action against the bully. On the majority of social networks, automated detection of these situations necessitates the use of intelligent systems. We propose a cyberbullying detection system to address this issue. In this work, we propose a deep learning framework that evaluates real-time Twitter tweets or social media posts and identifies any cyberbullying content in them. Recent studies have shown that deep neural network-based approaches are more effective than conventional techniques at detecting cyberbullying texts. Additionally, our application can recognise cyberbullying posts written in English, Hindi, and Hinglish (multilingual data).
Keywords Cyberbullying · Stack word embeddings · Deep learning model · Multilingual · Real-time tweets
SN Computer Science (2022) 3:401
Word embedding is a natural language processing methodology for mapping words or phrases from a lexicon to a corresponding vector of real numbers, which is then used to find word predictions and semantics. These word embeddings are beneficial when training on data from contextual information because they indicate implicit associations between words. The main reason for choosing word embeddings is that they do not require costly annotation and can be produced from vast, unannotated corpora that are freely available. After that, pre-trained embeddings can be used in tasks that need only small amounts of labeled data.

Among word embedding techniques, stacked embeddings are one of the most important topics. A stack embedding method is used to combine distinct embeddings; for instance, it can be used to combine both conventional and contextual string embeddings. Stack embeddings allow embeddings to be mixed and matched, and the combination of embeddings has been established to produce the best results [1].

Deep learning (DL) is a subset of machine learning (ML) that can be applied to a wide range of applications. Deep learning models are preferred in our work over traditional machine learning models because they have been shown to be more successful than ML or statistical methods and use a higher number of neuron layers. Deep learning algorithms have proven to be highly effective at text classification, with state-of-the-art results on a variety of classic academic benchmark problems. Deep learning networks have the advantage of improving as the size of the data grows. Because hybrid techniques have been demonstrated to be promising models for minimizing sentiment inaccuracies on more detailed training data, we explored utilizing a hybrid model instead of a single layer model [2].

On all types of datasets, hybrid models outperformed single models in terms of cyberbullying detection, especially when deep learning methods were combined. Our research looks at how our proposed hybrid model reacts to various forms obtained from multiple languages. The combination of multiple models was studied and validated in this study. We looked at the relationship between models and their increased abilities to extract traits, store past data and nodes, and classify text [3].

On multilingual Twitter datasets, we will use the CNN-BiLSTM model with stacked word embedding. The combined models boosted the accuracy of cyberbullying detection, according to the findings of our research. After training the model, we plan to create a Twitter-like app using Python and the web tech stack [4], in which real-time data will be fed into the system, which will predict whether the input text is cyberbullying or non-cyberbullying.

This paper is organized as follows. The following section provides the aim and motivation of this research work. The subsequent section provides the highlights or the contribution made in this work. “Literature Review” provides an incisive review of related work on existing cyberbullying detection techniques based on machine learning and deep learning, and the subsequent section provides a detailed explanation of the proposed model. “Results and Discussion” describes the results obtained, and the last section concludes the paper.

Aim and Motivation

During the time of COVID-19, people’s activity on social media has increased tremendously, since social media platforms provide us with excellent means of communication, but this also makes young people more vulnerable to online threats and harassment. Because of the many active users on social media networks, cyberbullying has become a global problem. The pattern suggests that cyberbullying on social media is on the rise and, according to recent studies, it is becoming more prevalent among teenagers. The ability to recognize potentially dangerous communications is critical to successful prevention, and the information overload on the Internet calls for intelligent systems that can automatically identify potential threats. This research work aims to develop a model that can accurately detect cyberbullying in real-time tweets.

Cyberbullying has been one of the main prevailing issues since technology and internet services became accessible to the general public. After the COVID-19 pandemic hit, affecting the daily life of most people and forcing them to isolate themselves from social groups and communities, the usage of social media has only increased. With the increased use of digital platforms for educational purposes by most young people, an increase in cyberbullying incidents seems unavoidable, as pupils who are bullied are more inclined to cyberbully. Bullying and discrimination based on race, religion, sex, caste, and creed can have an adverse effect on the victim’s mental health, eventually leading to anxiety, depression, and even an increase in suicide cases.

Cyberbullying has increased by 70% in just a few months, according to a group that records online bullying and harassment cases. Human monitoring becomes fruitless in the face of such huge amounts of data on the internet, as there is no scalable and effective means to trace cyberbullies and tackle the problem. Thus, there is a need to tackle this problem at an automated level that is fast, efficient, and accurate for the sake of social well-being.
using 5000 Bangla and 7000 Romanized Bangla texts, as well as a combined dataset. Before training, the datasets were preprocessed with NLP techniques and features were extracted. Multinomial Naive Bayes, SVM, Logistic Regression, and XGBoost were among the machine learning models utilized. Deep learning algorithms including CNN, LSTM, BLSTM, and GRU were also used. According to the findings, CNN outperformed all other algorithms for the dataset containing Bangla texts, with an accuracy of 84%. In the other two datasets, the Multinomial Naive Bayes machine learning technique fared best, with 84% accuracy on the Romanized Bangla dataset and 80% accuracy on the combined dataset.

Using data from Twitter, Wikipedia, and Formspring, a working implementation of an application that detects cyberbullying across multiple social media platforms was proposed by [8]. They used LSTM layers to detect cyberbullying. These models were trained using the backpropagation method, with the cross-entropy loss function used in combination with the Adam optimizer. The results were better than those of the traditional approaches.

Four machine learning models, LR, Gaussian Naive Bayes, RNN, and BERT, were used by [9]. For final classification, these are combined with a neural network. The suitability of the suggested model and of the other accessory classification models was assessed using numerous evaluation measures to determine how well the model can perform. Accuracy, Precision, and Recall were the most commonly utilized measures of efficacy. According to the results of training and testing, the model works well regardless of the social media sentence input. Though the models’ classification accuracy is good, they do have significant limits that might be overcome by including a variety of additional techniques. The effectiveness and efficacy of deep learning systems in detecting cyberbullying were discussed by [10]. They worked on four deep learning models: Bidirectional Long Short-Term Memory (BLSTM), Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN). In comparison to the RNN, LSTM, and GRU models, the BLSTM model achieved good accuracy and F1-measure scores.

A cyberbullying detection framework was developed by [11] using reinforcement learning and combining various natural language processing techniques. The developed framework leverages human-like behavioral patterns, uses delayed rewards, and outperforms other models on a highly dynamic and populated dataset, achieving 89.5% accuracy. Different machine learning techniques (Logistic Regression, Linear SVC, Multinomial Naive Bayes, and Bernoulli Naive Bayes) and deep learning techniques (CNN models) were applied by [12], incorporating various n-gram ranges, to detect cyberbullying on Twitter. Vector approaches included distributed bag of words and distributed memory, further extending to distributed memory concatenated and distributed memory means.

A neural network architecture with a self-attention model, trained on a balanced dataset combining three different datasets from different sources, was adopted by [13]. The self-attention model follows an overall standard encoder-decoder architecture that replaces recurrent layers with multi-headed self-attention. Tested on metrics such as precision, recall, and F1 score, it achieved state-of-the-art accuracy and even outperformed the BLSTM model with attention. An ensemble model was developed by [14], involving feature analysis techniques for Naive Bayes-SVM and XGBoost models and word embeddings for deep learning approaches like CNN, Bi-GRU, and attention networks, using a majority voting-based ensemble where the predictions from individual models count as votes for class labels, with a fairly based dataset.

A classification model based on attention techniques was proposed by [15] to analyze Arabic comments, including different Arabic dialects. Inspired by human-like learning, the proposed attention model dynamically pays attention to the parts of the input that aid in achieving results and ignores everything irrelevant. The model is built with an embedding layer followed by two LSTM layers, a dense layer, and an output layer, to compare the impact with and without the recurrent networks.

Many hybrid approaches were put to the test by [16] on a variety of datasets from various disciplines for sentiment analysis. Eight textual tweet and review datasets from various fields are used to develop and test hybrid deep learning models for sentiment analysis that integrate Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN), and Support Vector Machines (SVM). Each technique was evaluated based on its dependability and calculation time. The CNN-BiLSTM model outperformed all the baseline deep learning models. A cyberbullying detection model was proposed by [17] to improve manual monitoring for cyberbullying on social media. In that work, OCR was used to analyze the characters in images to determine the impact of image-based cyberbullying on an individual basis, which was further tested on a dummy system. To create predictive models for cyberbullying detection, supervised learning-based techniques commonly use classifiers such as SVM and Naive Bayes.

A Deep Convolutional Neural Network (DCNN) was created by [18] to build an automated system, as it is very hard to manually filter information from such a large amount of incoming traffic in the form of tweets. Using convolution, the proposed DCNN model takes the Twitter text with GloVe embedding vectors to capture the semantics of the tweets, and it outperformed existing models. A strategy for predicting hateful speech on social media networks using a hybrid of natural language processing and machine learning
techniques was described by [19]. A powerful natural language processing optimization ensemble deep learning strategy (KNLPEDNN) is used to analyze the collected data. The methodology employs an active learning environment to detect hate speech on social media platforms by classifying the text into neutral, offensive, and hate language.

A comparative study of pre-existing deep learning methods was presented by [20] to detect hate speech and offensive language in textual data. These methods include CNN, RNN, LSTM and BERT models. The authors also investigated the effect of the class weighting technique on the performance of the deep learning methods. It was found that the pre-trained BERT model outperformed other models in both unweighted and weighted classification, likely because of the property of BERT to measure the relation between sentences by treating them as whole units.

A model for detecting tweets that contain racist text was proposed by [21] by performing sentiment analysis of tweets. A stacked ensemble deep learning model, called Gated Convolutional Recurrent Neural Networks (GCR-NN), is assembled by combining GRU, CNN and RNN. The performance of the model is optimized by setting different structures in terms of the number of layers, loss function, optimizer, number of neurons, etc. The proposed model showed substantially better performance than machine learning models. Word embedding representations and deep learning approaches were employed by [22] to recognize and classify toxic speech. A Twitter corpus was used to conduct binary and multi-class classification, and two main approaches were investigated: extracting word embeddings and then using a DNN classifier, and fine-tuning the pre-trained BERT classifier. BERT fine-tuning was found to be substantially more effective.

Numerous approaches were compared by [23] at about the same time with varying data ratios. As a result, when the data is small, machine learning produces good results. When they used more data for the trials, they got better outcomes by employing deep learning. Compared with the other approaches they examined, BiRNN produces the best results. They used ML and DL models, including RNN, to detect hate speech. They have two datasets, one of which has more data than the other. They strove to extract hate speech from tweets to discover the best way of improving accuracy across several methods. The results show that ML works well with small data while DL works well with large datasets. Even if their strategy outperforms previous models, they must consider the sort of dataset that will be used in the future.

The efficiency of different neural network models, such as CNN, BiLSTM, BiLSTM with attention mechanisms, and combined CNN-LSTM models, was evaluated by [24]. They trained these networks on a labeled dataset of YouTube comments. They employed Arabic word representations to represent the comments after running the dataset through a variety of pre-processing procedures. They also used machine learning optimization techniques to fine-tune the neural network models’ parameters. They used 5-fold cross-validation to train and evaluate each network. The CNN-LSTM had the highest recall of 83.46%, followed by CNN, then BiLSTM with attention, and lastly BiLSTM.

The problem of detecting hateful comments was addressed by [25], using text mining and deep learning models built using LSTM to detect, classify, and filter out cyberbullying speech. The input layer takes input in the form of sequences, which are numbers that represent text. The embedding layer takes each word from the input layer and produces appropriate word vectors. This is fed to the LSTM model, which is further connected to the dense layer. Each neuron in each layer is strongly linked to the layers above and below it. The final model has an accuracy of 94.94% in detecting whether or not a text is cyberbullying.

Proposed Methodology

The proposed model is a prototype for a cyberbullying detection system which can be used by social media platforms for automated checking and control of cyberbullying. The training data is cleansed and preprocessed before being fed into stacked word embeddings. Then the CNN-BiLSTM deep learning model is trained to perform better than regular deep learning models trained standalone. The model is saved for use in the website. The website is similar to any social media platform where the user has access to many features. The admin will have privileges to view content status. Even though this work is a prototype, it is still a step towards getting a better result.

• Dataset Analysis - The acquired labeled data in 3 languages, i.e., Hindi, English and Hinglish, from numerous open-source dataset sources goes to the text preprocessing stage, which involves Data Cleaning, Data Integration, Data Transformation, Data Reduction and Data Discretization.
• Data Cleaning - Any irrelevant attributes, empty cells and NaN values are removed. The data is also formatted so that the data type across the dataset is uniform.
• Data Transformation - As the three datasets are acquired from different sources, they cannot be combined in their original form because their classification labels differ. To proceed with these datasets, it is important to replace the different label sets with a single 0-1 classification scheme, which tells us whether or not the text contains cyberbullying content, thus making it a black-and-white task for training our model and eliminating any gray areas.
• Data Integration - All the datasets are integrated into one CSV file that is used for further text preprocessing.
• Data Discretization - In this stage the data is tokenized, i.e., each sentence is split into words for easy evaluation of the data.
• Data Reduction - In this text preprocessing stage, certain things are removed from tweets, such as URLs, special characters, ‘@’ and stop words, and all the text is converted to lower case. Further, stemming is performed, which transforms a word to its root form, along with lemmatization, which reduces a word to a form that exists in the language. This stage helps in reducing the data to its simplest possible form.

After the preprocessing of data is completed, we move towards the model building and training stage. As shown in Fig. 1, for building our CNN-BiLSTM model, a word embedding approach is used, as it solves various issues that simple one-hot vector encodings have. Most crucially, word embeddings boost generalization and performance. We will stack 2 word embeddings, GloVe and FastText. A combination of embeddings has been established to produce the best results. After the stacking of word embeddings, the CNN-BiLSTM model is built, as a hybrid technique has shown the potential of reducing sentiment errors on increasingly complex data. An ensemble ML model is also built, in which a feature extraction technique and unigram feature engineering are used. The proposed CNN-BiLSTM model is compared with the ensemble ML model in terms of accuracy.

CNN‑BiLSTM Architecture

A single machine learning or deep learning model can predict the outcome rather accurately when applied to specific domains, but each has its own set of advantages and downsides. LSTM usually produces superior results, but it takes longer to process than CNN, whereas CNN has fewer hyperparameters and requires less supervision. The LSTM is more accurate for long sentences but takes longer to analyze. Because RNN has a major gradient loss issue when processing sequences, the influence of nodes at the front decreases as nodes get further back. To tackle the problem of vanishing gradients, BiLSTM is used. It solves the problem of fixed sequence to sequence prediction. RNN has a limitation where both input and output have the same size. So it fails in the case of machine translation, where input and output have different sizes, or in the case of text summarization, where input and output have different lengths, which is not the case with BiLSTM. The concept of combining two (or more) methods is offered as a way of obtaining the benefits of both while also addressing some of the drawbacks of existing techniques.

A CNN-BiLSTM is a concatenated bidirectional LSTM and CNN framework. In its initial formulation, it trains both character-level and word-level features for classification and prediction. The character-level features are induced using the CNN layer. To derive a new feature vector from per-character feature vectors such as character embeddings and (preferably) character type, the model includes a convolution and a max pooling layer for each word.

Combining different variations yields multiple hybrid approaches that we have tested:

1. GloVe + FastText ⟶ CNN ⟶ BiGRU ⟶ adam (dense, conv1d = relu; out = sigmoid), maxlen = 25
2. GloVe + FastText ⟶ CNN ⟶ BiLSTM ⟶ adam (dense, conv1d = relu; out = sigmoid), maxlen = 25
3. GloVe + FastText ⟶ BiLSTM ⟶ BiGRU ⟶ adam (dense, conv1d = relu; out = sigmoid), maxlen = 25
4. GloVe + FastText ⟶ CNN ⟶ BiGRU ⟶ adam (dense, conv1d = relu; out = sigmoid), maxlen = 25, trainable = True
5. GloVe + FastText ⟶ CNN ⟶ BiLSTM ⟶ adam (dense, conv1d = relu; out = sigmoid), maxlen = 25, trainable = True
6. GloVe + FastText ⟶ BiLSTM ⟶ BiGRU ⟶ adam (dense, conv1d = relu; out = sigmoid), maxlen = 25 ⟶ SpatialDropout1D, GlobalMaxPooling1D, GlobalAveragePooling1D

The CNN-BiLSTM model that is to be used has the following features:

• Stacked Word Embedding: A distributed representation of words where different words that have a similar meaning (based on their usage) also have a similar representation. Two such word embeddings are GloVe and FastText, and stacking these two embeddings provides better results.
• Convolutional Model: A feature extraction model that learns to extract salient features from documents represented using a word embedding.
• Fully Connected Model: The interpretation of extracted features in terms of a predictive output.

Therefore, the model is comprised of the following elements, as shown in Fig. 2:

Input layer: The length of input sequences is defined by the input layer.
Embedding layer: 100-dimensional real-valued representations and an embedding layer set to the vocabulary’s size.
Conv1D layer: Using 32 filters and a kernel size corresponding to the number of words to read simultaneously.
MaxPooling1D: The result of the convolutional layer is pooled by this layer.
Flatten layer: For concatenation and to convert the three-dimensional output to two-dimensional.

Kernel sizes: 3
Number of filters: 100
Dropout rate: 0.5
Weight regularization (L2): 3
Batch size: 128
Update rule: Adam
Transfer function: Rectified Linear
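The first stages of the layer stack described above (stacked embedding, 1-D convolution with a rectified linear transfer function, max pooling) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation, which would load real GloVe and FastText vectors into a deep learning framework. The helper names (`stack_embeddings`, `conv1d_relu`, `max_pool1d`), the toy vectors, and the tiny dimensions are all assumptions made for illustration; only the kernel size of 3 and the pool size of 2 are taken from the paper.

```python
from typing import Dict, List

Vector = List[float]


def stack_embeddings(tokens: List[str],
                     glove: Dict[str, Vector], fasttext: Dict[str, Vector],
                     glove_dim: int, fasttext_dim: int) -> List[Vector]:
    """Concatenate each token's two vectors; out-of-vocabulary tokens get zeros."""
    stacked = []
    for token in tokens:
        g = glove.get(token, [0.0] * glove_dim)
        f = fasttext.get(token, [0.0] * fasttext_dim)
        stacked.append(g + f)  # new list of length glove_dim + fasttext_dim
    return stacked


def conv1d_relu(sequence: List[Vector], kernel: List[Vector],
                bias: float = 0.0) -> List[float]:
    """Slide one filter over the token axis (no padding) and apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(x * w for x, w in zip(sequence[start + offset], kernel[offset]))
        outputs.append(max(0.0, total))
    return outputs


def max_pool1d(values: List[float], pool_size: int = 2) -> List[float]:
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]


# Hypothetical toy lookup tables: a 2-d "GloVe" and a 1-d "FastText" vocabulary.
toy_glove = {"you": [1.0, 0.0], "are": [0.0, 1.0], "bad": [1.0, 1.0]}
toy_fasttext = {"you": [0.5], "bad": [2.0]}

tokens = ["you", "are", "so", "bad"]
stacked = stack_embeddings(tokens, toy_glove, toy_fasttext, glove_dim=2, fasttext_dim=1)
kernel = [[1.0, 1.0, 1.0]] * 3            # one filter spanning 3 tokens (kernel size 3)
features = conv1d_relu(stacked, kernel)   # one score per 3-token window
pooled = max_pool1d(features, pool_size=2)
print(stacked, features, pooled)
```

With these toy inputs the two window scores are 2.5 and 5.0, and pooling keeps 5.0. In a real model each of the 100 filters would have its own learned kernel, and the pooled features would feed the recurrent and dense layers.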
The Adam optimizer is computationally efficient, requires little memory, is invariant to diagonal rescaling of gradients, and is well suited for problems with a lot of data/parameters. We will select the best parameters using grid search and 10-fold cross-validation. Next, Convolutional Neural Network (CNN) models are built to classify encoded documents as either cyberbullying or non-cyberbullying. The CNN model can be defined as follows, as shown in Fig. 2:

• One Conv layer with 100 filters, kernel size 3, and ReLU activation function;
• One MaxPool layer with pool size = 2;
• One Dropout layer after flattening;
• Optimizer: Adam
• Loss function: binary cross-entropy (suited for binary classification problems)
• Dropout layers are used to solve the problem of overfitting and bring generalization into the model. As a result, in hidden layers, it’s best to keep the dropout value near 0.5.

tweets, see real-time updates of the feed and chat with friends. The admin feature of the app allows the admin to trace cyberbullying comments and block the users for a certain amount of time.

Features

User Register: The system allows new users to register themselves on the app.
User account: The system allows users to create their accounts by providing their emails and setting a password. Users can also set a username for their account as well as view their profiles.
Admin account: The system allows the admin to have a separate login with which they can perform admin-related activities.
Posting Feature: The system allows users to post their tweets and tag their friends.
View Feed: The system allows users to view their feed and gives the admin privileges to see all kinds of tweets.
Blocking Feature: The system has special admin features that allow the admin to suspend user accounts if their comments are found to be cyberbullying.

Results and Discussion
1   LSTM   Activation: sigmoid; Optimizer: Adam      0.87
2   LSTM   Activation: relu; Optimizer: Adam         0.85
3   LSTM   Activation: sigmoid; Optimizer: RMSProp   0.86
4   LSTM   Activation: relu; Optimizer: RMSProp      0.85

Adam and RMSProp are used as optimizers, and ReLU and sigmoid are used as activation functions, making a total of 4 combinations.

The activation function chosen has a significant effect on the neural network’s capabilities and performance, and various activation functions may be utilized in various portions of the model. The sigmoid function converts any input into a number between 0 and 1. It gives a result close to zero for large negative inputs and a value close to one for large positive inputs. Sigmoid is the same as a two-element softmax with the second element set to zero. As a result, the sigmoid is commonly used in binary classification.
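The statement that sigmoid equals a two-element softmax with the second element fixed at zero can be checked numerically. The sketch below is illustrative only and uses the standard library; the function names are assumptions, and the `binary_cross_entropy` helper is included simply because binary cross-entropy is the loss this paper names for its binary classifier.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits: List[float]) -> List[float]:
    """Softmax with the usual max-shift for numerical stability."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(y_true: int, p: float, eps: float = 1e-12) -> float:
    """Loss for one example with label 0 or 1 and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# sigmoid(x) matches the first component of softmax([x, 0]).
for logit in (-6.0, -1.0, 0.0, 1.0, 6.0):
    assert math.isclose(sigmoid(logit), softmax([logit, 0.0])[0])

# A confident correct prediction costs less than an uncertain one.
print(binary_cross_entropy(1, sigmoid(6.0)), binary_cross_entropy(1, sigmoid(0.0)))
```

The identity follows from softmax([x, 0])[0] = e^x / (e^x + 1), which is the sigmoid of x. This is why a single sigmoid output unit suffices for the two-class cyberbullying / non-cyberbullying decision.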
The Adam optimizer is computationally efficient, requires little memory, is invariant to diagonal rescaling of gradients, and is well suited for problems with a lot of data/parameters, whereas the RMSProp optimization
media (student consortium). In: 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM), pp. 297–301 (2020). IEEE
15. Berrimi, M., Moussaoui, A., Oussalah, M., Saidi, M.: Attention-based networks for analyzing inappropriate speech in arabic text. In: 2020 4th International Symposium on Informatics and Its Applications (ISIA), pp. 1–6 (2020). IEEE
16. Dang, C.N., Moreno-García, M.N., De la Prieta, F.: Hybrid deep learning models for sentiment analysis. Complexity 2021 (2021)
17. Yuvaraj N, Chang V, Gobinathan B, Pinagapani A, Kannan S, Dhiman G, Rajan AR. Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification. Comput Electr Eng. 2021;92:107186.
18. Roy PK, Tripathy AK, Das TK, Gao X-Z. A framework for hate speech detection using deep convolutional neural network. IEEE Access. 2020;8:204951–62.
19. Al-Makhadmeh Z, Tolba A. Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach. Computing. 2020;102(2):501–22.
20. Yadav, Y., Bajaj, P., Gupta, R.K., Sinha, R.: A comparative study of deep learning methods for hate speech and offensive language detection in textual data. In: 2021 IEEE 18th India Council International Conference (INDICON), pp. 1–6 (2021). IEEE
21. Lee E, Rustam F, Washington PB, El Barakaz F, Aljedaani W, Ashraf I. Racism detection by analyzing differential opinions through sentiment analysis of tweets using stacked ensemble gcr-nn model. IEEE Access. 2022;10:9717–28.
22. d’Sa, A.G., Illina, I., Fohr, D.: Bert and fasttext embeddings for automatic detection of toxic speech. In: 2020 International Multi-Conference on: “Organization of Knowledge and Advanced Technologies” (OCTA), pp. 1–5 (2020). IEEE
23. Jiang, L., Suzuki, Y.: Detecting hate speech from tweets for sentiment analysis. In: 2019 6th International Conference on Systems and Informatics (ICSAI), pp. 671–676 (2019). IEEE
24. Mohaouchane, H., Mourhir, A., Nikolov, N.S.: Detecting offensive language on arabic social media using deep learning. In: 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), pp. 466–471 (2019). IEEE
25. Dubey, K., Nair, R., Khan, M.U., Shaikh, S.: Toxic comment detection using lstm. In: 2020 Third International Conference on Advances in Electronics, Computers and Communications (ICAECC), pp. 1–8 (2020). IEEE

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.