A Survey on Automatic Online Hate Speech Detection in Low-Resource Languages

Susmita Das1, Arpita Dutta2, Kingshuk Roy3, Abir Mondal4, Arnab Mukhopadhyay5

1 Bennett University, India ([email protected])
2 Techno Main Salt Lake, India ([email protected])

arXiv:2411.19017v1 [cs.CL] 28 Nov 2024
Abstract
The expanding influence of social media platforms over the past decade has impacted the way
people communicate. The level of obscurity provided by social media and the easy accessibility of the
internet have facilitated the spread of hate speech. The terms and expressions related to hate speech
get updated with changing times, which poses an obstacle to policy-makers and researchers in the case
of hate speech identification. With a growing number of individuals using their native languages
to communicate with each other, hate speech in these low-resource languages is also growing.
Although there is awareness of English-related approaches, much attention has not been
paid to these low-resource languages due to the lack of datasets and online available data. This
article provides a detailed survey of hate speech detection in low-resource languages around the
world, with details of the available datasets, the features utilized and the techniques used. This survey further
discusses the prevailing surveys, overlapping concepts related to hate speech, research challenges and
opportunities.
1 Introduction
In the age of social networking and the exponential growth of social media platforms, people have
gained freedom of speech and online anonymity. This has propelled the transmission of hate speech
[65] on multiple social networking platforms and has presently transformed it into a global issue. Social
media platforms have become a centre for online communication, discussion, opinion exchange and
argument. People's inclination to defend their opinions or ideologies often leads to online debates
which occasionally transform into aggressive online behaviour and hate speech. Users often express their
animosity in the form of hate speech by berating a certain individual or group based on their religion,
ethnicity, nationality, gender, sex, skin colour or even sexual orientation[135]. Social media companies
have implemented their own policies for the prevention of hate speech in cyberspace, which often rely
on multiple users reporting a certain post or on platform admins regulating the timelines. Evolving
social networks depict an increase in prejudiced perceptions, an increased speed of hate speech spread[95] and delays
in restrictions, all of which pose a great challenge in counteracting hate speech.
Online social networks produce huge amounts of data, which makes manual monitoring for hate speech
infeasible. This naturally leads to the requirement for automatic online hate speech detection techniques. As
Natural Language Processing (NLP), Machine Learning (ML) and Deep Learning (DL) approaches make
great headway in pragmatic solutions to real-life problems, researchers have been motivated
to analyse different methods for detecting online hate speech. Researchers have assembled multiple
large-scale datasets for hate speech, mostly in the English language (the most prevalent language worldwide).
Although several studies have been conducted to deal with hate speech in multiple non-English languages,
the lack of resources has restricted extensive research. There are various other limitations apart from resource
availability which have been acknowledged and require further research. Expanding on the previously
available surveys on online hate speech, this paper provides an updated survey on automatic hate speech
detection in low-resource languages. In this survey study, we discuss the following issues associated
with online hate speech detection, which collectively constitute our primary contributions:
1. Various categories of hate speech, their corresponding background information and multiple
approaches for online hate speech detection are explained elaborately based on the existing
literature.
2. The datasets for hate speech detection are studied in detail. Moreover, the available datasets for several
low-resource languages receive special focus.
3. The multiple cutting-edge natural language processing and deep learning-based techniques used to
identify online hate speech in low-resource languages are analyzed.
4. The challenges encountered in identifying hate speech along with potential directions for further
research have been addressed.
The principal objective of this survey is to provide a comprehensive idea of the history of research
on hate speech and a detailed account of its application in the case of low-resource languages. This survey
paper is organized as follows: Section 2 consists of a background study detailing the definition of hate speech
as explained by different social media giants, categorizing hate speech and covering the overlapping concepts
related to hate speech. Section 3 explains the search for related works and literature. Section 4 delves
into the datasets available in English and low-resource languages. Section 5 expounds on automatic hate
speech detection, primarily in low-resource languages. These low-resource languages are presented
continent-wise (European, African, Latin American and Asian) for clarity. Asian languages
are further divided into Indic and non-Indic languages, as India has a considerable number of
spoken languages. Section 6 details the research challenges and other opportunities, and Section 7
presents the conclusion. The overview of this survey article is depicted in Fig. 1.
2 Background Study
Although hate speech has existed in society for a prolonged time, detection of hate speech from the
perspective of computer science is quite a recent development.
2.1 Hate Speech
Defining hate speech can be quite difficult as it involves complex relationships and viewpoints
among different groups of people, and the definition is not consistent. As of now, international human
rights law does not have a comprehensive definition of hate speech, but the UN Strategy and Plan of Action
on Hate Speech has defined hate speech as follows:
“any kind of communication in speech, writing or behaviour, that attacks or uses pejorative or discrimi-
natory language with reference to a person or a group on the basis of who they are, in other words, based
on their religion, ethnicity, nationality, race, colour, descent, gender or other identity factor.”
Detecting hate speech is challenging as it depends on language variations. In the case of online hate speech,
multiple social media platforms have provided their own definitions and policies, as follows:
1. Meta: Meta, the parent company of multiple social media platforms like Facebook and Instagram,
has defined hate speech as follows: "We define hate speech as a direct attack against people – rather
than concepts or institutions – on the basis of what we call protected characteristics: race, ethnicity,
national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and
serious disease." 1

2. YouTube: YouTube has established its policies to tackle hate speech and harassment. As per the
official pages of YouTube, hate speech is considered as follows: "We consider content to be hate
speech when it incites hatred or violence against groups based on protected attributes such as age,
gender, race, caste, religion, sexual orientation or veteran status." 2

3. Twitter/X: Twitter, presently known as X, has also provided its hate speech definition: "You
may not attack other people on the basis of race, ethnicity, national origin, caste, sexual orientation,
gender, gender identity, religious affiliation, age, disability, or serious disease." 3

4. TikTok: TikTok, a comparatively recent social media platform, has issued its definition: "Hate
speech and hateful behaviour attack, threaten, dehumanise or degrade an individual or group based
on their characteristics. These include characteristics like race, ethnicity, national origin, religion,
caste, sexual orientation, sex, gender, gender identity, serious disease, disability and immigration
status." 4

5. LinkedIn: LinkedIn is a social media platform focused on employment and professional improvement.
As per their policies, they have defined hate speech as: "Hate speech, symbols, and groups
are prohibited on LinkedIn. We remove content that attacks, denigrates, intimidates, dehumanizes,
incites or threatens hatred, violence, prejudicial or discriminatory action against individuals or
groups because of their actual or perceived race, ethnicity, national origin, caste, gender, gender
identity, sexual orientation, religious affiliation, or disability status." 5
As depicted in Fig. 2, millions of people regularly use multiple social media platforms. It is imperative
that these social networking giants implement their policies regarding hate speech to constrain the spread
of these discriminatory posts and content.
2.2 Categorization of Hate Speech
The evolving definition of hate speech has included multiple elements under its blanket. Analysis of hate
speech contains different categories based on the characteristics of a group that are being targeted. The
prevalent categories of hate speech are:
Figure 2: Monthly Active Users on Different Social Media Platforms[42]
2.2.4 Ableism:
Negative attitudes are often displayed towards people with disabilities. Friedman et al.[49] have proposed
that these disabilities may include physical, sensory as well as cognitive disabilities. Sometimes this may
include discrimination against people with certain diseases. This category of hate speech stems from
the belief that people with disabilities are somehow inferior.
As observed in Table 1, social media platforms have defined hate speech as discrimination against
specific groups. Most of these definitions consider religion, race, gender discrimination, colour,
ethnicity and disability. Some other criteria like serious disease, age, immigration status etc. have also been
included in hate speech definitions.
2.3.1 Hate:
While hate speech stereotypes and targets a particular group, hate is the general display of aggressive
behaviour for no particular reason.
2.3.2 Discrimination:
Discrimination means singling out a certain group or community and making unfair distinctions or giving
prejudiced treatment. Hate speech is essentially discrimination that is online or verbal in nature.
2.3.3 Cyberbullying:
Cyberbullying is targeted harassment of a certain individual on social media platforms. Dinakar et al.[40],
Smith et al.[137] have explained that much like traditional bullying, cyberbullying involves repeated
aggression towards the same individual or group of individuals.
Table 2: Example sentences categorised across hate speech categories and overlapping concepts

| Sentence | Race/Ethnicity | Sexism/Gender | Religion | Ableism | Cyberbullying | Abusive |
| 1. "He is a good person." | No | No | No | No | No | No |
| 2. "May be he is a cheapskate." | No | No | No | No | No | Yes |
| 3. "Why doesn't he realize he is a chunker. These clothes are not for him" | No | No | No | No | Yes | Yes |
| 4. "This person with <insert skin colour> is so dirty" | Yes | No | No | No | Yes | Yes |
| 5. "These two men are obnoxious. I don't want to see these homos" | No | Yes | No | No | Yes | Yes |
| 6. "Never seen a <insert religion> who is a pansy ;)." | No | Yes | Yes | No | Yes | Yes |
| 7. "He is a moron. People call him THE DUMB." | No | No | No | Yes | Yes | Yes |
| 8. "People from <insert country> are not good." | Yes | No | No | No | No | Yes |
| 9. "People from <insert religion> should not be allowed inside." | No | No | Yes | No | No | No |
Figure 4 (a): Content percentage of frequently used languages on YouTube[157]. Figure 4 (b): Content percentage of frequently used languages on Facebook[131].

"Cyberbullying", "Abusive Language" and "Hate Speech" have many similarities, and their concepts
often overlap with each other. Recognizing and classifying online content accordingly
can become challenging. Racial slurs, sexist terms etc. can be used to bully a person online. In this case,
a particular post can be classified as both hate speech and cyberbullying or abusive. There are instances
where a message is a case of cyberbullying but cannot be regarded as hate speech. As shown in Table
2, a certain post or message can be designated under multiple categories of hate speech. For example,
sentence 4 of Table 2 ("This person with <insert skin colour> is so dirty") is categorised as abusive,
cyberbullying and, at the same time, racist hate speech. Similarly, sentence 6 ("Never seen a <insert
religion> who is a pansy ;).") contains an abusive term ("pansy") and is categorised as both religious hate
speech and cyberbullying. Sentence 2 contains the abusive word "cheapskate", but it may not be categorised
as cyberbullying, because the writer of the post is merely wondering, not targeting the person.
Again, sentence 9 ("People from <insert religion> should not be allowed inside.") neither contains
abusive language nor can be categorised as cyberbullying; yet it clearly discriminates against a
group of people based on religion, which makes it hate speech. These examples demonstrate that cyberbullying
and abusive language can be potential indicators of hate speech, but their presence or absence alone
does not determine whether a certain message is hate speech. As shown in Fig. 3, hate speech and
the related or adjacent concepts intersect in their relationships with each other.

Due to the complexities of language formation and semantics, annotation of datasets for hate speech can
be ambiguous, although abusive language has been used as a blanket term for many categories of hate
speech. Van Hee et al.[146] considered racist and sexist remarks part of offensive language, while
Davidson et al.[35] clearly distinguish the two. In the case of annotations of online posts, posts
considered hate speech in one study[154] are considered abusive language in another[105]. Thus,
complex language norms as well as ambiguous dataset annotation both contribute to complications
in hate speech detection.
3 Related Works
Hate speech has been explored in multiple research topics. There are multiple studies regarding hate
speech in politics, law, journalism, human behaviour etc. When it comes to computer science, studies on
hate speech have been a comparatively recent trend. A substantial amount of resources are available in
public domain encouraging researchers to study online hate speech. As English is the prevailing language
for information exchange on the internet, most resources available are predominantly in English. Apart
from English, account holders on various social media platforms have started to use their native languages
for communication purposes. These native languages can be considered online low-resource languages,
as considerably fewer resources can be obtained for these languages. As depicted in Fig. 4, it has been
observed that English is the dominant language used in all social media platforms. After English, it
is discerned that there is an abrupt decrease in the content percentage of other languages across all
platforms. While Spanish is the second-most used language on both YouTube and Facebook as shown in
Fig. 4a and Fig. 4b respectively, German is the second-most used language on Twitter/X as portrayed
in Fig. 4c, while Arabic is the second-most used language by moderators on TikTok as shown in Fig.
4d.
The number of research articles and available datasets is not vast. In the case of low-resource
languages, the frequency of articles is quite sparse. However, we accumulated a considerable number of
scholarly articles from multiple digital libraries and websites.
Table 3: Keywords used for Relevant Document Search

| Hate Speech related Keywords | Dataset related Keywords | Language related Keywords |
| Hate Speech Detection, Hate Speech Survey, Hate Speech Detection Systematic Review | Hate Speech Dataset | Spanish, Dutch, Danish, Italian, French, German, Portuguese, Greek |
| Cyberbullying, Cyberbullying Detection, Cyberbullying Survey, Trolling | Offensive Language Dataset, Abusive Language Dataset | Turkish, Indonesian, Korean, Mandarin, Chinese, Japanese |
| Racism, Racism on social media, Racist speech, Ableism, Disability | Multilingual Dataset | Arabic, Swahili, Nepali, Vietnamese, Bengali, Urdu |
| Sexism, Sexism on Social Media, Sexist Speech, Misogyny, Gender Discrimination, Homophobia, Transgender | Low-Resource Dataset, Multimodal Hate Speech Dataset | Hindi, Marathi, Assamese, Telugu, Kannada, Tamil |
| Religious Hate Speech, Religious Discrimination | Benchmark Dataset | Hinglish, Bengali code-mixed, Hindi code-mixed |
| Abusive Language, Offensive Language, Swear Words, Toxic Comments, Insults | Code-Mixed Dataset | Indic Language, Dravidian Language, Asian Language |
for hate speech detection. In later stages, our focus shifted to assembling information on low-resource
languages. Our search became more specific, gathering low-resource datasets and automatic hate
speech detection articles on multiple low-resource languages around the world as well as in India. We
have gathered documents from 2005 onwards, but our concentration has primarily been on research articles
from 2017 to the present.
et al.[90] provide an extensive review of cyberbullying detection methods used in the case of low-resource
languages. Chhabra et al.[28] presented a survey of hate speech detection techniques across multiple
languages as well as multiple modalities. Pamungkas et al.[110] introduce a survey of abusive language
detection methods across multiple domains and languages.
Recently, an increasing number of people are using their native languages to communicate on social
media platforms. Accordingly, the number of surveys on hate speech, cyberbullying and abusive words
in low-resource languages has increased in the past few years. Our motivation has been to provide an
up-to-date literature review highlighting the methods used for hate speech detection in low-resource
languages along with the resources and datasets available in these languages.
4 Available Datasets
Several valuable resources have been identified during the course of this review. While most of the
resources and datasets are open source projects, there are few resources for which permission is needed
from the authors and creators of the datasets. Most of the publicly accessible datasets are present in
GitHub, while a few of them are obtainable from different other repositories. Majority of the datasets
have been compiled from public resources available on multiple social media platforms. The prevalent
social media platform for data collection is observed to be Twitter/X, while Facebook and Reddit are
dataset sources. A substantial number of studies have also considered YouTube, Gab etc. English is
dominant language in which a major part of the studies have been conducted. Research have been
carried out on other European languages like Spanish, French, German etc. Recently, studies on Asian
languages like Arabic, Korean and multiple Indic languages are gaining attention.
4.1.4 Zhang Dataset:9
Zhang et al.[160] collected their dataset from Twitter, consisting of 2435 tweets. These accumulated
tweets are specific to refugees, immigrants and Muslims. In this dataset, 414 tweets have been annotated as hate
while 2021 have been labelled as non-hate.
1. Coltekin Dataset:24 Coltekin[30] has compiled this monolingual dataset consisting of 36232 randomly sampled tweets
in the Turkish language from Twitter. A major portion, 92.1%, of
the tweets are from unique users. A hierarchical annotation process has been followed during labelling
of the tweets. At the top level, the tweets are annotated as offensive or non-offensive. The offensive tweets
are further annotated with the targeted communities and whether they target individuals, groups or
others. In this dataset, 80.6% of the tweets are non-offensive, while 19.4% are offensive. Of these
offensive tweets, 21.18% are not targeted at any community, 25.45% are targeted at groups, 48.02% are
targeted at individuals and 5.33% at others.
2. Beyhan Dataset:25 Beyhan et al.[19] accumulated this monolingual dataset on hate speech in the
Turkish language from Twitter. There are 1033 hate speech tweets related to gender and sexual orientation
in the Istanbul Convention dataset, while 1278 hate speech tweets connected to refugees in Turkey
form the Refugee dataset. The tweets have been annotated into five labels: 'insult', 'exclusion', 'wishing harm',
'threat' and 'not hate'. For the Istanbul Convention dataset, the chosen keywords and filtering process
were adjusted to capture hate speech. For the Refugee dataset, random sampling was performed
to reduce bias during tweet collection. In the Istanbul Convention dataset, 36.78% of the tweets are related to
'insult', 11.42% are categorised as 'exclusion', 3.38% as 'wishing harm', 0.09% as 'threat' and
48.30% as 'not hate'. In the Refugee dataset, 14.16% of the tweets are affiliated with 'insult', 21.67% with 'exclusion',
3% with 'wishing harm', 0.5% with 'threat' and 60.56% with 'not hate'.

9 https://github.com/ziqizhang/data#hate
24 https://coltekin.github.io/offensive-turkish/
25 https://github.com/verimsu/Turkish-HS-Dataset

Table 4: Monolingual Datasets in Low-resource Languages
3. Wich Covid2021 Dataset:26 Wich et al.[155] constructed this monolingual dataset from Twitter,
comprising 4960 tweets in German. Emphasis has been placed on hate speech related to COVID-19,
and 65 keywords were used to collect the data. The dataset has been annotated into two classes:
'Abusive' and 'Neutral'. A major portion of the dataset, 3855 tweets (78%), has been labelled as
'Neutral', while 1105 tweets (22%) have been labelled as 'Abusive'.
4. Tulkens Dataset:27 Tulkens et al.[144] collected a Dutch monolingual hate speech dataset with 6375
social media comments retrieved from the Facebook pages of two Belgian organizations. The dataset has been
annotated with 'racist', 'non-racist' and 'invalid' labels. The 'non-racist' label covers 4943 (77.5%)
comments, the 'racist' label comprises 1088 (17.06%) comments, while the 'invalid' label constitutes 344 (5.4%)
comments.

26 https://github.com/mawic/german-abusive-language-covid-19
27 https://github.com/clips/hades
5. Sigurbergsson Dataset: Sigurbergsson et al.[136] gathered this Danish hate speech monolingual
dataset from Facebook and Reddit. It consists of 3600 comments, with 800 (22.2%) comments
from the Facebook page of Ekstra Bladet, 1400 (38.9%) comments from the Danish subreddit r/Denmark and
1400 (38.9%) comments from the subreddit r/DANMAG. A hierarchical annotation process has been followed,
where 3159 (87.75%) comments are 'not offensive', while 441 (12.25%) comments are 'offensive'. A total of 252 offensive
comments are targeted, with 95 directed at individuals, 121 at groups and 36 at others.
6. K-MHaS Dataset:28 Lee et al.[86] collected this multilabel monolingual dataset from Korean news
comments. It consists of 109692 utterances, annotated with multiple labels: 'Politics', 'Race', 'Origin',
'Religion', 'Physical', 'Age', 'Gender', 'Profanity' and 'Not Hate Speech'. While 59615 (54.3%)
utterances are 'Not Hate Speech', 50000 (45%) are 'Hate Speech' under the different labels.
7. Bohra Dataset:29 Bohra et al.[21] collected this dataset from Twitter, consisting of 4575 tweets
in Hindi-English code-mixed language. The data has been fetched in JSON format, which contains
information such as user ID, text, user, replies, re-tweets etc. The samples have been annotated
as either 'Hate Speech' or 'Normal'. In total, 1529 (33.4%) comments have been annotated as hate speech
and 3046 (66.57%) as non-hate.
8. BD-SHS Dataset:30 Romim et al.[123] compiled this Bengali dataset with 50281 comments from
multiple social networking sites like YouTube, Twitter and Facebook. A hierarchical approach has been
adopted for annotation of the samples, where each sample is first identified as hate speech or not, after
which the target and the type of hate speech are identified. In total, 26125 (51.9%) comments are non-hate
while 24156 (48.04%) comments are hate speech.
1. HatEval-2019 Dataset:31 Basile et al.[16] compiled this multilingual dataset comprising 19600
tweets, of which 13000 are English posts and the remaining 6600 are Spanish posts. These hate tweets mainly
target immigrants and women. While 10509 tweets are misogynistic in nature, 9091 tweets spew
hate about immigrants. The data has been annotated in the Offensive Language Identification Dataset
(OLID)[158] format, where the posts are initially labelled with a binary classification. At the following level,
the hate speech is further annotated according to whether it is against an individual or a
community and whether the person is displaying aggression or not. A schematic sketch of this
OLID-style hierarchy is given after this list.
2. GermEval-2019 Dataset:32 Wiegand et al.[156] constructed this dataset from Twitter, consisting
of 8541 German tweets. The tweets are mainly concerned with the refugee crisis in Germany and
have been annotated with two labels: 'Offensive' and 'Neutral'. In this dataset, 2890 (33.83%) tweets
have been labelled as 'Offensive' and 5651 (66.16%) as 'Neutral'.
3. OffensEval-2020 Dataset:33 Zampieri et al.[159] compiled this multilingual dataset with five
languages: English, Greek, Danish, Arabic and Turkish. There are 9093037 English tweets, which
makes it one of the largest datasets. For the other low-resource languages, the dataset constitutes 10000
Arabic tweets, 3600 Danish comments from Facebook, Reddit and the Ekstra Bladet newspaper site, 10287
Greek tweets and 35000 Turkish tweets.
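To make the OLID-style hierarchy concrete, the sketch below shows one hypothetical way to represent the three annotation levels in code; the field and class names are illustrative, not the official schema.

```python
# Illustrative sketch of OLID-style hierarchical labels (field names are
# hypothetical). Level A flags offensive content, level B marks whether it
# is targeted, and level C names the target type.
from dataclasses import dataclass
from typing import Optional

@dataclass
class OlidStyleLabel:
    offensive: bool                  # level A: offensive vs not
    targeted: Optional[bool] = None  # level B: targeted vs untargeted
    target: Optional[str] = None     # level C: "IND", "GRP" or "OTH"

labels = [
    OlidStyleLabel(offensive=False),                              # clean post
    OlidStyleLabel(offensive=True, targeted=False),               # untargeted
    OlidStyleLabel(offensive=True, targeted=True, target="GRP"),  # vs a group
]
for label in labels:
    print(label)
```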
28 https://github.com/adlnlp/K-MHaS
29 https://github.com/deepanshu1995/HateSpeech-HindiEnglish-Code-Mixed-Social-Media-Text
30 https://github.com/naurosromim/hate-speech-dataset-for-Bengali-social-media
31 https://github.com/msang/hateval/
32 https://projects.cai.fbi.h-da.de/iggsa/
33 https://sites.google.com/site/offensevalsharedtask/home
Table 5: Multilingual Datasets in Low-resource Languages
4.3.5 MUTE Dataset:41
Hossain et al.[61] accumulated a multimodal hateful memes dataset with captions in Bengali or code-mixed
(Bengali+English) language. This dataset consists of 4158 memes, each labelled as either
Hate or Not-Hate. Memes have been collected manually from multiple social media platforms such as
Twitter, Facebook and Instagram. Keywords such as Bengali Memes, Bangla Funny Memes etc. have
been used during the manual search. While 2572 (61.85%) samples have been annotated as Not-Hate,
1586 (38.14%) samples have been annotated as Hate.
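Because memes pair an image with a caption, multimodal systems evaluated on datasets like MUTE typically fuse a visual encoder with a textual encoder. The sketch below is a minimal two-branch classifier in this spirit (assuming the torch and torchvision packages); it is not the authors' architecture, and all dimensions are placeholders.

```python
# Minimal sketch of a two-branch hateful-meme classifier: ResNet image
# features and LSTM caption features are concatenated for a binary
# Hate / Not-Hate prediction. Hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MemeClassifier(nn.Module):
    def __init__(self, vocab_size=30000, text_dim=128):
        super().__init__()
        self.image_encoder = resnet18(weights=None)     # pretrained optional
        self.image_encoder.fc = nn.Identity()           # expose 512-d features
        self.embedding = nn.Embedding(vocab_size, text_dim)
        self.text_encoder = nn.LSTM(text_dim, text_dim, batch_first=True)
        self.classifier = nn.Linear(512 + text_dim, 2)  # Hate / Not-Hate

    def forward(self, image, caption_ids):
        img_feat = self.image_encoder(image)                     # (B, 512)
        _, (h_n, _) = self.text_encoder(self.embedding(caption_ids))
        return self.classifier(torch.cat([img_feat, h_n[-1]], dim=1))

model = MemeClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 30000, (2, 20)))
print(logits.shape)  # torch.Size([2, 2])
```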
The rampant spread of hate speech has become a serious concern in various online communities. Social
media platforms encourage users to report any hate speech encountered online, and action is
taken against the posts according to company policy. As depicted in Fig. 6, Meta has released the count
of hate speech content deletions on both Facebook and Instagram in different quarters. In a global survey
across 16 countries conducted by IPSOS and UNESCO, it has been observed that most hate speech is
experienced on Facebook. At the same time, the LGBT community is the most targeted community in hate
speech. The details of hate speech encountered on various social media platforms are depicted in
Fig. 7a. The frequently attacked communities on social media are shown in Fig. 7b.
41 https://github.com/eftekhar-hossain/MUTE-AACL22
5.1 English
English is the most spoken language globally, which makes it a dominant language for communication.
When it comes to social media platforms, websites, most people utilise English while conveying their
opinions. In case of research related to hate speech and abusive words, English is majorly studied
because of multiple reasons. The data available from public sites are majorly in English, which leads
to considerable number of benchmark datasets. The NLP techniques and parsers developed for English
motivated researchers to investigate hate speech detection and conducting comparable studies focusing
on the dominant language. Few of them have been explored in this survey.
Ghosh et al.[55] introduced an English-tweet-based multi-domain hate speech corpus (MHC), which
is compiled using tweets related to diverse domains such as natural disasters, terrorism, technology, and
human/drug trafficking. The dataset contains 10242 samples collected from Twitter, with a balanced
distribution of hate and non-hate classes. Using multiple deep learning methods, the authors developed
their own classifier, named SEHC, for binary classification of hate speech. Caselli et al.[22] propose
HateBERT, a BERT model re-trained for the detection of offensive language in English. A large-scale
English Reddit comments dataset, RAL-E, has been used to provide comparative results against the general
BERT model, where the abusive-language-inclined HateBERT performed better. Saleh et al.[128] proposed
employing domain-specific word embeddings and a BiLSTM for hate speech detection in English. The hate
speech corpus consists of 1048563 sentences on which the word embeddings are built. Roy et
al.[126] propose a DCNN model which employs GloVe embedding vectors to represent the semantics of the
posts for hate speech detection. It achieved a precision of 0.97, recall of 0.88 and F1-score of
0.92 in the best case and outperformed the existing models. Rottger et al.[125] propose HATECHECK,
which covers model functionalities and compares transformer models against them. Ahluwalia et al.[2] focus
on misogyny detection in hate tweets directed towards women that are written in English.
The authors used binary classification with ML techniques for Automatic Misogyny Identification
(AMI) at EVALITA 2018.
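The recipe in Saleh et al., training embeddings on an in-domain corpus and feeding them to a recurrent classifier, can be sketched roughly as below (assuming the gensim and torch packages; the toy corpus and dimensions are placeholders, not the authors' setup).

```python
# Hedged sketch: domain-specific Word2Vec embeddings feeding a BiLSTM
# classifier for binary hate speech detection.
import torch
import torch.nn as nn
from gensim.models import Word2Vec

corpus = [["example", "hateful", "tweet"], ["example", "neutral", "tweet"]]
w2v = Word2Vec(sentences=corpus, vector_size=100, min_count=1, window=5)
weights = torch.tensor(w2v.wv.vectors)            # (vocab_size, 100)

class BiLSTMClassifier(nn.Module):
    def __init__(self, emb_weights, hidden=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(emb_weights, freeze=False)
        self.bilstm = nn.LSTM(emb_weights.shape[1], hidden,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):
        out, _ = self.bilstm(self.embedding(token_ids))
        return self.fc(out.mean(dim=1))           # mean-pool over time steps

model = BiLSTMClassifier(weights)
print(model(torch.randint(0, weights.shape[0], (4, 12))).shape)  # (4, 2)
```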
and various definitions of hate speech. The model identifies hate targets and actions for online commu-
nities, emphasizing the significance of context in online hate detection.
Dutch: Markov et al.[94] evaluate models on two Dutch social media datasets: LiLaH (Facebook
comments) and DALC (tweets). An ensemble method is introduced that combines gradient boosting with
transformer-based language models (BERTje and RobBERT) and an SVM, along with additional features
like hateful words, personal pronouns, and message length. The ensemble approach significantly surpasses the
independent models in both in-domain and cross-domain hate speech detection settings, providing
insights into the challenges of Dutch hate speech detection.
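One simple way to realize such an ensemble is to treat each base model's predicted probabilities, plus hand-crafted features, as inputs to a gradient-boosting meta-classifier. The sketch below illustrates this idea with made-up numbers; it is not the exact configuration of Markov et al.

```python
# Hedged sketch of a stacking ensemble: per-message hate probabilities from
# three hypothetical base models (e.g. BERTje, RobBERT, an SVM) plus one
# lexical feature are stacked and fed to gradient boosting.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

p_bertje  = np.array([0.91, 0.12, 0.55, 0.08])   # illustrative probabilities
p_robbert = np.array([0.87, 0.20, 0.61, 0.15])
p_svm     = np.array([0.75, 0.05, 0.48, 0.22])
lexicon_hits = np.array([3, 0, 1, 0])            # e.g. hateful-word counts

X = np.column_stack([p_bertje, p_robbert, p_svm, lexicon_hits])
y = np.array([1, 0, 1, 0])                       # gold labels: 1 = hateful

meta = GradientBoostingClassifier().fit(X, y)
print(meta.predict(X))                           # [1 0 1 0]
```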
Polish: Ptaszynski et al.[116] introduced a dataset for research on cyberbullying in Polish. This dataset
supports both binary and multi-class classification and has been collected from Twitter, with more than 11k tweets.
Korzeniowski et al.[83] discuss the challenges faced in
hate speech detection using supervised learning due to the scarcity of annotated data. The authors used
unsupervised pre-training techniques, fine-tuning pre-trained BERT and ULMFiT models, and proposed a
Tree-based Pipeline Optimization Tool for finding the optimal solution.
Austrian German: Pachinger et al.[109] introduced a dataset named Austrotox, collected from the
Austrian newspaper 'DerStandard', which reports on multiple topics, both national and international.
The authors filtered 123108 comments in Austrian dialect from the website of this newspaper.
Toxicity scores have been assigned to the comments, with 873 posts
categorized as severely toxic, while the other posts have lower levels of toxicity.
German: Different NLP techniques have been used for hate speech detection in German. Eder et al.[44]
proposed an approach comprising a workflow for data-driven acquisition and semantic scaling of a lexicon
covering rough, vulgar, or obscene terms. They applied best-worst scaling for rating obscenity and
used distributional semantics for automatic lexical enlargement. They used the German slice of
Wiktionary, OpenThesaurus, and corpora like the CODE ALLTAGS+d email corpus, the DORTMUNDER CHAT
KORPUS, and FASTTEXT word embeddings based on COMMON CRAWL and WIKIPEDIA. They
performed manual editing and utilized distributional semantics methods for lexical expansion. Jaki et
al.[68] analyzed more than 50k German hate tweets posted during the 2017 German federal elections. The
authors collected right-wing German Twitter users with subversive profiles. They conducted a comprehensive
analysis of characteristics typical of right-wing hate speech, such as the use of derogatory language,
dissemination of misinformation, and incitement to violence.
Norwegian: Hashmi et al.[60] categorized their dataset, compiled from Facebook, Twitter, and Resset,
into five distinct classes based on the intensity of hate speech. The paper presents a deep learning
approach with GRU and BiLSTM using FastText word embeddings, forming the FAST-RNN model. It
also discusses the implementation of multilingual transformer-based models with hyperparameter tuning
and generative configuration for Norwegian hate speech detection. The FAST-RNN model outperformed
the other deep learning models and achieved a high macro F1-score.
Finnish: Jahan et al.[67] used a Finnish annotated hate speech dataset collected from the Suomi24
forum. It contains 10.7k sentences, 16.7% of which are identified as hate speech. The approach includes
experiments with the FinBERT pre-trained model and a Convolutional Neural Network (CNN). They compared
FinBERT's performance with multilingual BERT (mBERT) and other models using fastText embeddings.
Feature engineering strategies like TF-IDF and n-grams were also applied. FinBERT achieved
91.7% accuracy and a 90.8% F1-score, outperforming all other models including mBERT and the CNN with
fastText embeddings.
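The classical TF-IDF plus n-gram baseline mentioned above can be sketched in a few lines; the two placeholder sentences stand in for the Suomi24 data, and the pipeline below is an assumption of the general setup rather than the authors' exact configuration.

```python
# Hedged sketch: TF-IDF over word uni/bigrams feeding a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["esimerkki vihapuheesta", "neutraali kommentti"]  # placeholders
labels = [1, 0]                                            # 1 = hate speech

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word unigrams and bigrams
    LogisticRegression(),
)
baseline.fit(texts, labels)
print(baseline.predict(["neutraali esimerkki"]))
```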
Swedish: Fernquist et al.[46] proposed automatic hate detection techniques for digital media. Specifically,
language models for Swedish hate speech are fine-tuned and the results are examined against the
pre-trained models. The authors collected labelled data from Swedish online forums like Avpixlat,
Flashback and Familjeliv, curating 17330 posts. The authors chose comments mentioning entities, as hateful
comments are often directed towards communities or persons.
offensive, and nine hate speech categories. The baseline experiments on the HateBR corpus achieved
an F1-score of 85%, surpassing the previous state-of-the-art for the Portuguese language. ToLD-Br is
manually annotated into seven categories, and the authors discuss the use of state-of-the-art BERT models
for binary classification of toxic comments. Monolingual BERT models trained on Brazilian Portuguese and
multilingual BERT models were used. The paper also explores transfer learning and zero-shot learning
using the OLID dataset. The monolingual BERT models achieved a macro F1-score of up to 76% using
monolingual data in the binary case.
Latin-American Spanish: Aldana et al.[7] present a method for detecting misogynistic content in
texts written in Latin-American Spanish. The approach integrates multiple sources of features to enhance
the accuracy of misogyny detection with transformer-based models. Aguirre et al.[1] accumulated
235251 Spanish comments from 200 YouTube videos concerning discrimination against Venezuelan refugees
in Peru and Ecuador. The comments are mainly categorized as racist, xenophobic and sexist hate speech.
Roman-Urdu and Urdu: Akhter et al.[3] proposed the first offensive language dataset for Urdu, curated from
social media. Character- and word-level n-gram techniques were employed for feature extraction.
The study applies 17 classifiers from 7 ML techniques, where character n-grams combined with regression-based
models performed better for Urdu. LogitBoost and SimpleLogistic outperformed the other models,
achieving F-measures of 99.2% on the Roman-Urdu and 95.9% on the Urdu dataset. Ali et al.[9] curated Urdu
tweets for sentiment analysis-based hate speech detection. The approach uses Multinomial Naïve Bayes
(MNB) and SVM. To handle class imbalance, data sparsity and high dimensionality, a Variable Global Feature
Selection Scheme (VGFSS), dynamic stop-word filtering, and the Synthetic Minority Oversampling Technique
(SMOTE) were employed. The results showed that resolving high dimensionality and skewness provided
the greatest improvement in overall performance.
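SMOTE rebalances a skewed training set by synthesizing new minority-class vectors between existing ones. A minimal sketch of this step (assuming the scikit-learn and imbalanced-learn packages; the texts are placeholders) could look as follows.

```python
# Hedged sketch: character n-gram features with SMOTE oversampling of the
# minority (offensive) class before training a Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from imblearn.over_sampling import SMOTE

texts = ["offensive placeholder text"] * 3 + ["neutral placeholder text"] * 9
labels = [1] * 3 + [0] * 9                       # imbalanced: 3 vs 9

X = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(texts)
X_res, y_res = SMOTE(k_neighbors=2).fit_resample(X, labels)  # rebalance
print(list(y_res).count(0), list(y_res).count(1))            # 9 9

clf = MultinomialNB().fit(X_res, y_res)
```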
Roman-Pashto: Khan et al.[78] used a Roman Pashto dataset created by collecting 60K comments
from various social media platforms and manually annotating them. The feature representations include
bag-of-words (BoW), term frequency-inverse document frequency (TF-IDF), and sequence integer encoding.
The model architectures comprise four traditional classifiers and a deep sequence model with BiLSTM. The
random forest classifier achieved a testing accuracy of 94.07% using unigrams, bigrams, and trigrams, and
93.90% with TF-IDF. The highest testing accuracy of 97.21% was obtained with the BiLSTM model.
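The contrast between raw n-gram counts and TF-IDF weighting reported above can be reproduced schematically as below; the comments and scores are placeholders, not the Roman Pashto data.

```python
# Hedged sketch: bag-of-words vs TF-IDF features, each feeding a random
# forest, mirroring the feature comparison described above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

comments = ["placeholder offensive comment", "placeholder normal comment"] * 5
labels = [1, 0] * 5

for vec in (CountVectorizer(ngram_range=(1, 3)),   # BoW uni/bi/trigrams
            TfidfVectorizer(ngram_range=(1, 3))):  # TF-IDF weighting
    X = vec.fit_transform(comments)
    clf = RandomForestClassifier(n_estimators=50).fit(X, labels)
    print(type(vec).__name__, clf.score(X, labels))
```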
Japanese: Fuchs et al.[50] utilized a Twitter dataset collected and filtered for 19 preselected female
politicians' names, with a corpus consisting of 9449645 words. An explorative analysis applying computational
corpus-linguistic tools and methods, supplemented by a qualitative in-depth study, is proposed. It
combines quantitative-statistical and qualitative-hermeneutic methods to detect and analyze misogynist
or sexist hate speech and abusive language on Twitter. Abusive language was present in a significant
portion of the negative tweets, with percentages ranging from 33.33% to 48.6% for the politicians studied.
The analysis also highlighted the use of gendered vocabulary and references to physical appearance in
the abuse. Kim et al.[82] collected Twitter data and used retweet networks to measure adoption
thresholds of racist hate speech. Tweets were collected using 15 keywords related to racism against
Koreans. The detection of hate speech posts was crucial, and an SVM with TF-IDF values was used for
classification.
Korean: Kang et al.[72] built a multilabel Korean hate speech dataset with 35K comments, including
24K online comments, 1.7K sentences from a Human-in-the-Loop procedure, 2.2K neutral sentences
from Wikipedia, and 7.1K rule-generated neutral sentences. The dataset is designed with the Korean
cultural and linguistic context in mind, in contrast to the Western context of English posts. It includes
7 categories of hate speech and employs Krippendorff's Alpha for inter-annotator agreement. The base
model achieved an LRAP of 0.919.
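LRAP (label ranking average precision) scores how highly a model ranks each true label among all labels of a comment. The toy example below uses scikit-learn's implementation with illustrative values to show how the metric is computed.

```python
# Hedged sketch of the LRAP metric on toy multilabel predictions.
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

# Rows: comments; columns: hate categories (e.g. politics, gender, profanity).
y_true  = np.array([[1, 0, 1],
                    [0, 1, 0]])
y_score = np.array([[0.9, 0.2, 0.7],   # model scores per category
                    [0.1, 0.8, 0.3]])

print(label_ranking_average_precision_score(y_true, y_score))  # 1.0 here
```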
Burmese: Nkemelu et al.[104] curated a dataset of 226 Burmese hate speech posts from Facebook using its
CrowdTangle API service. The work involves a community-driven process with context experts throughout the
ML project pipeline, including scoping the project, assessing hate speech definitions, and working with
volunteers to generate data and train and validate models. Classical machine learning algorithms were
employed, with feature combinations of n-grams and term-frequency weighting. The best-performing model
was FastText, achieving good precision, recall, and F1-score.
Vietnamese: Do et al.[43] used the dataset from the VLSP Shared Task 2019: Hate Speech Detection on
Social Networks, which includes 25431 items. The authors implemented a framework based on an ensemble
of Bi-LSTM models for hate speech detection. The word embeddings used were Word2Vec and FastText,
with FastText achieving better results. The Bi-LSTM model achieved the best results with a 71.43%
F1-score on the public standard test set of VLSP 2019. Luu et al.[89] introduced the ViHSD dataset,
a human-annotated dataset for automatically detecting hate speech on social networks. It contains
over 30K comments annotated as Hate, Offensive or Clean. Data was collected from Vietnamese
Facebook pages and YouTube videos, with preprocessing to remove named entities for anonymity. They
implemented Text-CNN and GRU models with fastText pre-trained word embeddings, and transformer
models like BERT, XLM-R, and DistilBERT with multilingual pre-training. The BERT model with
bert-base-multilingual-cased achieved the best result, with 86.88% accuracy and a 62.69% F1-score on the
ViHSD dataset.
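Fine-tuning a multilingual BERT checkpoint for such 3-way Hate/Offensive/Clean classification generally follows the standard transformers recipe; the condensed sketch below (assuming the transformers package; paths and hyperparameters are placeholders) approximates that setup rather than reproducing the authors' exact code.

```python
# Hedged sketch: fine-tuning bert-base-multilingual-cased for 3-way
# hate speech classification with the Hugging Face Trainer API.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=3)

def tokenize(batch):
    # Assumes dataset rows carry a "text" field with the raw comment.
    return tokenizer(batch["text"], truncation=True, max_length=128)

# train_ds / eval_ds are assumed to be datasets.Dataset objects with "text"
# and "label" columns, e.g. loaded from the ViHSD CSV files:
# train_ds = train_ds.map(tokenize, batched=True)
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="mbert-hate", num_train_epochs=3,
#                            per_device_train_batch_size=16),
#     train_dataset=train_ds,
#     eval_dataset=eval_ds,
# )
# trainer.train()
```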
Indonesian: Sutejo et al.[141] developed deep learning models for Indonesian hate speech detection from
speech and text. The authors created a new dataset, Tram, containing posts from various social media
platforms such as LineToday, Facebook, YouTube, and Twitter. Both acoustic features for speech and
textual features for text were utilized to compare their accuracies. Experimental results showed that
textual features are more effective than acoustic features for hate speech detection. The best text-based
model attained an F1-score of 87.98%.
5.2.5 Indian sub-continent Low Resource Languages:
The Indian subcontinent region consists of multiple countries with diverse languages. A miniscule num-
ber of languages have been studied regarding hate speech detection. Languages like Bengali, Sinhala,
Nepali have large speakers in the different countries of Indian subcontinent and have gained attention
from researchers.
Sinhala: Sandaruwan et al.[129] proposed lexicon-based and machine learning-based models for
detecting hate speech on Sinhala social media. The corpus-based lexicon yielded an accuracy of 76.3% for
offensive, hate, and neutral speech detection. A corpus of 3k comments was curated, evenly distributed
among hate, offensive and neutral speech. Multinomial Naïve Bayes combined with character tri-grams
provided an accuracy of 92.33%, with a best recall value of 0.84.
Nepali: Niraula et al.[103] used a dataset consisting of over 15,000 comments and posts from diverse
social media platforms such as Facebook, Twitter, YouTube, Nepali blogs, and news portals. The
dataset was manually annotated with fine-grained labels for offensive language detection in Nepali. The
approach involves supervised machine learning for detecting offensive language. The authors
experimented with word and character features and employed several machine learning models, proposing
novel preprocessing methods for Nepali social media text. The results showed that character-based
features are extremely useful for classifying offensive language in Nepali. The Random Forest classifier
was chosen for fine-grained classification and achieved F1 scores of 0.87 for Non-Offensive, 0.71 for Other
Offensive, 0.45 for Racist, and 0.01 for Sexist categories.
Bengali: While Bengali is spoken throughout Bangladesh, it is also the second most
spoken language in India. As a result, Bengali has received considerable attention in terms of hate speech
detection.
ToxLex bn by Rashid et al.[120] is an exhaustive wordlist curated for detecting toxicity in social media.
It consists of 1968 unique bigrams or phrases derived from 2207590 comments. The paper analyzes
the Bangla toxic language dataset by developing a toxic wordlist or phrase-list as classifier material.
Hossain et al.[62] presented a dataset of 114 slang words and 43 non-slang words with 6100 audio clips.
Data was collected from 60 native speakers for slang words and 23 native speakers for non-abusive words.
Junaid et al.[71] created their dataset using Bangla videos from YouTube and applied machine learning
classification methods and deep learning techniques. The logistic regression model achieved the best
average accuracy of 96%. The fine-tuned GRU model achieved an accuracy of 98.89%, while the
fine-tuned LSTM model achieved 86.67%.

| Author | Feature Representation | Models | Metrics |
| Romim et al.[122] | Word2Vec, FastText, BengFastText | SVM, LSTM, Bi-LSTM | SVM (Acc 87.5%, F1 0.911) |
| Karim et al.[76] | Word Embeddings | Conv-LSTM, Bi-LSTM, m-BERT (cased & uncased), XLM-RoBERTa, ResNet-152, DenseNet-161 | Conv-LSTM (P 0.79, R 0.78, F1 0.78); XLM-RoBERTa (P 0.82, R 0.82, F1 0.82); DenseNet-161 (P 0.79, R 0.79, F1 0.79) |
| Das et al.[33] | Linguistic Features | m-BERT, XLM-RoBERTa, IndicBERT, MuRIL | MuRIL (Acc 0.833, F1 0.808); XLM-RoBERTa (Acc 0.865, F1 0.810) |
| Karim et al.[75] | Linguistic Features | m-BERT (cased & uncased), XLM-RoBERTa, BiLSTM, Conv-LSTM, ML baselines | XLM-RoBERTa (P 0.87, R 0.87, F1 0.87); Conv-LSTM (P 0.79, R 0.78, F1 0.78); GBT (P 0.71, R 0.69, F1 0.68) |

Jahan et al.[66] used a large-scale corpus of abusive and hateful
Bengali posts collected from various sources. The authors provided a manually labelled dataset with
15K Bengali hate speech samples. They considered the existing pre-trained BanglaBERT model and retrained it
with 1.5 million offensive posts. The architecture includes Masked Language Modeling (MLM) and Next
Sentence Prediction (NSP) as part of the BERT training strategies. BanglaHateBERT outperformed
the corresponding available BERT model on all datasets. Keya et al.[77] used Bengali offensive text from
the social platform (BHSSP), which includes 20,000 posts, comments, and memes from social networks
and Bengali news websites. The approach combines the BERT architecture and GRU to form the G-BERT
model. G-BERT achieved an accuracy of 95.56%, precision of 95.07%, recall of 93.63%, and F1-score
of 92.15%. Das et al.[31] proposed an encoder-decoder-based ML model with an attention mechanism
and LSTM- and GRU-based decoders. The attention-based decoder achieved the best accuracy of
77%. A dataset of 7,425 Bengali comments from various Facebook pages was used, classified
into 7 categories: hate speech, aggressive comment, religious hatred, ethnical attack, religious
comment, political comment, and suicidal comment. Ishmam et al.[64] developed ML algorithms and a
GRU-based deep neural network model. They employed word2vec for word embeddings and compared
several ML algorithms. The best performance was achieved by the GRU-based model with 70.10% accuracy,
improving upon the Random Forest model's 52.20% accuracy. Belal et al.[18] use a dataset that
consists of 16,073 instances, with 7,585 labeled as non-toxic and 8,488 as toxic. The toxic instances are
manually labeled with one or more of six classes: vulgar, hate, religious, threat, troll, and insult. Data
was accumulated from three sources: the Bangla-Abusive-Comment-Dataset, the Bengali Hate Speech Dataset,
and the Bangla Online Comments Dataset. The authors proposed a two-stage deep learning pipeline: a
binary classification model (LSTM with BERT embeddings) determines whether a comment is toxic, and if it
is, a multi-label classifier (CNN-BiLSTM with attention mechanism) categorizes the toxicity type.
The binary classifier achieved 89.42% accuracy, and the multi-label classifier attained 78.92% accuracy and a
weighted F1-score of 0.86. Banik et al.[14] used a human-annotated dataset with labels such as toxic,
threat, obscene, insult, and racism. The dataset includes 10,219 comments in total, with 4,255 toxic and
5,964 non-toxic comments. The CNN model included a 1D convolutional layer and global max-pooling,
while the LSTM model processed one-hot encoded vectors of length 50. The deep learning-based models,
CNN and LSTM, outperformed the other classifiers by a 10% margin, with CNN achieving the highest
accuracy of 95.30%. Sultana et al.[140] collected a dataset of 5,000 data points from Facebook, Twitter,
and YouTube, labeled as abusive or not abusive, and applied various ML algorithms
for hate speech detection. Sazzed[132] compiled two Bengali review corpora consisting of 7,245 comments
collected from YouTube, manually annotated into vulgar and non-vulgar categories after preprocessing
to exclude English and Romanized Bengali. The BiLSTM model yielded the highest recall scores on both
datasets. Haque et al.[59] proposed a supervised deep learning classifier based on CNN and LSTM for
multi-class sentiment analysis, training RNN variants, LSTM, BiLSTM, and BiGRU models
on word embeddings. Emon et al.[45] evaluated several ML and DL algorithms for identifying abusive
content in the Bengali language and introduced new stemming rules for Bengali to improve algorithm
performance. The RNN with LSTM cells outperformed the other models, achieving the highest accuracy of
82.20%.
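The two-stage design of Belal et al., a binary toxicity gate followed by a multi-label type classifier, can be summarized schematically as below; the stand-in functions only mark where the paper's LSTM-with-BERT-embeddings and CNN-BiLSTM-with-attention models would sit.

```python
# Schematic sketch (not the authors' models) of a two-stage toxicity
# pipeline: stage 1 decides toxic vs non-toxic; stage 2 assigns one or
# more of the six toxicity types only to toxic comments.
from typing import List

TOXICITY_TYPES = ["vulgar", "hate", "religious", "threat", "troll", "insult"]

def is_toxic(comment: str) -> bool:
    # Stage 1 stand-in; the paper uses an LSTM over BERT embeddings.
    return "toxic-marker" in comment

def toxicity_types(comment: str) -> List[str]:
    # Stage 2 stand-in; the paper uses a CNN-BiLSTM with attention.
    return ["insult"]

def classify(comment: str) -> List[str]:
    if not is_toxic(comment):        # non-toxic comments skip stage 2
        return []
    return toxicity_types(comment)   # multi-label output for toxic comments

print(classify("a harmless comment"))        # []
print(classify("a toxic-marker comment"))    # ['insult']
```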
5.2.5.1 Survey on Monolingual Indic Languages Architecture:
Recent years have seen a rise in the usage of Indic languages, leading to easier accumulation of data from
online sources.
Hindi: Hindi is the most widely used Indian language, and a majority of the communications on social
media in India are conducted in Hindi. Kannan et al.[73] used the dataset of the HASOC 2021 workshop containing
tweets in Hindi annotated as 'NOT' and 'HOF'. The approach is a C-BiGRU model, combining a
CNN with a bidirectional RNN. It uses fastText word embeddings and is designed to capture contextual
information for binary classification of offensive text. The C-BiGRU model achieved a macro F1 score,
accuracy, precision, and recall of 75.04%, 77.48%, 74.63%, and 75.60% respectively. Bashar et al.[15]
pretrained word vectors on a collection of relevant tweets and trained a CNN model on these vectors.
The CNN model achieved an accuracy of 0.82 on the test dataset, with precision, recall, and F1-score
for the HOF class of 0.85, 0.74, and 0.79 respectively, and for the NOT class of 0.8, 0.89, and
0.84 respectively. Sreelakshmi et al.[138] use Facebook's pre-trained word embedding library, fastText,
for representing data samples, comparing fastText features with word2vec and doc2vec features using
an SVM-RBF classifier. The dataset consists of 10000 texts equally divided into hate and non-hate
classes. Rani et al.[119] used three datasets: (1) 4575 Hindi-English code-mixed annotated tweets in Roman
script only, (2) HASOC 2019, with 4665 annotated posts collected from Twitter and Facebook, and (3)
an author-created dataset containing 3367 tweets annotated by them, in both Roman and Devanagari script. They
used a character-based CNN model, TF weighting as the ML classifiers' feature, and no feature engineering
or preprocessing for the CNN model. The character-level CNN model outperformed the other classifiers.
The accuracy for the combined dataset using the CNN model was 0.86, and the micro F1 score was 0.74.
Sharma et al.[134] compiled the dataset "THAR", consisting of 11549 YouTube comments in Hindi-English
code-mixed language, annotated for religious discrimination. The paper explores the application
of deep learning and transformer-based models. It discusses the use of word embeddings and the evaluation
of model performance on the curated dataset. MuRIL outperformed the others, achieving macro
average and weighted average F1 scores of 0.78 for binary classification, and 0.65 and 0.72 respectively
for multi-class classification.
Marathi: Marathi is considered the third most spoken language in India. Multiple HASOC datasets
have been formed from online communications in Marathi for abusive language detection. Chavan et
al.[27] used the HASOC 2021, HASOC 2022, and MahaHate datasets. The authors focused on the
MuRIL, MahaTweetBERT, MahaTweetBERT-Hateful, and MahaBERT models. The MahaTweetBERT
model, pre-trained on Marathi tweets and fine-tuned on the combined dataset (HASOC 2021 + HASOC
2022 + MahaHate), achieved an F1 score of 98.43 on the HASOC 2022 test set, providing a new state-of-the-art
result on the HASOC 2022 / MOLD v2 test set. Gaikwad et al.[51] introduced MOLD (Marathi
Offensive Language Dataset), the first dataset for offensive language identification in Marathi. The paper
discusses ML experiments including zero-shot and transfer learning experiments on cross-lingual transformers
from existing data in Bengali, English, and Hindi. Models like SVMs and LSTMs, and deep learning
models like BERT-m and XLM-RoBERTa (XLM-R), were used. Transfer learning from Hindi outperformed
the other methods with a macro F1 score of 0.9401; weighted F1 scores were also reported. Velankar et
al.[153] used datasets like the HASOC'21 Marathi dataset and L3Cube-MahaSent to present a comparative
study of monolingual and multilingual BERT-based models. Ghosh et al.[52]
examined the effectiveness of mono- and multilingual transformer models in Indic languages.
Assamese: In north-east India, a majority of people speak the Assamese language, and researchers have
recently started paying attention to hate speech in Assamese. Ghosh et al.[54] annotated an Assamese dataset
with 4,000 sentences. The approach involves fine-tuning two pre-trained BERT models, mBERT-cased
(bert-base-multilingual-cased) and Bangla BERT (sagorsarker/bangla-bert-base), on the Assamese
data for hate speech detection. Results are reported in terms of weighted F1 scores, precision, recall, and accuracy.
The mBERT-cased model achieved an accuracy of 0.63, and the Bangla-BERT model achieved
an accuracy of 0.61. They also used datasets for hate speech detection in Assamese, Bengali, and
Bodo, collected from YouTube and Facebook comments[53]. Each dataset is binary classified
(hate or non-hate). The paper discusses the use of supervised machine learning systems, with a
variant of the BERT architecture achieving the best performance; other systems applied include deep
learning and transformer models. The best classifiers for Assamese, Bengali, and Bodo achieved macro
F1 scores of 0.73, 0.77, and 0.85, respectively.
Dravidian: The Dravidian family comprises the languages spoken in the southern region of India, including
Tamil, Telugu, Kannada and Malayalam. Among the Dravidian languages, Tamil has been researched
most extensively for offensive language detection. Das et al.[32] used datasets in 8 different languages from
14 publicly available sources. The datasets vary in their choice of class labels and are combined into
a binary classification task of abusive versus normal posts. They cover languages such as
Bengali, English, Hindi (Devanagari and code-mixed), Kannada (code-mixed), Malayalam (code-mixed),
Marathi, Tamil (code-mixed), and Urdu (actual and code-mixed). The work explores various transfer
mechanisms for abusive language detection, such as zero-shot learning, few-shot learning, instance transfer,
cross-lingual learning, and synthetic transfer. Models like m-BERT and MuRIL, pre-trained on multiple
languages, are used. Chakravarthi et al.[26] used the DravidianCodeMix dataset, which
consists of code-mixed comments on Tamil, Malayalam, and Kannada movie trailers from YouTube. The
approach involves a fusion model combining MPNet (Masked and Permuted Network) and
CNN for detecting offensive language in low-resource Dravidian languages. The model achieved better
offensive language detection results than the baseline models, with weighted average F1-scores of 0.85,
0.98, and 0.76 for Tamil, Malayalam, and Kannada respectively, outperforming the baselines
EWDT and EWODT. Vasantharajan et al.[149] propose an approach that includes selective translation
and transliteration techniques for text conversion, and extensive experiments with BERT, DistilBERT,
XLM-RoBERTa, and CNN-BiLSTM. The ULMFiT model was used with the AWD-LSTM architecture and
FastAI for fine-tuning, and it yielded the best results with a weighted average F1-score of
0.7346 on the test dataset. The dataset used is a Tamil-English code-mixed dataset from YouTube
comments/posts, compared with the Dakshina dataset for out-of-vocabulary words. Subramanian et al.[139]
compare traditional machine learning techniques with pre-trained multilingual transformer-based models
using adapters and fine-tuners for detecting offensive texts. Transformer-based models outperformed the
machine learning approaches, with adapter-based techniques showing better performance in terms of
time and efficiency, especially in low-resource languages like Tamil. The XLM-RoBERTa (Large) model
achieved the highest accuracy of 88.5%. Anbukkarasi et al.[11] present a dataset of 10,000 Tamil-English
code-mixed texts collected from Twitter, annotated as hate text/non-hate text. A Bi-LSTM model has
21
This work is shared under a CC BY-SA 4.0 license unless otherwise noted
been proposed to classify hate and non-hate text in tweets. Hande et al.[58] proposed the dataset named
KanCMD, a multi-task learning dataset for sentiment analysis and offensive language identification in
Kannada. It contains comments from YouTube, annotated for sentiment analysis and offensive language
detection.The paper presents the dataset statistics and discusses the results of experiments with several
machine learning algorithms. The results are evaluated in terms of precision, recall, and F1-score. Using
this model Roy et al.[127] have proposed an ensemble model for detection of hate speech and offensive
language in Dravidian languages. The model has obtained a weighted F1-score of 0.802 and 0.933 for
Malayalam and Tamil code-mixed datasets respectively.
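The Bi-LSTM classifier of Anbukkarasi et al.[11] can be sketched roughly as follows; the vocabulary size, sequence length, and layer widths below are illustrative assumptions, not the paper's reported settings.

```python
# Hedged sketch of a Bi-LSTM hate / non-hate classifier for code-mixed
# Tamil-English tweets; all hyperparameters are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000   # assumed vocabulary size after tokenization
MAX_LEN = 64         # assumed maximum tweet length in tokens

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),       # learned token embeddings
    layers.Bidirectional(layers.LSTM(64)),   # reads tweets in both directions
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),   # P(hate)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.build(input_shape=(None, MAX_LEN))
model.summary()
```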
Multilingual: Several works have pooled existing resources across Indic languages and combined them into a single homogeneous dataset, classifying the data into three classes, abusive, hateful, or neither (this label-harmonisation step is sketched after this paragraph); the proposed multilingual model architectures include logistic regression, CNN-LSTM, and BERT-based networks. Das et al.[34] created a benchmark dataset of 5062 abusive speech/counterspeech pairs, with 2460 pairs in Bengali and 2602 pairs in Hindi. The authors experimented with several transformer-based baseline models for counterspeech generation, including GPT2, MT5, BLOOM, and ChatGPT. They evaluated several interlingual mechanisms and observed that the monolingual setting yielded the best performance. Gupta et al.[57] introduced the MACD dataset, a large-scale, human-annotated, multilingual abuse detection dataset sourced from ShareChat. It contains 150K textual comments in 5 Indic languages, with 74K abusive and 77K non-abusive comments. The paper presents AbuseXLMR, a model pre-trained on 5M+ social media comments in 15+ Indic languages; it is based on the XLM-R model and is adapted to the social media domain to bridge the domain gap. Jhaveri et al.[69] used the dataset provided by ShareChat/Moj in the IIIT-D Multilingual Abusive Comment Identification challenge, containing 665k+ training samples and 74.3k+ test samples in 13 Indic languages labeled as abusive or not. Their approach leveraged multilingual transformer-based pre-trained and fine-tuned models such as XLM-RoBERTa and MuRIL. Parikh et al.[111] used a dataset consisting of comments in Gujarati, Hindi, English, Marathi, and Punjabi, containing 7500 non-toxic and 7495 toxic comments; LSTM, CNN, and DistilBERT embeddings were used and fine-tuned on the multilingual dataset. Various implementations on the multiple HASOC datasets are depicted in Table 6.
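The label harmonisation mentioned at the start of this subsection can be sketched as below, assuming hypothetical per-language CSV files with heterogeneous label vocabularies; the file names and mappings are illustrative, not those of any cited dataset.

```python
# Hedged sketch of harmonising datasets with different label inventories
# into one shared scheme (abusive / hateful / neither) before joint
# multilingual training. File names and label mappings are illustrative.
from datasets import concatenate_datasets, load_dataset

LABEL2ID = {"abusive": 0, "hateful": 1, "neither": 2}
SOURCES = {  # per-source mapping from native labels to the shared scheme
    "hindi.csv":   {"offensive": "abusive", "hate": "hateful", "none": "neither"},
    "bengali.csv": {"abusive": "abusive", "neutral": "neither"},
}

parts = []
for path, mapping in SOURCES.items():
    ds = load_dataset("csv", data_files=path)["train"]
    ds = ds.map(lambda row, m=mapping: {"shared_label": LABEL2ID[m[row["label"]]]})
    parts.append(ds.select_columns(["text", "shared_label"]))

combined = concatenate_datasets(parts).shuffle(seed=42)
print(combined)   # one homogeneous multilingual training set
```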
Multimodal and Multilingual Characteristics
As English is the dominant language worldwide, resources and approaches for it are comparatively abundant. Presently, multiple datasets are being accumulated[13] across languages, and significant advances have been achieved in low-resource languages. Most existing datasets consist of textual data, but the evolving social media space has incorporated different modalities for spreading hateful content. Images and texts which individually may be harmless can, when combined, generate hate speech; audio-visual media can also be used to perpetuate harmful language. Minimal attention has so far been paid to the multimodal aspect of hate speech detection, although researchers are now concentrating on the multimodal properties[12] of hate speech.
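As one concrete possibility for this multimodal direction, a late-fusion classifier can combine frozen CLIP image and text embeddings; the sketch below is in the spirit of the contrastive language-image approach cited above[12], not a reproduction of any surveyed system, and the head architecture is an assumption.

```python
# Hedged sketch: late fusion of frozen CLIP image/text embeddings for
# meme-style hate detection; only the small fusion head is trainable.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip.eval()  # keep the encoders frozen

head = nn.Sequential(                       # trainable fusion classifier
    nn.Linear(512 + 512, 256), nn.ReLU(), nn.Linear(256, 2))

def logits(image: Image.Image, text: str) -> torch.Tensor:
    inputs = processor(text=[text], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():                   # no gradients through CLIP
        img = clip.get_image_features(pixel_values=inputs["pixel_values"])
        txt = clip.get_text_features(input_ids=inputs["input_ids"],
                                     attention_mask=inputs["attention_mask"])
    return head(torch.cat([img, txt], dim=-1))   # [non-hate, hate] logits
```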
There is substantial potential for future research in the domain of hate speech detection. The following research directions need attention and improvement.
• Improving the understanding of language nuances and cultural references. Cross-language detection that incorporates various dialects can improve hate speech detection models.
• Merging multiple modalities and promoting interdisciplinary collaboration among language experts, sociologists, and computer scientists to strengthen detection algorithms.
• Detecting hate speech in real time and monitoring systems on live social media platforms. Interacting with different communities to understand their perception of hate speech and to create models that are culturally and linguistically responsive.
• Establishing transparent ethical frameworks and policies, along with moderation tools, to build trust among users.
• Investigating the behaviour of users who engage in hate speech, which can assist in developing robust algorithms for targeted interventions and preventive measures.
• Analyzing emotion and sentiment to distinguish between extremely subjective language and hate speech.
• Enhancing detection accuracy and interpreting surrounding context by utilizing community feedback and crowd-sourced data.
7 Conclusion
The pervasive presence of the internet has facilitated the integration of social media platforms into everyday life, and people display hostility online for a variety of reasons. Perusing this large volume of data for automatic harmful-language detection is challenging. In this survey, we have presented an extensive overview of the advances in online hate speech detection techniques to date. We have discussed the overlapping hate speech concepts of cyberbullying, offensive language, and abusive language. First, the publicly available datasets in both English and other low-resource languages were detailed. We then identified prevailing reviews, surveys, and implementation approaches on hate speech. Particular attention has been given to hate speech detection in low-resource languages and the techniques applied: early work relied on TF-IDF and other classical machine learning techniques, which have since been superseded by FastText, Word2Vec, and GloVe word embeddings combined with deep learning approaches such as CNNs, RNNs, and BERT. Multiple research gaps, limitations, and challenges have been elaborated, along with prospective future studies. We hope this survey will be useful for researchers pursuing hate speech detection in multiple low-resource languages.
References
[1] Luis Aguirre and Emese Domahidi. Problematic content in spanish language comments in youtube
videos about venezuelan refugees and migrants. Journal of Quantitative Description: Digital Media,
1, 2021.
[2] Resham Ahluwalia, Himani Soni, Edward Callow, Anderson Nascimento, and Martine De Cock.
Detecting hate speech against women in english tweets. EVALITA Evaluation of NLP and Speech
Tools for Italian, 12:194, 2018.
[3] Muhammad Pervez Akhter, Zheng Jiangbin, Irfan Raza Naqvi, Mohammed Abdelmajeed, and
Muhammad Tariq Sadiq. Automatic detection of offensive language for urdu and roman urdu.
IEEE Access, 8:91213–91226, 2020.
[4] Areej Al-Hassan and Hmood Al-Dossari. Detection of hate speech in social networks: a survey on
multilingual corpus. In 6th international conference on computer science and information technol-
ogy, volume 10, pages 10–5121. ACM, 2019.
[5] Azalden Alakrot, Liam Murray, and Nikola S Nikolov. Dataset construction for the detection of
anti-social behaviour in online communication in arabic. Procedia Computer Science, 142:174–181,
2018.
[6] Nuha Albadi, Maram Kurdi, and Shivakant Mishra. Are they our brothers? analysis and detection
of religious hate speech in the arabic twittersphere. In 2018 IEEE/ACM International Conference
on Advances in Social Networks Analysis and Mining (ASONAM), pages 69–76. IEEE, 2018.
[11] S Anbukkarasi and S Varadhaganapathy. Deep learning-based hate speech detection in code-mixed
tamil text. IETE Journal of Research, 69(11):7893–7898, 2023.
[12] Greeshma Arya, Mohammad Kamrul Hasan, Ashish Bagwari, Nurhizam Safie, Shayla Islam, Fa-
tima Rayan Awad Ahmed, Aaishani De, Muhammad Attique Khan, and Taher M Ghazal. Multi-
modal hate speech detection in memes using contrastive language-image pre-training. IEEE Access,
2024.
[13] Md Rabiul Awal, Roy Ka-Wei Lee, Eshaan Tanwar, Tanmay Garg, and Tanmoy Chakraborty.
Model-agnostic meta-learning for multilingual hate speech detection. IEEE Transactions on Com-
putational Social Systems, 11(1):1086–1095, 2023.
[14] Nayan Banik and Md Hasan Hafizur Rahman. Toxicity detection on bengali social media comments
using supervised models. In 2019 2nd international conference on Innovation in Engineering and
Technology (ICIET), pages 1–5. IEEE, 2019.
[15] Md Abul Bashar and Richi Nayak. Qutnocturnal@ hasoc’19: Cnn for hate speech and offensive
content identification in hindi language. arXiv preprint arXiv:2008.12448, 2020.
[16] Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco
Manuel Rangel Pardo, Paolo Rosso, and Manuela Sanguinetti. Semeval-2019 task 5: Multilin-
gual detection of hate speech against immigrants and women in twitter. In Proceedings of the 13th
international workshop on semantic evaluation, pages 54–63, 2019.
[17] Delphine Battistelli, Cyril Bruneau, and Valentina Dragos. Building a formal model for hate
detection in french corpora. Procedia Computer Science, 176:2358–2365, 2020.
[18] Tanveer Ahmed Belal, GM Shahariar, and Md Hasanul Kabir. Interpretable multi labeled bengali
toxic comments classification using deep learning. In 2023 International Conference on Electrical,
Computer and Communication Engineering (ECCE), pages 1–6. IEEE, 2023.
[19] Fatih Beyhan, Buse Çarık, Inanç Arın, Ayşecan Terzioğlu, Berrin Yanikoglu, and Reyyan Yeniterzi.
A turkish hate speech dataset and detection system. In Proceedings of the thirteenth language
resources and evaluation conference, pages 4177–4185, 2022.
[20] Mehar Bhatia, Tenzin Singhay Bhotia, Akshat Agarwal, Prakash Ramesh, Shubham Gupta, Kumar
Shridhar, Felix Laumann, and Ayushman Dash. One to rule them all: Towards joint indic language
hate speech detection. arXiv preprint arXiv:2109.13711, 2021.
[21] Aditya Bohra, Deepanshu Vijay, Vinay Singh, Syed Sarfaraz Akhtar, and Manish Shrivastava. A
dataset of hindi-english code-mixed social media text for hate speech detection. In Proceedings of
the second workshop on computational modeling of people’s opinions, personality, and emotions in
social media, pages 36–41, 2018.
[22] Tommaso Caselli, Valerio Basile, Jelena Mitrović, and Michael Granitzer. Hatebert: Retraining
bert for abusive language detection in english. arXiv preprint arXiv:2010.12472, 2020.
[23] Galo Castillo-López, Arij Riabi, and Djamé Seddah. Analyzing zero-shot transfer scenarios across
spanish variants for hate speech detection. In Tenth Workshop on NLP for Similar Languages,
Varieties and Dialects (VarDial 2023), pages 1–13, 2023.
[24] Laura Ceci. Tiktok: languages covered by content moderators, 2024. URL https://www.
statista.com/statistics/1405643/tiktok-language-covered-moderators/. Accessed: 2024-
08-28.
[25] Fabio Celli, Mirko Lai, Armend Duzha, Cristina Bosco, Viviana Patti, et al. Policycorpus xl: An
italian corpus for the detection of hate speech against politics. In CEUR workshop proceedings,
volume 3033, pages 1–7. CEUR-WS.org, 2021.
[26] Bharathi Raja Chakravarthi, Manoj Balaji Jagadeeshan, Vasanth Palanikumar, and Ruba Priyad-
harshini. Offensive language identification in dravidian languages using mpnet and cnn. Interna-
tional Journal of Information Management Data Insights, 3(1):100151, 2023.
[27] Tanmay Chavan, Shantanu Patankar, Aditya Kane, Omkar Gokhale, and Raviraj Joshi. A twitter
bert approach for offensive language detection in marathi. arXiv preprint arXiv:2212.10039, 2022.
[28] Anusha Chhabra and Dinesh Kumar Vishwakarma. A literature survey on multimodal and multi-
lingual automatic hate speech identification. Multimedia Systems, 29(3):1203–1230, 2023.
[29] Yi-Ling Chung, Elizaveta Kuzmenko, Serra Sinem Tekiroglu, and Marco Guerini. Conan–counter
narratives through nichesourcing: a multilingual dataset of responses to fight online hate speech.
arXiv preprint arXiv:1910.03270, 2019.
[30] Çağrı Çöltekin. A corpus of turkish offensive language on social media. In Proceedings of the twelfth
language resources and evaluation conference, pages 6174–6184, 2020.
[31] Amit Kumar Das, Abdullah Al Asif, Anik Paul, and Md Nur Hossain. Bangla hate speech detection
on social media using attention-based recurrent neural network. Journal of Intelligent Systems, 30
(1):578–591, 2021.
[32] Mithun Das, Somnath Banerjee, and Animesh Mukherjee. Data bootstrapping approaches to
improve low resource abusive language detection for indic languages. In Proceedings of the 33rd
ACM conference on hypertext and social media, pages 32–42, 2022.
[33] Mithun Das, Somnath Banerjee, Punyajoy Saha, and Animesh Mukherjee. Hate speech and offen-
sive language detection in bengali. arXiv preprint arXiv:2210.03479, 2022.
[34] Mithun Das, Saurabh Kumar Pandey, Shivansh Sethi, Punyajoy Saha, and Animesh Mukherjee.
Low-resource counterspeech generation for indic languages: The case of bengali and hindi. arXiv
preprint arXiv:2402.07262, 2024.
[35] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. Automated hate speech de-
tection and the problem of offensive language. In Proceedings of the international AAAI conference
on web and social media, volume 11, pages 512–515, 2017.
[36] Ernesto del Valle and Luis de la Fuente. Sentiment analysis methods for politics and hate speech
contents in spanish language: a systematic review. IEEE Latin America Transactions, 21(3):
408–418, 2023.
[37] Christoph Demus, Jonas Pitz, Mina Schütz, Nadine Probol, Melanie Siegel, and Dirk Labudde.
Detox: A comprehensive dataset for german offensive language and conversation analysis. In
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 143–153, 2022.
[38] Statista Research Department. Statista dataset for facebook, 2023. URL https://www.statista.
com/statistics/1013804/facebook-hate-speech-content-deletion-quarter/. Accessed:
2024-08-28.
[39] LK Dhanya and Kannan Balakrishnan. Hate speech detection in asian languages: a survey. In 2021
international conference on communication, control and information sciences (ICCISc), volume 1,
pages 1–5. IEEE, 2021.
[40] Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. Com-
mon sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions
on Interactive Intelligent Systems (TiiS), 2(3):1–30, 2012.
[41] Stacy Jo Dixon. Statista dataset for instagram, 2023. URL https://www.statista.com/
statistics/1275933/global-actioned-hate-speech-content-instagram/. Accessed: 2024-
08-28.
[42] Stacy Jo Dixon. Statista most used social networks, 2024. URL https://www.statista.
com/statistics/272014/global-social-networks-ranked-by-number-of-users/. Accessed:
2024-08-28.
[43] Hang Thi-Thuy Do, Huy Duc Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen, and Anh Gia-
Tuan Nguyen. Hate speech detection on vietnamese social media text using the bidirectional-lstm
model. arXiv preprint arXiv:1911.03648, 2019.
[44] Elisabeth Eder, Ulrike Krieg-Holz, and Udo Hahn. At the lower end of language—exploring the
vulgar and obscene side of german. In Proceedings of the third workshop on abusive language online,
pages 119–128, 2019.
[45] Estiak Ahmed Emon, Shihab Rahman, Joti Banarjee, Amit Kumar Das, and Tanni Mittra. A deep
learning approach to detect abusive bengali text. In 2019 7th International Conference on Smart
Computing & Communications (ICSCC), pages 1–5. IEEE, 2019.
[46] Johan Fernquist, Oskar Lindholm, Lisa Kaati, and Nazar Akrami. A study on the feasibility to
detect hate speech in swedish. In 2019 IEEE international conference on big data (Big Data),
pages 4724–4729. IEEE, 2019.
[47] Paula Fortuna and Sérgio Nunes. A survey on automatic detection of hate speech in text. ACM
Computing Surveys (CSUR), 51(4):1–30, 2018.
[48] Paula Fortuna, Joao Rocha da Silva, Leo Wanner, Sérgio Nunes, et al. A hierarchically-labeled
portuguese hate speech dataset. In Proceedings of the third workshop on abusive language online,
pages 94–104, 2019.
[49] Carli Friedman and Aleksa L Owen. Defining disability: Understandings of and attitudes towards
ableism and disability. Disability Studies Quarterly, 37(1), 2017.
[50] Tamara Fuchs and Fabian Schäfer. Normalizing misogyny: hate speech and verbal abuse of female
politicians on japanese twitter. In Japan forum, volume 33, pages 553–579. Taylor & Francis, 2021.
[51] Saurabh Gaikwad, Tharindu Ranasinghe, Marcos Zampieri, and Christopher M Homan. Cross-
lingual offensive language identification for low resource languages: The case of marathi. arXiv
preprint arXiv:2109.03552, 2021.
[52] Koyel Ghosh and Apurbalal Senapati. Hate speech detection: a comparison of mono and multi-
lingual transformer model with cross-language evaluation. In Proceedings of the 36th Pacific Asia
Conference on Language, Information and Computation, pages 853–865, 2022.
[53] Koyel Ghosh, Apurbalal Senapati, and Aditya Shankar Pal. Annihilate hates (task 4 hasoc 2023):
Hate speech detection in assamese bengali and bodo languages. In FIRE (Working Notes), pages
368–382, 2023.
[54] Koyel Ghosh, Debarshi Sonowal, Abhilash Basumatary, Bidisha Gogoi, and Apurbalal Senapati.
Transformer-based hate speech detection in assamese. In 2023 IEEE Guwahati Subsection Confer-
ence (GCON), pages 1–5. IEEE, 2023.
[55] Soumitra Ghosh, Asif Ekbal, Pushpak Bhattacharyya, Tista Saha, Alka Kumar, and Shikha Sri-
vastava. Sehc: A benchmark setup to identify online hate speech in english. IEEE Transactions
on Computational Social Systems, 10(2):760–770, 2022.
[56] Raul Gomez, Jaume Gibert, Lluis Gomez, and Dimosthenis Karatzas. Exploring hate speech
detection in multimodal publications. In Proceedings of the IEEE/CVF winter conference on
applications of computer vision, pages 1470–1478, 2020.
[57] Vikram Gupta, Sumegh Roychowdhury, Mithun Das, Somnath Banerjee, Punyajoy Saha, Binny
Mathew, Animesh Mukherjee, et al. Multilingual abusive comment detection at scale for indic
languages. Advances in Neural Information Processing Systems, 35:26176–26191, 2022.
[58] Adeep Hande, Ruba Priyadharshini, and Bharathi Raja Chakravarthi. Kancmd: Kannada
codemixed dataset for sentiment analysis and offensive language detection. In Proceedings of the
Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotion’s in
Social Media, pages 54–63, 2020.
[59] Rezaul Haque, Naimul Islam, Mayisha Tasneem, and Amit Kumar Das. Multi-class sentiment
classification on bengali social media comments using machine learning. International journal of
cognitive computing in engineering, 4:21–35, 2023.
[60] Ehtesham Hashmi and Sule Yildirim Yayilgan. Multi-class hate speech detection in the norwegian
language using fast-rnn and multilingual fine-tuned transformers. Complex & Intelligent Systems,
10(3):4535–4556, 2024.
[61] Eftekhar Hossain, Omar Sharif, and Mohammed Moshiul Hoque. Mute: A multimodal dataset
for detecting hateful memes. In Proceedings of the 2nd conference of the asia-pacific chapter of
the association for computational linguistics and the 12th international joint conference on natural
language processing: student research workshop, pages 32–39, 2022.
[62] Md Fahad Hossain, Md Al Abid Supto, Zannat Chowdhury, Hana Sultan Chowdhury, and Sheikh
Abujar. Baad: A multipurpose dataset for automatic bangla offensive speech recognition. Data in
Brief, 48:109067, 2023.
[63] IPSOS. Survey on the impact of online disinformation and hate speech, 2023.
URL https://www.ipsos.com/sites/default/files/ct/news/documents/2023-11/
unesco-ipsos-online-disinformation-hate-speech.pdf. Accessed: 2024-08-28.
[64] Alvi Md Ishmam and Sadia Sharmin. Hateful speech detection in public facebook pages for the
bengali language. In 2019 18th IEEE international conference on machine learning and applications
(ICMLA), pages 555–560. IEEE, 2019.
[65] Md Saroar Jahan and Mourad Oussalah. A systematic review of hate speech automatic detection
using natural language processing. Neurocomputing, page 126232, 2023.
[66] Md Saroar Jahan, Mainul Haque, Nabil Arhab, and Mourad Oussalah. Banglahatebert: Bert for
abusive language detection in bengali. In Proceedings of the second international workshop on
resources and techniques for user information in abusive language analysis, pages 8–15, 2022.
[67] Md Saroar Jahan, Mourad Oussalah, and Nabil Arhab. Finnish hate-speech detection on social
media using cnn and finbert. In Language Resources and Evaluation Conference, LREC 2022,
20-25 June 2022, Palais du Pharo, Marseille, France: conference proceedings. European Language
Resources Association, 2022.
[68] Sylvia Jaki and Tom De Smedt. Right-wing german hate speech on twitter: Analysis and automatic
detection. arXiv preprint arXiv:1910.07518, 2019.
[69] Manan Jhaveri, Devanshu Ramaiya, and Harveen Singh Chadha. Toxicity detection for indic
multilingual social media content. arXiv preprint arXiv:2201.00598, 2022.
[70] Ananya Joshi and Raviraj Joshi. Harnessing pre-trained sentence transformers for offensive lan-
guage detection in indian languages. arXiv preprint arXiv:2310.02249, 2023.
[71] Mohd Istiaq Hossain Junaid, Faisal Hossain, and Rashedur M Rahman. Bangla hate speech de-
tection in videos using machine learning. In 2021 IEEE 12th Annual Ubiquitous Computing, Elec-
tronics & Mobile Communication Conference (UEMCON), pages 0347–0351. IEEE, 2021.
[72] TaeYoung Kang, Eunrang Kwon, Junbum Lee, Youngeun Nam, Junmo Song, and JeongKyu Suh.
Korean online hate speech dataset for multilabel classification: How can social science improve
dataset on hate speech? arXiv preprint arXiv:2204.03262, 2022.
[73] Sudharsana Kannan and Jelena Mitrovic. Hatespeech and offensive content detection in hindi
language using c-bigru. In FIRE (Working Notes), pages 209–216, 2021.
[74] Md Rezaul Karim, Bharathi Raja Chakravarthi, John P McCrae, and Michael Cochez. Classifica-
tion benchmarks for under-resourced bengali language based on multichannel convolutional-lstm
network. In 2020 IEEE 7th International Conference on Data Science and Advanced Analytics
(DSAA), pages 390–399. IEEE, 2020.
[75] Md Rezaul Karim, Sumon Kanti Dey, Tanhim Islam, Sagor Sarker, Mehadi Hasan Menon, Kabir
Hossain, Md Azam Hossain, and Stefan Decker. Deephateexplainer: Explainable hate speech
detection in under-resourced bengali language. In 2021 IEEE 8th international conference on data
science and advanced analytics (DSAA), pages 1–10. IEEE, 2021.
[76] Md Rezaul Karim, Sumon Kanti Dey, Tanhim Islam, Md Shajalal, and Bharathi Raja
Chakravarthi. Multimodal hate speech detection from bengali memes and texts. In International
Conference on Speech and Language Technologies for Low-resource Languages, pages 293–308.
Springer, 2022.
[77] Ashfia Jannat Keya, Md Mohsin Kabir, Nusrat Jahan Shammey, MF Mridha, Md Rashedul Islam,
and Yutaka Watanobe. G-bert: an efficient method for identifying hate speech in bengali texts on
social media. IEEE Access, 2023.
[78] Anas Ali Khan, M Hammad Iqbal, Shibli Nisar, Awais Ahmad, and Waseem Iqbal. Offensive
language detection for low resource language using deep sequence model. IEEE Transactions on
Computational Social Systems, 2023.
[79] Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan,
Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, et al. Muril: Multilin-
gual representations for indian languages. arXiv preprint arXiv:2103.10730, 2021.
[80] Rahul Khurana, Chaitanya Pandey, Priyanshi Gupta, and Preeti Nagrath. Animojity: Detecting
hate comments in indic languages and analysing bias against content creators. In Proceedings of
the 19th International Conference on Natural Language Processing (ICON), pages 172–182, 2022.
[81] Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ring-
shia, and Davide Testuggine. The hateful memes challenge: Detecting hate speech in multimodal
memes. Advances in neural information processing systems, 33:2611–2624, 2020.
[82] Taehee Kim and Yuki Ogawa. The impact of politicians’ behaviors on hate speech spread: hate
speech adoption threshold on twitter in japan. Journal of Computational Social Science, pages
1–26, 2024.
[83] Renard Korzeniowski, Rafal Rolczyński, Przemyslaw Sadownik, Tomasz Korbak, and Marcin
Możejko. Exploiting unsupervised pre-training and automated feature engineering for low-resource
hate speech detection in polish. arXiv preprint arXiv:1906.09325, 2019.
[84] Dihia Lanasri, Juan Olano, Sifal Klioui, Sin Liang Lee, and Lamia Sekkai. Hate speech detection
in algerian dialect using deep learning. arXiv preprint arXiv:2309.11611, 2023.
[85] Mario Laurent. Project hatemeter: helping ngos and social science researchers to analyze and
prevent anti-muslim hate speech on social media. Procedia Computer Science, 176:2143–2153,
2020.
[86] Jean Lee, Taejun Lim, Heejun Lee, Bogeun Jo, Yangsok Kim, Heegeun Yoon, and Soyeon Caren
Han. K-mhas: A multi-label hate speech detection dataset in korean online news comment. arXiv
preprint arXiv:2208.10684, 2022.
[87] Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Juho Kim, and Alice Oh. Crehate: Cross-cultural
re-annotation of english hate speech dataset. arXiv preprint arXiv:2308.16705, 2023.
[88] Joao A Leite, Diego F Silva, Kalina Bontcheva, and Carolina Scarton. Toxic language detection
in social media for brazilian portuguese: New dataset and multilingual analysis. arXiv preprint
arXiv:2010.04543, 2020.
[89] Son T Luu, Kiet Van Nguyen, and Ngan Luu-Thuy Nguyen. A large-scale dataset for hate speech
detection on vietnamese social media texts. In Advances and Trends in Artificial Intelligence.
Artificial Intelligence Practices: 34th International Conference on Industrial, Engineering and
Other Applications of Applied Intelligent Systems, IEA/AIE 2021, Kuala Lumpur, Malaysia, July
26–29, 2021, Proceedings, Part I 34, pages 415–426. Springer, 2021.
[90] Tanjim Mahmud, Michal Ptaszynski, Juuso Eronen, and Fumito Masui. Cyberbullying detection
for low-resource languages and dialects: Review of the state of the art. Information Processing &
Management, 60(5):103454, 2023.
[91] Thomas Mandl, Sandip Modha, Prasenjit Majumder, Daksh Patel, Mohana Dave, Chintak Man-
dlia, and Aditya Patel. Overview of the hasoc track at fire 2019: Hate speech and offensive content
identification in indo-european languages. In Proceedings of the 11th annual meeting of the Forum
for Information Retrieval Evaluation, pages 14–17, 2019.
[92] Mihai Manolescu, Denise Löfflad, Adham Nasser Mohamed Saber, and Masoumeh Moradipour
Tari. Tueval at semeval-2019 task 5: Lstm approach to hate speech detection in english and
spanish. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 498–
502, 2019.
[93] Zainab Mansur, Nazlia Omar, and Sabrina Tiun. Twitter hate speech detection: a systematic
review of methods, taxonomy analysis, challenges, and opportunities. IEEE Access, 11:16226–
16249, 2023.
[94] Ilia Markov, Ine Gevers, and Walter Daelemans. An ensemble approach for dutch cross-domain hate
speech detection. In International conference on applications of natural language to information
systems, pages 3–15. Springer, 2022.
[95] Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. Spread of hate speech in
online social media. In Proceedings of the 10th ACM conference on web science, pages 173–182,
2019.
[96] Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh
Mukherjee. Hatexplain: A benchmark dataset for explainable hate speech detection. In Proceedings
of the AAAI conference on artificial intelligence, volume 35, pages 14867–14875, 2021.
[97] Puneet Mathur, Ramit Sawhney, Meghna Ayyar, and Rajiv Shah. Did you offend me? classification
of offensive tweets in hinglish language. In Proceedings of the 2nd workshop on abusive language
online (ALW2), pages 138–148, 2018.
[98] Sandip Modha, Thomas Mandl, Gautam Kishore Shahi, Hiren Madhu, Shrey Satapara, Tharindu
Ranasinghe, and Marcos Zampieri. Overview of the hasoc subtrack at fire 2021: Hate speech and
offensive content identification in english and indo-aryan languages and conversational hate speech.
In Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation, pages
1–3, 2021.
[99] Ioannis Mollas, Zoe Chrysopoulou, Stamatis Karlos, and Grigorios Tsoumakas. Ethos: a multi-
label hate speech detection dataset. Complex & Intelligent Systems, 8(6):4663–4678, 2022.
[100] Zewdie Mossie, Jenq-Haur Wang, et al. Social network hate speech detection for amharic language.
Computer Science & Information Technology, pages 41–55, 2018.
[101] Hamdy Mubarak, Kareem Darwish, and Walid Magdy. Abusive language detection on arabic social
media. In Proceedings of the first workshop on abusive language online, pages 52–56, 2017.
[102] Nikhil Narayan, Mrutyunjay Biswal, Pramod Goyal, and Abhranta Panigrahi. Hate speech and
offensive content detection in indo-aryan languages: A battle of lstm and transformers. arXiv
preprint arXiv:2312.05671, 2023.
[103] Nobal B Niraula, Saurab Dulal, and Diwa Koirala. Offensive language detection in nepali social
media. In Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pages
67–75, 2021.
[104] Daniel Nkemelu, Harshil Shah, Michael Best, and Irfan Essa. Tackling hate speech in low-resource
languages with context experts. In Proceedings of the 2022 International Conference on Information
and Communication Technologies and Development, pages 1–11, 2022.
[105] Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. Abusive language
detection in online user content. In Proceedings of the 25th international conference on world wide
web, pages 145–153, 2016.
[106] Jodi O’Brien. Encyclopedia of gender and society, volume 2. Sage, 2009.
[107] Oluwafemi Oriola and Eduan Kotzé. Evaluating machine learning techniques for detecting offensive
and hate speech in south african tweets. IEEE Access, 8:21496–21509, 2020.
[108] Nedjma Ousidhoum, Zizheng Lin, Hongming Zhang, Yangqiu Song, and Dit-Yan Yeung. Multilin-
gual and multi-aspect hate speech analysis. arXiv preprint arXiv:1908.11049, 2019.
[109] Pia Pachinger, Janis Goldzycher, Anna Maria Planitzer, Wojciech Kusa, Allan Hanbury, and Julia
Neidhardt. Austrotox: A dataset for target-based austrian german offensive language detection.
arXiv preprint arXiv:2406.08080, 2024.
[110] Endang Wahyu Pamungkas, Valerio Basile, and Viviana Patti. Towards multidomain and multi-
lingual abusive language detection: a survey. Personal and Ubiquitous Computing, 27(1):17–43,
2023.
[111] Yashkumar Parikh and Jinan Fiaidhi. Regional language toxic comment classification. Authorea
Preprints, 2023.
[112] Konstantinos Perifanos and Dionysis Goutsos. Multimodal hate speech detection in greek social
media. Multimodal Technologies and Interaction, 5(7):34, 2021.
[113] Zeses Pitenis, Marcos Zampieri, and Tharindu Ranasinghe. Offensive language identification in
greek. arXiv preprint arXiv:2003.07459, 2020.
[114] Flor Miriam Plaza-del Arco, M Dolores Molina-González, L Alfonso Ureña-López, and M Teresa
Martín-Valdivia. Comparing pre-trained language models for spanish hate speech detection. Expert
Systems with Applications, 166:114120, 2021.
[115] Fabio Poletto, Marco Stranisci, Manuela Sanguinetti, Viviana Patti, Cristina Bosco, et al. Hate
speech annotation: Analysis of an italian twitter corpus. In Ceur workshop proceedings, volume
2006, pages 1–6. CEUR-WS, 2017.
[116] Michal Ptaszynski, Agata Pieciukiewicz, and Pawel Dybala. Results of the poleval 2019 shared
task 6: First dataset and open shared task for automatic cyberbullying detection in polish twitter.
2019.
[117] Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Belding, and William Yang Wang. A benchmark
dataset for learning to intervene in online hate speech. arXiv preprint arXiv:1909.04251, 2019.
[118] Tharindu Ranasinghe, Isuri Anuradha, Damith Premasiri, Kanishka Silva, Hansi Hettiarachchi,
Lasitha Uyangodage, and Marcos Zampieri. Sold: Sinhala offensive language dataset. Language
Resources and Evaluation, pages 1–41, 2024.
[119] Priya Rani, Shardul Suryawanshi, Koustava Goswami, Bharathi Raja Chakravarthi, Theodorus
Fransen, and John Philip McCrae. A comparative study of different state-of-the-art hate speech
detection methods in hindi-english code-mixed data. In Proceedings of the second workshop on
trolling, aggression and cyberbullying, pages 42–48, 2020.
[120] Mohammad Mamun Or Rashid. Toxlex bn: A curated dataset of bangla toxic language derived
from facebook comment. Data in Brief, 43:108416, 2022.
[121] Amir H Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin. Offensive language detection
using multi-level classification. In Advances in Artificial Intelligence: 23rd Canadian Conference
on Artificial Intelligence, Canadian AI 2010, Ottawa, Canada, May 31–June 2, 2010. Proceedings
23, pages 16–27. Springer, 2010.
[122] Nauros Romim, Mosahed Ahmed, Hriteshwar Talukder, and Md Saiful Islam. Hate speech detection
in the bengali language: A dataset and its baseline evaluation. In Proceedings of International Joint
Conference on Advances in Computational Intelligence: IJCACI 2020, pages 457–468. Springer,
2021.
[123] Nauros Romim, Mosahed Ahmed, Md Saiful Islam, Arnab Sen Sharma, Hriteshwar Talukder, and
Mohammad Ruhul Amin. Bd-shs: A benchmark dataset for learning to detect online bangla hate
speech in different social contexts. arXiv preprint arXiv:2206.00372, 2022.
[124] Björn Ross, Michael Rist, Guillermo Carbonell, Benjamin Cabrera, Nils Kurowsky, and Michael
Wojatzki. Measuring the reliability of hate speech annotations: The case of the european refugee
crisis. arXiv preprint arXiv:1701.08118, 2017.
[125] Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, and Janet B
Pierrehumbert. Hatecheck: Functional tests for hate speech detection models. arXiv preprint
arXiv:2012.15606, 2020.
[126] Pradeep Kumar Roy, Asis Kumar Tripathy, Tapan Kumar Das, and Xiao-Zhi Gao. A framework
for hate speech detection using deep convolutional neural network. IEEE Access, 8:204951–204962,
2020.
[127] Pradeep Kumar Roy, Snehaan Bhawal, and Chinnaudayar Navaneethakrishnan Subalalitha. Hate
speech and offensive language detection in dravidian languages using deep ensemble framework.
Computer Speech & Language, 75:101386, 2022.
[128] Hind Saleh, Areej Alhothali, and Kawthar Moria. Detection of hate speech using bert and hate
speech word embedding with deep model. Applied Artificial Intelligence, 37(1):2166719, 2023.
[129] HMST Sandaruwan, SAS Lorensuhewa, and MAL Kalyani. Sinhala hate speech detection in social
media using text mining and machine learning. In 2019 19th International Conference on Advances
in ICT for Emerging Regions (ICTer), volume 250, pages 1–8. IEEE, 2019.
[130] Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, and Marco Stranisci. An italian
twitter corpus of hate speech against immigrants. In Proceedings of the eleventh international
conference on language resources and evaluation (LREC 2018), 2018.
[131] Juan Pablo Sans. Facebook’s top ten languages, 2018. URL https://www.linkedin.com/pulse/
facebooks-top-ten-languages-who-using-them-juan-pablo#. Accessed: 2024-08-28.
[132] Salim Sazzed. Identifying vulgarity in bengali social media textual content. PeerJ Computer
Science, 7:e665, 2021.
[133] Semiocast. Top languages on twitter-stats, 2024. URL https://semiocast.com/
top-languages-on-twitter-stats/#. Accessed: 2024-08-28.
[134] Deepawali Sharma, Aakash Singh, and Vivek Kumar Singh. Thar-targeted hate speech against reli-
gion: A high-quality hindi-english code-mixed dataset with the application of deep learning models
for automatic detection. ACM Transactions on Asian and Low-Resource Language Information
Processing, 2024.
[135] Alexander Shvets, Paula Fortuna, Juan Soler, and Leo Wanner. Targets and aspects in social
media hate speech. In Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH
2021), pages 179–190, 2021.
[136] Gudbjartur Ingi Sigurbergsson and Leon Derczynski. Offensive language and hate speech detection
for danish. arXiv preprint arXiv:1908.04531, 2019.
[137] Peter K Smith, Jess Mahdavi, Manuel Carvalho, Sonja Fisher, Shanette Russell, and Neil Tippett.
Cyberbullying: Its nature and impact in secondary school pupils. Journal of child psychology and
psychiatry, 49(4):376–385, 2008.
[138] K Sreelakshmi, B Premjith, and KP Soman. Detection of hate speech text in hindi-english code-
mixed data. Procedia Computer Science, 171:737–744, 2020.
[139] Malliga Subramanian, Rahul Ponnusamy, Sean Benhur, Kogilavani Shanmugavadivel, Adhithiya
Ganesan, Deepti Ravi, Gowtham Krishnan Shanmugasundaram, Ruba Priyadharshini, and
Bharathi Raja Chakravarthi. Offensive language detection in tamil youtube comments by adapters
and cross-domain knowledge transfer. Computer Speech & Language, 76:101404, 2022.
[140] Sherin Sultana, Md Omur Faruk Redoy, Jabir Al Nahian, Abu Kaisar Mohammad Masum, and
Sheikh Abujar. Detection of abusive bengali comments for mixed social media data using machine
learning. 2023.
[141] Taufic Leonardo Sutejo and Dessi Puji Lestari. Indonesia hate speech detection using deep learning.
In 2018 International Conference on Asian Language Processing (IALP), pages 39–43. IEEE, 2018.
[142] Surafel Getachew Tesfaye and Kula Kakeba. Automated amharic hate speech posts and comments
detection model using recurrent neural network. 2020.
[143] Surendrabikram Thapa, Aditya Shah, Farhan Ahmad Jafri, Usman Naseem, and Imran Razzak.
A multi-modal dataset for hate speech detection on social media: Case-study of russia-ukraine
conflict. In CASE 2022-5th Workshop on Challenges and Applications of Automated Extraction
of Socio-Political Events from Text, Proceedings of the Workshop. Association for Computational
Linguistics, 2022.
[144] Stéphan Tulkens, Lisa Hilte, Elise Lodewyckx, Ben Verhoeven, and Walter Daelemans.
A dictionary-based approach to racism detection in dutch social media. arXiv preprint
arXiv:1608.08738, 2016.
[145] Stefanie Ullmann and Marcus Tomalin. Quarantining online hate speech: technical and ethical
perspectives. Ethics and Information Technology, 22(1):69–80, 2020.
[146] Cynthia Van Hee, Ben Verhoeven, Els Lefever, Guy De Pauw, Véronique Hoste, and Walter Daele-
mans. Guidelines for the fine-grained analysis of cyberbullying. 2015.
[147] Natalia Vanetik and Elisheva Mimoun. Detection of racist language in french tweets. Information,
13(7):318, 2022.
[148] Francielle Alves Vargas, Isabelle Carvalho, Fabiana Rodrigues de Góes, Fabrício Benevenuto, and
Thiago Alexandre Salgueiro Pardo. Hatebr: A large expert annotated corpus of brazilian instagram
comments for offensive language and hate speech detection. arXiv preprint arXiv:2103.14972, 2021.
[149] Charangan Vasantharajan and Uthayasanker Thayasivam. Towards offensive language identifica-
tion for tamil code-mixed youtube comments and posts. SN Computer Science, 3(1):94, 2022.
[150] Neeraj Vashistha and Arkaitz Zubiaga. Online multilingual hate speech detection: experimenting
with hindi and english social media. Information, 12(1):5, 2020.
[151] Abhishek Velankar, Hrushikesh Patil, Amol Gore, Shubham Salunke, and Raviraj Joshi. Hate and
offensive speech detection in hindi and marathi. arXiv preprint arXiv:2110.12200, 2021.
[152] Abhishek Velankar, Hrushikesh Patil, Amol Gore, Shubham Salunke, and Raviraj Joshi. L3cube-
mahahate: A tweet-based marathi hate speech detection dataset and bert models. arXiv preprint
arXiv:2203.13778, 2022.
[153] Abhishek Velankar, Hrushikesh Patil, and Raviraj Joshi. Mono vs multilingual bert for hate speech
detection and text classification: A case study in marathi. In IAPR Workshop on Artificial Neural
Networks in Pattern Recognition, pages 121–128. Springer, 2022.
[154] Zeerak Waseem and Dirk Hovy. Hateful symbols or hateful people? predictive features for hate
speech detection on twitter. In Proceedings of the NAACL student research workshop, pages 88–93,
2016.
[155] Maximilian Wich, Svenja Räther, and Georg Groh. German abusive language dataset with focus
on covid-19. In Proceedings of the 17th Conference on Natural Language Processing (KONVENS
2021), pages 247–252, 2021.
[156] Michael Wiegand, Melanie Siegel, and Josef Ruppenhofer. Overview of the germeval 2018 shared
task on the identification of offensive language. 2018.
[160] Ziqi Zhang, David Robinson, and Jonathan Tepper. Detecting hate speech on twitter using a
convolution-gru based deep neural network. In The Semantic Web: 15th International Conference,
ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15, pages 745–760. Springer,
2018.