Comparison of Deep Learning and Ensemble Learning in Classification of Toxic Comments
Abstract—The paper is focused on the recognition of various forms of toxic comments on social networks, particularly on offensive and hate speech and on cyberbullying. The spreading of toxic content through social networks is nowadays a very serious problem, which can be harmful for the functioning of a democratic society. The paper presents an experiment with various machine learning methods to discover which of them would be the most suitable for building recognition models. Primarily, we compare deep learning and ensemble learning.

I. INTRODUCTION

As the Internet expands, so does the amount of content on it. In addition to content based on facts, a large amount of content is of a toxic type, which negatively affects web users, particularly teenagers. While most web users respect the norms of social behaviour, some users do not, and their comments reflect their antisocial behaviour. The anonymity provided by social networks, the simplicity of contribution, and the ease with which toxic content spreads make this an extremely topical problem today. Automated detection of various forms of toxicity can be helpful in the process of their regulation and limitation by moderators of web discussions, but also by social network users.

Social media has seen increased use as a source of information and is mainly used to search for information on serious topics. There has also been great use by those who seek health information. People use social "tools" to gather information, share stories, but also discuss issues. Similarly, healthcare organizations see benefits in social media because it gives them access to healthcare information.

Social media comes to the fore as a source of information in times of disaster and risk situations, although the accuracy of the information shared through these channels is unclear. Therefore, it is essential to learn more about how people evaluate the information they receive on social media websites, especially in terms of its credibility.

There are many kinds of uncredible information and toxic comments that can be harmful for users, such as fake news, conspiracy theories, trolling, hatred, offensive posts, cyberbullying, and phishing. We have concentrated on the detection of hate speech, offensive speech, and cyberbullying in our research.

Hate speech is defined as public speech that expresses hatred or promotes prejudice and violence against a person or a group based on something such as racial diversity, skin colour, religion, gender, or sexual orientation [17]. Companies that have hate speech policies in place include Facebook and YouTube. In 2018, a post containing part of the United States Declaration of Independence, which refers to Native Americans as "ruthless Indian savages", was marked as hate speech by Facebook and removed from the poster's page [11]. In 2019, the video-sharing platform YouTube shut down channels such as that of the American radio host Jesse Lee Peterson on the basis of politically hateful speech.

Offensive speech may involve various forms of toxicity. It can be difficult to distinguish general offensive language from hate speech. Offensive language can contain any kind of profanity or insult, so hate speech can be classified as a subset of offensive language. When detecting offensive posts, the type and purpose of the offensive expression are considered; therefore, the detection criteria should capture the attributes of offensive expression in general [3].

Cyberbullying represents content posted online by an individual or a group that is aggressive, humiliating, or hurtful towards a victim who does not know the perpetrator or cannot easily defend himself. It may be described on the basis of three criteria: intention, repetition, and superiority. Leaked information is a big problem for the bullied person, since once any defamatory or confidential information is published on the web, it is very difficult to remove [14].

Machine learning and its methods are very popular and useful today. They are used for various forms of classification, regression, and problems associated with text or image detection. They have a wide range of uses, including detection of antisocial behaviour, cyber security, healthcare, IoT, and various others [13]. We focused our research on the automatic detection of three forms of antisocial behaviour: hate speech, offensive speech, and cyberbullying, using machine learning.

The main objectives the study intended to achieve are as follows:

- A comparison of various machine learning methods in the generation of models for the recognition of various forms of antisocial behaviour (hate, offensive, and cyberbullying) on social networks, which differ in form but are similar in their impact on social network users.

- A meta-level comparison offering an evaluation of the success of classical learning versus deep learning and ensemble learning in building detection models.