A Deep-Word and Character Based Approach To Offensive Language Identification
Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), pages 617–621
Minneapolis, Minnesota, USA, June 6–7, 2019. ©2019 Association for Computational Linguistics
learning approaches. Also, a couple of surveys have been published covering various work addressing the identification of abusive, toxic, and offensive language, hate speech, etc., and their methodology, including (Schmidt and Wiegand, 2017) and (Fortuna and Nunes, 2018). Additionally, there have been several workshops and shared tasks on offensive language identification and related problems, including TA-COS (http://ta-cos.org/), Abusive Language Online (https://sites.google.com/site/abusivelanguageworkshop2017/), TRAC (Kumar et al., 2018; https://sites.google.com/view/trac1/home), and GermEval (Wiegand et al., 2018), which shows the significance of the problem.
3 Methodology and Data
The methodology used for both subtask A, offensive language identification, and subtask B, automatic categorization of offense types, consists of a preprocessing phase and a deep classification phase. We first introduce the preprocessing phase, then elaborate on the classification phase.

3.1 Preprocessing

The preprocessing phase consists of (1) replacing obfuscated offensive words with their correct form and (2) tweet tokenization using the NLTK tweet tokenizer (Bird et al., 2009). In social media, some words are distorted in a way that escapes offense detection systems or reduces the impertinence. For instance, ‘asshole’ may be written as ‘a$$hole’, ‘a$sh0le’, ‘a**hole’, etc. Given a list of English offensive words, we can create a list containing most of the possible permutations; a sketch of this step is shown below. Using such a list eases the job for the classifier, and searching in it is computationally cheap. Furthermore, replacing contractions, e.g. ‘I’m’ with ‘I am’, and replacing common social media abbreviations, e.g. ‘w/’ with ‘with’, were not helpful and were not used to train the final model.
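The paper does not include code for this step; the following is a minimal Python sketch of how such a replacement list could be built and applied, assuming an illustrative SUBSTITUTIONS map and a one-word OFFENSIVE_WORDS seed list (both hypothetical):

from itertools import product
from nltk.tokenize import TweetTokenizer

# Hypothetical character substitutions commonly used to obfuscate words.
SUBSTITUTIONS = {
    "a": ["a", "@", "*"],
    "s": ["s", "$", "*"],
    "o": ["o", "0", "*"],
}

# Illustrative seed entry; the paper assumes a full list of English offensive words.
OFFENSIVE_WORDS = ["asshole"]

def obfuscation_variants(word):
    """Enumerate most of the possible character-level obfuscations of a word."""
    options = [SUBSTITUTIONS.get(ch, [ch]) for ch in word]
    return {"".join(combo) for combo in product(*options)}

# Map every variant back to its canonical form; lookup is a cheap dict access.
VARIANT_TO_CANONICAL = {
    variant: word
    for word in OFFENSIVE_WORDS
    for variant in obfuscation_variants(word)
}

def preprocess(tweet):
    """Step 1: replace obfuscated offensive words; step 2: NLTK tweet tokenization."""
    normalized = " ".join(
        VARIANT_TO_CANONICAL.get(token.lower(), token) for token in tweet.split()
    )
    return TweetTokenizer().tokenize(normalized)

print(preprocess("you are such an a$$h0le"))
# -> ['you', 'are', 'such', 'an', 'asshole']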
3.2 Deep Classifier

Given a tweet, we want to know whether it is offensive or not (subtask A) and whether the offense is targeted (subtask B). Since both subtasks are binary classification problems, we used one architecture to tackle both. To define the problem, given a tweet x, we want to predict the label y: OFF or NOT in subtask A, and TIN or UNT in subtask B. Two representations are therefore created for each input x:

1. xc, the indexed representation of the tweet based on its characters, padded to the length of the longest word in the corpus. The indices cover the 256 most common characters, plus 0 for padding and 1 for unknown characters.

2. xw, the embeddings of the words in the input tweet based on FastText’s 600B-token common crawl model (Mikolov et al., 2018).

Then, xc is fed into an embedding layer with an output size of 32, followed by a CNN layer. xc is then concatenated with xw, and both are fed to a unidirectional RNN with an LSTM cell of size 256, whose output is the input to two consecutive fully-connected layers that map their input to an R^128 and an R^2 space, respectively. We also applied dropout with a keep rate of 0.5 to the CNN’s output, xw, the RNN’s output, and the first fully-connected layer’s output.

The CNN layer consists of four consecutive sub-layers:

1. a CNN with 64 filters, a kernel size of 2, a stride of 1, same padding, and ReLU activation;

2. a max-pooling layer with pool size and stride of 2;

3. another CNN, same as the first one, but with 128 filters;

4. the same max-pooling again.

Finally, we used an AdamOptimizer (Kingma and Ba, 2014) with a learning rate of 1e-3 and a batch size of 32 to train the model. A sketch of this architecture is given below.
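The authors' implementation is not published; the following tf.keras sketch is one plausible reading of the description above. The sequence lengths, the per-word character windows, the Flatten step that turns each word's character features into a single vector, and the ReLU on the first fully-connected layer are assumptions of ours, not details from the paper.

import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed dimensions for illustration (not stated in the paper).
MAX_WORDS = 50        # tweet length in words, after padding
MAX_WORD_CHARS = 20   # longest-word length in characters, after padding
CHAR_VOCAB = 258      # 256 most common characters + padding (0) + unknown (1)
EMB_DIM = 300         # FastText common-crawl embedding size

# x_c: per-word character indices; x_w: pre-computed FastText word embeddings.
x_c = layers.Input(shape=(MAX_WORDS, MAX_WORD_CHARS), dtype="int32", name="x_c")
x_w = layers.Input(shape=(MAX_WORDS, EMB_DIM), dtype="float32", name="x_w")

# Character embedding of size 32, then the four CNN sub-layers, applied word by word.
h = layers.Embedding(CHAR_VOCAB, 32)(x_c)                       # (batch, words, chars, 32)
h = layers.TimeDistributed(layers.Conv1D(64, 2, padding="same", activation="relu"))(h)
h = layers.TimeDistributed(layers.MaxPooling1D(pool_size=2, strides=2))(h)
h = layers.TimeDistributed(layers.Conv1D(128, 2, padding="same", activation="relu"))(h)
h = layers.TimeDistributed(layers.MaxPooling1D(pool_size=2, strides=2))(h)
h = layers.TimeDistributed(layers.Flatten())(h)                 # one char-CNN vector per word
char_feat = layers.Dropout(0.5)(h)                              # keep rate 0.5 = dropout rate 0.5

# Concatenate the char-CNN features with the (dropout-regularised) word embeddings.
merged = layers.Concatenate()([char_feat, layers.Dropout(0.5)(x_w)])

# Unidirectional LSTM of size 256, then FC layers mapping to R^128 and R^2.
rnn_out = layers.Dropout(0.5)(layers.LSTM(256)(merged))
hidden = layers.Dropout(0.5)(layers.Dense(128, activation="relu")(rnn_out))
output = layers.Dense(2, activation="softmax")(hidden)

model = models.Model(inputs=[x_c, x_w], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])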
3.3 Baseline Methods

We used two baseline methods for subtask A:

• an SVM with 1- to 3-gram word TF-IDF and 1- to 5-gram character count feature vectors as input;

• an SVM with BERT representations of the tweets (using average pooling (Xiao, 2018)) as input, using the BERT-Large, Uncased model (a sketch of this extraction step follows the list).
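The paper does not detail how the BERT representations were extracted. A minimal sketch with the bert-as-service client (Xiao, 2018) could look as follows; the model directory path is a placeholder, and running the server with mean pooling is our assumption based on the "average pooling" description:

from bert_serving.client import BertClient

# Assumes a bert-as-service server is already running, e.g.:
#   bert-serving-start -model_dir /path/to/uncased_L-24_H-1024_A-16 \
#                      -pooling_strategy REDUCE_MEAN
bc = BertClient()

# Each tweet becomes a single fixed-size vector (average-pooled BERT states),
# which is then used as the feature vector for the second SVM baseline.
tweets = ["you are such an a$$h0le", "have a great day"]
features = bc.encode(tweets)   # shape: (num_tweets, 1024) for BERT-Large
print(features.shape)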
The SVMs were trained for 15 epochs with stochastic gradient descent, hinge loss, an alpha of 1e-6, an elasticnet penalty, and a random state of 5. The SVMs were implemented using Scikit-learn (Pedregosa et al., 2011).
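A minimal Scikit-learn sketch of the first baseline is given below. The exact pipeline is not published; the FeatureUnion wiring and the use of max_iter to stand in for the 15 training epochs are assumptions:

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.linear_model import SGDClassifier

# Word-level 1- to 3-gram TF-IDF features plus character-level 1- to 5-gram counts.
features = FeatureUnion([
    ("word_tfidf", TfidfVectorizer(analyzer="word", ngram_range=(1, 3))),
    ("char_count", CountVectorizer(analyzer="char", ngram_range=(1, 5))),
])

# A linear SVM trained with SGD: hinge loss, alpha 1e-6, elasticnet penalty,
# 15 passes over the data, random_state 5.
svm_baseline = Pipeline([
    ("features", features),
    ("clf", SGDClassifier(loss="hinge", alpha=1e-6, penalty="elasticnet",
                          max_iter=15, random_state=5)),
])

# train_tweets / train_labels are assumed to be the preprocessed OLID training split.
# svm_baseline.fit(train_tweets, train_labels)
# predictions = svm_baseline.predict(test_tweets)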
3.4 Data

The main dataset used to train the model is the Offensive Language Identification Dataset (OLID) (Zampieri et al., 2019a). The dataset is annotated hierarchically to identify offensive language (OFFensive or NOT), whether it is targeted (Targeted INsult or UNTargeted), and, if so, its target (INDividual, GRouP, or OTHer). We divided the 13,240 samples in the training set into 12,000 samples for training and 1,240 samples for validation.
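The split itself can be expressed as a simple slice of the shuffled OLID training file; the file name, column layout, and shuffling seed below are assumptions for illustration:

import pandas as pd

# Assumed file name; OLID ships as a tab-separated file with a "tweet" column
# and a "subtask_a" label column (OFF / NOT), among others.
olid = pd.read_csv("olid-training-v1.0.tsv", sep="\t")
olid = olid.sample(frac=1.0, random_state=0)  # shuffle before splitting

train_df = olid.iloc[:12000]       # 12,000 samples for training
val_df = olid.iloc[12000:]         # remaining 1,240 samples for validation
print(len(train_df), len(val_df))  # 12000 1240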
As neural networks require a huge amount of training data, we tried adding more data from the dataset of the First Workshop on Trolling, Aggression, and Cyberbullying (TRAC-1) (Kumar et al., 2018), which was not helpful. However, adding the training data from the Toxic Comment Classification Challenge on Kaggle (Conversation AI, 2017) increased the macro-averaged F1-score on the validation set by ∼2%. This data comprises tweets with positive and negative tags in six categories: toxic, severe toxic, obscene, threat, insult, and identity hate.

4 Results

DeepModel was trained on the training data (excluding the validation data) and DeepModel+val on the combination of the training and validation data. The results for subtask A are presented in Table 1; the best performance is in bold.

System            Macro F1   Accuracy
All NOT baseline  0.4189     0.7209
All OFF baseline  0.2182     0.2790
SVM               0.7452     0.8011
BERT-SVM          0.7507     0.8011
DeepModel         0.7788     0.8326
DeepModel+val     0.7793     0.8337

Table 1: Results for subtask A

The best performance belongs to DeepModel+val, by a margin of more than 2.8 percent over the best baseline performance, BERT-SVM. However, it should be mentioned that the results in the first two rows belong to a model trained only on OLID. The confusion matrix for the best performance is shown in Figure 1.

[Figure 1: Confusion matrix for the best performance in subtask A (axes: True label / Predicted label; NOT row: 572, 48).]
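The macro-averaged F1-score and accuracy reported above can be computed directly with Scikit-learn; a short sketch with placeholder labels and predictions for subtask A:

from sklearn.metrics import accuracy_score, f1_score

# Placeholder gold labels and system predictions (NOT / OFF).
y_true = ["NOT", "NOT", "OFF", "OFF", "NOT"]
y_pred = ["NOT", "OFF", "OFF", "NOT", "NOT"]

# Macro F1 averages the per-class F1 scores, so the minority OFF class counts
# as much as the majority NOT class; accuracy is the plain fraction correct.
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("Accuracy:", accuracy_score(y_true, y_pred))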
In subtask B, the deep models again outperformed the baseline results by a large margin, as in subtask A. The results for subtask B are presented in Table 3.

System            Macro F1   Accuracy
All TIN baseline  0.4702     0.8875
All UNT baseline  0.1011     0.1125
DeepModel         0.6065     0.8583
DeepModel+val     0.6400     0.8875

Table 3: Results for subtask B

This time, adding the validation data made a considerable difference, as the training data for subtask B is smaller. The confusion matrix for DeepModel+val is shown in Figure 2.

[Figure 2: The confusion matrix for DeepModel+val in subtask B (axes: True label / Predicted label; TIN row: 206, 7).]

The confusion matrix shows that the performance of the model is good for TIN but poor for UNT. Table 4 shows the detailed results for DeepModel+val in subtask B, which indicates that the class imbalance is worse than in subtask A and that the poor performance on UNT is mainly due to low recall.

Class   Precision   Recall   F1-score
TIN     0.9115      0.9671   0.9385
UNT     0.5000      0.2593   0.3415

Table 4: Detailed DeepModel+val results in subtask B
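Per-class precision, recall, and F1 as in Table 4 can be obtained from Scikit-learn; a sketch with placeholder predictions for subtask B (TIN / UNT):

from sklearn.metrics import classification_report

# Placeholder gold labels and system predictions.
y_true = ["TIN", "TIN", "TIN", "UNT", "UNT"]
y_pred = ["TIN", "TIN", "UNT", "TIN", "UNT"]

# Reports precision, recall, and F1 for each class plus macro/weighted averages;
# a low UNT recall corresponds to many UNT tweets being predicted as TIN.
print(classification_report(y_true, y_pred, digits=4))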
5 Analysis

In subtask A, DeepModel+val outperformed the second best method, BERT-SVM, by 2.86% Macro F1-score. The BERT-SVM results, however, were not much better than those of the SVM with TF-IDF and count features, probably due to the fact that the BERT model requires fine-tuning for more task-specific representations.

The majority of DeepModel+val’s errors are in the OFF class and can be categorized into (1) sarcasm: the model is unable to detect sarcastic language, which is difficult even for humans to detect; (2) emotion: discerning emotions, such as anger, seems to be a challenge for the model; (3) ethnic and racial slurs, etc. Solving these problems requires a more comprehensive knowledge of the context and the language, which was examined in works such as (Poria et al., 2016) and improved the results. However, experimenting with emotion embeddings in the current work was not helpful, so they do not appear in the final results. Being aware of the emotion of the text, the personality of the author, and the sentiment of the sentences is helpful for detecting offensive language, as much offensive content has an angry tone (ElSherief et al., 2018).

6 Conclusion

In this paper, we introduced the Ghmerti team’s approach to the problems of ‘offensive language identification’ and ‘automatic categorization of offense type’ in shared task 6 of SemEval 2019, OffensEval. In subtask A, the neural network-based model outperformed the other methods, including an SVM with word TF-IDF and character count features and another SVM with BERT-encoded tweets as input. Furthermore, analysis of the results indicates that sarcastic language, inability to discern emotions such as anger, and ethnic and racial slurs constitute a considerable portion of the errors. Such deficiencies demand larger training corpora and a variety of other features, such as information on sarcasm, emotion, personality, etc.

References
Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.

Erik Cambria, Praphul Chandra, Avinash Sharma, and Amir Hussain. 2010. Do Not Feel the Trolls. ISWC, Shanghai.

Conversation AI. 2017. Toxic Comment Classification Challenge: Identify and Classify Toxic Online Comments.

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated Hate Speech Detection and the Problem of Offensive Language. In Proceedings of ICWSM.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.

Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. 2012. Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS), 2(3):18.

Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang, and Elizabeth Belding. 2018. Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media. arXiv preprint arXiv:1804.04257.

Paula Fortuna and Sérgio Nunes. 2018. A Survey on Automatic Detection of Hate Speech in Text. ACM Computing Surveys (CSUR), 51(4):85.

Björn Gambäck and Utpal Kumar Sikdar. 2017. Using Convolutional Neural Networks to Classify Hate-speech. In Proceedings of the First Workshop on Abusive Language Online, pages 85–90.

Edel Greevy and Alan F. Smeaton. 2004. Classifying Racist Texts Using a Support Vector Machine. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 468–469. ACM.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.

Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, and Marcos Zampieri. 2018. Benchmarking Aggression Identification in Social Media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC), Santa Fe, USA.

Shervin Malmasi and Marcos Zampieri. 2017. Detecting Hate Speech in Social Media. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP), pages 467–472.

Shervin Malmasi and Marcos Zampieri. 2018. Challenges in Discriminating Profanity from Hate Speech. Journal of Experimental & Theoretical Artificial Intelligence, 30:1–16.

Yashar Mehdad and Joel Tetreault. 2016. Do Characters Abuse More Than Words? In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 299–303.

Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2018. Advances in Pre-Training Distributed Word Representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive Language Detection in Online User Content. In Proceedings of the 25th International Conference on World Wide Web, pages 145–153. International World Wide Web Conferences Steering Committee.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Soujanya Poria, Erik Cambria, Devamanyu Hazarika, and Prateek Vij. 2016. A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1601–1612.

Anna Schmidt and Michael Wiegand. 2017. A Survey on Hate Speech Detection Using Natural Language Processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1–10, Valencia, Spain. Association for Computational Linguistics.

Michael Wiegand, Melanie Siegel, and Josef Ruppenhofer. 2018. Overview of the GermEval 2018 Shared Task on the Identification of Offensive Language. In Proceedings of GermEval.

Han Xiao. 2018. bert-as-service. https://github.com/hanxiao/bert-as-service.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019a. Predicting the Type and Target of Offensive Posts in Social Media. In Proceedings of NAACL.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019b. SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval).

Ziqi Zhang, David Robinson, and Jonathan Tepper. 2018. Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network. In Lecture Notes in Computer Science. Springer Verlag.