Detection of Cyberbullying on Social Media Using Machine Learning

ABSTRACT

Cyberbullying is a major problem on the internet that affects both teenagers and adults. It has led to serious consequences such as depression and suicide, and regulating content on social media platforms has become a growing need. This study uses data from two different forms of cyberbullying, hate-speech tweets from Twitter and personal-attack comments from Wikipedia forums, to build a model that detects cyberbullying in text data using Natural Language Processing and Machine Learning. Three feature-extraction methods and four classifiers are compared to identify the best approach. On the Twitter data the model achieves accuracies above 90%, and on the Wikipedia data it achieves accuracies above 80%.

EXISTING SYSTEM

Hsien [1] combined keyword matching, opinion mining and social network analysis, obtaining a precision of 0.79 and a recall of 0.71 on datasets from four websites. Patxi Galán-García et al. [2] hypothesized that a troll (someone who cyberbullies) operating under a fake profile on a social networking site always also has a real profile, which they use to see how others perceive the fake one. They proposed a machine learning approach to identify such profiles; the identification process studied profiles that had some kind of close relation to the fake profile.
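Precision and recall, the two figures reported for Hsien [1], come from counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal Python sketch follows; the counts are hypothetical, and the F1 score (the harmonic mean of the two) is not reported in the source but is derived here only to show how the two reported values combine.

```python
from typing import Tuple


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical counts chosen so that precision comes out at 0.79.
print(precision_recall_f1(tp=79, fp=21, fn=32))

# F1 implied by the reported precision (0.79) and recall (0.71).
print(round(f1_from(0.79, 0.71), 3))  # 0.748
```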

The method was to select profiles for study, collect their tweets, choose the profile features to use, and apply machine learning to identify the author of the tweets. A total of 1,900 tweets belonging to 19 different profiles were used, and the method identified the author with 68% accuracy. It was later applied in a case study at a school in Spain, where the real owner of a profile had to be found among several students suspected of cyberbullying, and the method succeeded. The method nevertheless has some shortcomings.

For example, a trolling account may have no corresponding real account, which would defeat such systems, and experts can change their writing style and behaviour so that no patterns are found. Handling changing writing styles will require more efficient algorithms.

Mangaonkar et al. [3] proposed a collaborative detection method in which multiple detection nodes are connected to each other; each node uses either the same or a different algorithm and dataset, and the nodes' results are combined to produce the final result. P. Zhou et al. [4] proposed an attention-based bidirectional LSTM (B-LSTM) technique. Banerjee et al. [5] used KNN with new embeddings to obtain a precision of 93%.
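The collaborative idea of Mangaonkar et al. [3], in which several detection nodes each produce a verdict and the verdicts are combined, can be sketched with a simple majority vote. The source does not say how the results were combined, so majority voting is an assumption here, and the three keyword-style node functions are hypothetical stand-ins for trained classifiers.

```python
from typing import Callable, List

# A detection node maps a text to a verdict: 1 = cyberbullying, 0 = not.
Node = Callable[[str], int]


def insult_node(text: str) -> int:
    """Hypothetical node: flags texts containing an insult from a tiny word list."""
    return int(any(w in text.lower() for w in ("idiot", "loser", "stupid")))


def threat_node(text: str) -> int:
    """Hypothetical node: flags a couple of threatening phrases."""
    return int(any(p in text.lower() for p in ("hurt you", "watch your back")))


def shouting_node(text: str) -> int:
    """Hypothetical node: flags repeated exclamation marks or all-caps text."""
    return int(text.count("!") >= 2 or text.isupper())


def combine_by_majority(nodes: List[Node], text: str) -> int:
    """Return 1 only if a strict majority of the nodes flag the text."""
    votes_for = sum(node(text) for node in nodes)
    return int(votes_for > len(nodes) / 2)


nodes = [insult_node, threat_node, shouting_node]
print(combine_by_majority(nodes, "You are such a loser!!"))                        # 1
print(combine_by_majority(nodes, "Thanks for the helpful edit on this article."))  # 0
```

Requiring a strict majority means a single noisy node cannot flag a text on its own; other combination rules (weighted votes, averaging scores) would fit the same structure.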

Disadvantages
• A vocabulary is built from all the documents: it may consist of all words (tokens) in all documents or only the top-frequency tokens.
• The TF-IDF method is similar to the bag-of-words model, since it builds its vocabulary in the same way to obtain its features.
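The two points above concern how bag-of-words and TF-IDF features are built. A minimal pure-Python sketch, assuming whitespace-tokenized toy documents and the unsmoothed formula idf(t) = log(N / df(t)); scikit-learn's TfidfVectorizer applies a smoothed idf and L2 normalization by default, so its numbers differ. Both representations index the same vocabulary and differ only in how each cell is weighted. The optional `top_k` parameter is a hypothetical illustration of keeping only the most frequent tokens.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def build_vocabulary(docs: List[List[str]], top_k: Optional[int] = None) -> Dict[str, int]:
    """Map tokens to column indices, using every token or only the top_k most frequent."""
    counts = Counter(tok for doc in docs for tok in doc)
    kept = [tok for tok, _ in counts.most_common(top_k)]
    return {tok: i for i, tok in enumerate(sorted(kept))}


def bag_of_words(docs: List[List[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """One row per document holding raw term counts over the shared vocabulary."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc:
            if tok in vocab:
                row[vocab[tok]] += 1
        rows.append(row)
    return rows


def tf_idf(docs: List[List[str]], vocab: Dict[str, int]) -> List[List[float]]:
    """Same vocabulary and counts, reweighted by idf(t) = log(N / df(t))."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc) if tok in vocab)
    idf = [0.0] * len(vocab)
    for tok, col in vocab.items():
        idf[col] = math.log(n_docs / df[tok])
    return [[count * idf[col] for col, count in enumerate(row)]
            for row in bag_of_words(docs, vocab)]


docs = [text.split() for text in ["you are a loser", "you are great", "great game today"]]
vocab = build_vocabulary(docs)
print(vocab)  # {'a': 0, 'are': 1, 'game': 2, 'great': 3, 'loser': 4, 'today': 5, 'you': 6}
print(bag_of_words(docs, vocab))  # [[1, 1, 0, 0, 1, 0, 1], [0, 1, 0, 1, 0, 0, 1], [0, 0, 1, 1, 0, 1, 0]]
print(tf_idf(docs, vocab)[0])
```

A token that appears in every document gets idf = log(1) = 0 and therefore contributes nothing to the TF-IDF rows, while its bag-of-words count is unchanged.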

PROPOSED SYSTEM

In this project, cyberbullying detection is treated as a binary classification problem: we detect two major forms of cyberbullying, hate speech on Twitter and personal attacks on Wikipedia, and classify each text as either containing cyberbullying or not.
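Framed as binary classification, each text receives the label 1 (cyberbullying) or 0 (not). The four classifiers are not named in this section, so as one hedged illustration the sketch below trains a multinomial Naive Bayes model with add-one (Laplace) smoothing on a hypothetical four-sentence toy set; it is not presented as the project's model or data.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[int, float], Dict[int, Counter], Set[str]]


def train_naive_bayes(texts: List[str], labels: List[int]) -> Model:
    """Estimate log class priors and per-class word counts (assumes both labels occur)."""
    word_counts: Dict[int, Counter] = {0: Counter(), 1: Counter()}
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    class_counts = Counter(labels)
    log_prior = {c: math.log(class_counts[c] / len(labels)) for c in (0, 1)}
    vocab = set(word_counts[0]) | set(word_counts[1])
    return log_prior, word_counts, vocab


def predict(model: Model, text: str) -> int:
    """Return the label (0 or 1) with the higher posterior log-probability."""
    log_prior, word_counts, vocab = model
    scores = {}
    for c in (0, 1):
        total = sum(word_counts[c].values())
        score = log_prior[c]
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            score += math.log((word_counts[c][word] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: 1 = cyberbullying, 0 = not.
texts = ["you are a stupid loser", "nobody likes you idiot",
         "thanks for the great edit", "nice work on this page"]
labels = [1, 1, 0, 0]
model = train_naive_bayes(texts, labels)
print(predict(model, "what a loser"))  # 1
print(predict(model, "great work"))    # 0
```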
Tokenization: Tokenization splits raw text into meaningful words or tokens. For example, the text “we will do it” can be tokenized into ‘we’, ‘will’, ‘do’, ‘it’. Text can be split into words (word tokenization) or into sentences (sentence tokenization). Tokenization has many more variants, but this project uses a regex tokenizer, in which tokens are selected by a rule, here a regular expression: only substrings that match the expression are kept. For example, the regular expression ‘\w+’ extracts all alphanumeric tokens (in Python, \w also matches the underscore).
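The regex tokenizer described above keeps every substring that matches a pattern. NLTK provides this as `RegexpTokenizer` in `nltk.tokenize`; the sketch below uses only Python's standard `re` module, which should yield the same tokens for the pattern `\w+` because, in its default (non-gaps) mode, `RegexpTokenizer` returns the pattern's matches. Lower-casing the text first is an assumption, not something the source states.

```python
import re
from typing import List

TOKEN_PATTERN = re.compile(r"\w+")  # runs of letters, digits or underscores


def regex_tokenize(text: str) -> List[str]:
    """Return every substring of the lower-cased text that matches TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text.lower())


print(regex_tokenize("We will do it!"))           # ['we', 'will', 'do', 'it']
print(regex_tokenize("@user you're SO dumb :("))  # ['user', 'you', 're', 'so', 'dumb']
```

The second example shows the trade-off of `\w+`: punctuation, mentions' `@` signs and emoticons are discarded, and contractions are split in two.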

Stemming: Stemming reduces a word to its root form, or stem. For example, the three words ‘eating’, ‘eats’ and ‘eaten’ all share the root ‘eat’; since all three derive from ‘eat’ and refer to the same thing, they should be recognized as similar. NLTK offers several stemmers, including the Porter, Lancaster, Snowball and Regexp stemmers; this project uses PorterStemmer.
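A short sketch of stemming with NLTK's `PorterStemmer` (it needs the `nltk` package, but the stemmer itself requires no corpus download). Porter is a rule-based suffix stripper: regular forms such as ‘eating’ and ‘eats’ should reduce to ‘eat’, but an irregular form such as ‘eaten’ is expected to come back unchanged, because mapping it to ‘eat’ needs a lemmatizer rather than a stemmer.

```python
from typing import List

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def stem_tokens(tokens: List[str]) -> List[str]:
    """Reduce each token to its Porter stem."""
    return [stemmer.stem(tok) for tok in tokens]


print(stem_tokens(["eating", "eats", "eaten"]))
```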

Stop word removal: Stop words are words that add little meaning to a sentence; some English stop words are ‘what’, ‘is’, ‘at’ and ‘a’. These words are usually irrelevant and can be removed. NLTK contains a list of English stop words that can be used to filter them out of all the tweets. Stop words are often removed from text data before training machine learning and deep learning models, since the information they carry is irrelevant to the model, and removing them helps improve performance.
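Stop-word filtering reduces to a set-membership test. NLTK's full list is available through `nltk.corpus.stopwords.words('english')` after a one-time `nltk.download('stopwords')`; to keep the sketch self-contained, a small hypothetical subset of English stop words stands in for it here.

```python
from typing import Iterable, List

# Hypothetical subset; NLTK's full English list is much longer.
STOP_WORDS = frozenset({"what", "is", "at", "a", "an", "the", "to", "of"})


def remove_stop_words(tokens: Iterable[str]) -> List[str]:
    """Keep only tokens that are not stop words (comparison is case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]


print(remove_stop_words(["what", "is", "the", "point", "of", "this", "loser"]))
# ['point', 'this', 'loser']
```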
Advantages
• The Continuous Bag of Words (CBOW) model takes one or more context words as input and predicts the target word from that context.
• The CBOW model takes the mean of the context vectors of the input words, so two meanings of a word are merged into a single vector: Apple the company and apple the fruit share one vector rather than getting two.
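The averaging step behind CBOW can be shown with toy numbers. A minimal sketch, assuming hand-picked 2-dimensional embeddings (real word2vec models learn both tables during training and use far more dimensions) and a full softmax rather than the negative-sampling or hierarchical-softmax approximations used in practice: the hidden layer is the mean of the context vectors, and each candidate target word is scored by a softmax over its dot product with that mean.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy input (context) embeddings and output embeddings.
CONTEXT_EMB: Dict[str, Vector] = {
    "eat": [1.0, 0.0], "juicy": [0.8, 0.2],
    "shares": [0.0, 1.0], "iphone": [0.1, 0.9],
}
OUTPUT_EMB: Dict[str, Vector] = {
    "apple": [0.5, 0.5], "banana": [1.0, 0.0], "stock": [0.0, 1.0],
}


def mean_context(words: List[str]) -> Vector:
    """CBOW hidden layer: element-wise mean of the context word vectors."""
    dim = len(CONTEXT_EMB[words[0]])
    total = [0.0] * dim
    for w in words:
        for i, x in enumerate(CONTEXT_EMB[w]):
            total[i] += x
    return [x / len(words) for x in total]


def predict_target(context: List[str]) -> Dict[str, float]:
    """Softmax over dot products of the averaged context with each output vector."""
    h = mean_context(context)
    scores = {w: sum(a * b for a, b in zip(h, v)) for w, v in OUTPUT_EMB.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}


probs = predict_target(["eat", "juicy"])
print(max(probs, key=probs.get))  # banana
```

Because every word owns exactly one row in each table, a word used in two senses, such as ‘apple’ here, still has a single vector sitting between its fruit-like and company-like contexts.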
SYSTEM REQUIREMENTS

➢ H/W System Configuration:

➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

SOFTWARE REQUIREMENTS:

• Operating System: Windows 7 Ultimate
• Coding Language: Python
• Front-End: Python
• Back-End: Django ORM
• Designing: HTML, CSS, JavaScript
• Database: MySQL (WAMP Server)
