Proceedings of the 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT-2024)

IEEE Xplore Part Number: CFP24CV1-ART; ISBN: 979-8-3503-2753-3


2024 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT) | 979-8-3503-2753-3/24/$31.00 ©2024 IEEE | DOI: 10.1109/IDCIOT59759.2024.10467617

Empowering Online Safety: A Machine Learning Approach to Cyberbullying Detection
B.V. Chowdary, Associate Professor, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad
Mavoori Akhil, UG Scholar, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad
Komirishetty Pavan, UG Scholar, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad
B. Pavana Teja Reddy, UG Scholar, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad
V.S. Gunjan, UG Scholar, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad

ABSTRACT— With the growth of the Internet, social media use has increased significantly over time, making social media the most significant network platform of the twenty-first century. However, the growth of social networks frequently has detrimental effects on society and fuels undesirable phenomena such as cyberbullying, cyber abuse, cybercrime, and online trolling. Cyberbullying frequently causes severe mental and physical pain, particularly for women and children, and in some cases it even drives the victim to attempt suicide. Because of these severe effects on society, online harassment has attracted considerable attention. Recently, there have been numerous incidents of online bullying around the world, including exposing private chats, spreading rumors, and making sexual remarks. As a result, there has been growing interest in recognizing bullying texts and messages on social media.

Index Terms— Cyber abuse, social media, online harassment, cyberbullying texts.

I. INTRODUCTION

The Internet is an environment that allows users to engage with society and share content of every kind, including lengthy documents, films, and images [1]. People use their laptops or cell phones to access online communities. Facebook, Twitter, Instagram, and TikTok are the most popular social media platforms. Social media is used today for a variety of purposes, including business, education, and charitable endeavors [2, 3, 4]. Additionally, social media boosts the global economy by generating a large number of new work opportunities [5]. Social media has many advantages, but it also has drawbacks. Malevolent users use online platforms to carry out immoral and dishonest deeds that harm other people, damage their reputations, and hurt their feelings. Cyberbullying has emerged as a significant social media concern in recent times. Cyber-harassment, often known as cyberbullying, is an electronic …

II. OBJECTIVE

In this regard, a machine learning model for cyberbullying identification is introduced to determine whether a news article is related to bullying or not. Several machine learning methods were examined for the proposed cyberbullying detection model: Naive Bayes, Support Vector Machine, Decision Trees, and Random Forest. Datasets containing posts and comments from Facebook and Twitter were used in our research. The study uses two distinct feature vectors, bag-of-words (BoW) and TF-IDF, for performance analysis. Results show that Random Forest outperforms every other machine learning technique, and that the TF-IDF feature outperforms BoW in terms of accuracy. By developing a prototype that can automatically identify abusive conduct and cyberbullying on social media platforms, the study aims to reduce digital bullying and aggression.

2.1 Existing System

In America, nearly fifty percent of all teens have experienced cyberbullying. Victims of harassment suffer psychological and physical effects. The trauma of cyberbullying is difficult to endure, and some victims resort to suicide or other self-destructive behaviors. Therefore, it is critical to recognize and stop cyberbullying in order to safeguard young people. Decision tree techniques are used in the current machine learning application for cyberbullying detection, although this

strategy is not particularly effective at categorizing messages involving online bullying.

2.2 Proposed System

This section explains the framework for identifying cyberbullying; its primary components are shown in Fig. 1. The first component is natural language processing (NLP), and the second is machine learning (ML). The initial stage involves gathering data and using NLP to build datasets of bullying words, messages, and posts for the machine learning techniques. After the datasets have been examined, machine learning algorithms are trained to identify harassment or cyberbullying interactions on online platforms such as YouTube and Twitter.

Techniques:
• Natural Language Processing: Real-world content and posts include a variety of extraneous characters. For instance, grammar and numerals have no bearing on whether bullying is detected. The comments must therefore be cleaned before the machine learning techniques are applied.

III. LITERATURE SURVEY

Researchers have made significant strides in cyber-harassment detection using machine learning techniques. One such approach used a supervised machine learning algorithm with a word-by-word method to analyze the sentiment and contextual features of judgments [9]. While initial attempts often resulted in low accuracy, the Massachusetts Institute of Technology made advances through the Ruminati project, which used support vector machines to identify bullies in Facebook comments. This approach incorporated social parameters and achieved an accuracy of 66% [10].

Another noteworthy method was introduced by Reynolds et al. [11], who proposed a bullying detection technique based on proximity modeling. This approach used decision trees and instance-based learners, achieving an accuracy of 78.5%. To enhance cyberbullying detection, researchers have also explored personality, emotions, and sentiments as additional features [12].

Deep learning models have also been deployed to combat cyberbullying. One such model used a deep neural network to analyze real-world data, employing transfer learning to enhance the detection process [13]. Baladitya et al. [14] introduced a deep neural network architecture specifically designed to identify hate speech. Additionally, a convolutional neural network-based model was developed to detect bullying text, incorporating word embeddings to capture semantic similarities [15].

In the realm of multimodal data, researchers faced the challenge of complex correlations between various social media elements. To address this, Cheng et al. [16] proposed XBully, an innovative cyberbullying identification system. XBully reformatted multimodal social media data into a heterogeneous network, enabling the integration of diverse attributes and correlations. Recognizing the evolving nature of cyberbullying, Vuong et al. [17] devised a multimodal recognition system integrating images, videos, comments, and social network activity. Their approach used top-down attention networks to capture session features and multimedia information effectively.

Neural networks have gained popularity in online harassment identification, with researchers exploring architectures built on long short-term memory (LSTM) layers. One study proposed a novel neural network model tailored for text-based cyberbullying detection. Its architecture incorporated LSTM layers, convolutional layers, and stacked core layers, improving network efficiency. The authors also introduced a distinctive activation method called "Support Vector Machine Activation," which enhanced the system's performance.

In summary, ongoing research in cyber-harassment detection leverages diverse machine learning techniques, including supervised algorithms, deep neural networks, and multimodal approaches, to combat the multifaceted nature of online harassment. These efforts underscore the importance of continuous innovation to address the challenges posed by cyberbullying effectively.

IV. ARCHITECTURE AND METHODOLOGY

A. System Architecture

System architecture refers to the high-level design of a computer-based system. It defines the components or modules that constitute the system, their relationships, and how they interact to achieve the intended functionality. A system architecture description typically includes the following components:
• Components: These are the building blocks of the system. Components can be hardware elements such as servers, computers, or devices, as well as software elements such as modules, libraries, and databases.
• Modules: Components are often divided into smaller functional units called modules. Modules encapsulate specific features or operations within the system. They can be designed to handle specific tasks, ensuring modularity and ease of maintenance.
• Interfaces: Interfaces define how different modules interact with each other, specifying the protocols and data formats used for communication. Well-defined interfaces are essential for seamless


integration and interoperability between system elements.
• Data Storage: The system architecture describes how data is stored, managed, and accessed. It includes databases, file systems, and data structures. Data storage mechanisms are crucial for ensuring data integrity, security, and efficient retrieval.
• Scalability and Performance: The system architecture addresses how the system can handle increased loads and demands. Scalability features ensure that the system can expand its capabilities as the user base or data volume grows.
• Deployment: The system architecture outlines how the system is deployed in various environments. It includes considerations for physical deployment (such as server locations), cloud-based deployment, and virtualization strategies.

Fig. 1. System Architecture

B. Modules

The development of the study is based on the dataset considered and on effective tuning of the parameters of the machine learning algorithms. The system consists of four phases:
1) Data Gathering
2) Data Processing
3) Training Phase
4) Testing Phase

1) Data Gathering: The dataset used here is a collection of tweets collected using the Twitter API. It contains more than 1000 tweets from different periods. The following images depict the datasets with their text labels.

2) Data Processing: Preparing raw data for modeling is a critical step, as data obtained from online sources are often inconsistent, incomplete, or noisy. These irregularities must be addressed to create a dataset suitable for machine learning algorithms. In our case, we focused on obtaining relevant data related to profanity in everyday online comments to train our models effectively. The initial dataset was in XML format, which we converted to the CSV format commonly used for machine learning. During preprocessing, we handled missing values, removed noise, and addressed inconsistencies in the data. Additionally, we ensured that variables were appropriately scaled and transformed to prevent any single variable from dominating the model's predictions. These data preparation steps were crucial to creating a clean and reliable dataset, providing a solid foundation for our modeling efforts.

3) Training Phase: To train the model, we first import a specific algorithm class or module and create an instance of it. Using that instance, we fit the model to the training data. We then validate it by checking its accuracy score and tuning its parameters until we obtain the required results.

4) Testing Phase: To test the model, we compare its predicted values with the test data after the training phase. We then input different values for prediction and check whether the model predicts them correctly. If it does not, we fine-tune the algorithm's parameters and fit the model again.

V. IMPLEMENTATION

A. PyCharm IDE

PyCharm, created by JetBrains, is a widely used integrated development environment (IDE) designed especially for Python development. It provides a robust and user-friendly platform tailored to the needs of Python developers, with a comprehensive set of features that enhance productivity, code quality, and collaboration.

The IDE provides advanced code error detection and smart suggestions, allowing developers to write code faster and with fewer mistakes. Its refactoring tools simplify the process of restructuring code, making it easier to maintain and improve the quality of existing projects. PyCharm also includes a built-in visual debugger that helps identify and fix bugs efficiently.

PyCharm supports various web frameworks, including Flask and Pyramid. It offers dedicated project templates, integrated tools for database management, and seamless integration with popular version control systems such as Git. The IDE's web development capabilities streamline the creation of dynamic web applications and ensure smooth collaboration among


team members.

Additionally, PyCharm promotes efficient testing with its integrated test runner and comprehensive testing tools. It facilitates running unit tests and behavioral tests, and supports popular testing frameworks such as pytest. Its version control features enable seamless collaboration by allowing developers to manage and merge code changes.

Furthermore, PyCharm enhances the development process with tools for data science and scientific computing; its support for pandas and Matplotlib enables data analysis and visualization within the IDE. PyCharm's user-friendly interface and integration capabilities make it a preferred choice for Python developers, whether they are working on web applications, data science projects, or any other Python-based software.

B. Python

Python is an interpreted, high-level, dynamic, cross-platform, open-source programming language. Python's philosophy prioritizes readability, clarity, and simplicity while maximizing the programmer's power and expressiveness. The greatest compliment for a Python programmer is to have written elegant code rather than merely clever code. For these reasons, Python makes an excellent first language, yet it can also be a very potent tool in the hands of a seasoned coder. Python is an incredibly versatile language and is used extensively for a variety of purposes. Common applications include:
• Writing web applications using frameworks such as Django, Zope, and TurboGears;
• Writing basic system scripts;
• Creating desktop applications with GUI toolkits such as Tkinter or wxPython (and, more recently, Windows Forms and IronPython);
• Developing Windows applications.

VI. RESULTS AND OUTPUT

The following screenshots show the results of the cyberbullying detection system for social media that we developed.

Fig. 2 shows the user login page of our application.
Fig. 2. Login Status

Fig. 3 shows the registration page of our application, where users register with unique information.
Fig. 3. Registration Status

Fig. 4 displays the information posted by members of the website and their friends.
Fig. 4. Post Page

Fig. 5 displays the user's profile, where the user can update and post information.
Fig. 5. Profile Page

VII. CONCLUSION

The cyberbullying detection study is a pivotal initiative for promoting online safety and fostering a positive digital atmosphere. This study addresses the pressing issue of cyberbullying across diverse online platforms. The implementation of


robust algorithms not only facilitates early intervention and mental health support for victims but also encourages responsible online behavior, making significant strides toward creating secure online spaces. Despite the challenges, including privacy concerns and algorithmic biases, the potential for impact is immense. As technologies evolve, it is imperative to refine these systems continually, ensuring that they strike the right balance between safeguarding users and preserving freedom of expression. The study not only contributes to immediate online safety but also serves as a foundation for ongoing research, paving the way toward an empathetic and respectful digital landscape where individuals can engage, learn, and express themselves without fear of cyberbullying.

ACKNOWLEDGEMENT

First of all, we would like to extend our deepest appreciation to Mr. B.V. Chowdary, Associate Professor, who served as our project's mentor. Next, we would like to express our heartfelt gratitude to the Vignan Institute of Technology and Science, Hyderabad, and especially the Department of Information Technology, for providing our team with all the tools, resources, help, and direction required to finish this research work.

REFERENCES

[1] Fuchs, Social Media: An Analytical Overview. Sage, 2017.
[2] N. Selwyn, "Social media in higher education," Erasmus World of Learning, vol. 1, no. 3, pp. 1–10, 2012.
[3] H. Karafuto, P. Ulkuniwemi, H. Keinanenq, and O. Kuivalainen, "Antecedents of social media business-to-business use in an industrial marketing context: clients' perspective," Journal of Business & Industrial Marketing, 2015.
[4] W. Akram and R. Kumar, "A study on the positive and negative effects of social media on society," International Journal of Computer Sciences and Engineering, vol. 5, no. 10, pp. 351–354, 2017.
[5] D. Tapscott et al., The Digital Marketplace. McGraw-Hill Education, 2015.
[6] S. Bastiaensens, H. Vandebosch, K. Poels, K. Van Cleemput, A. Desmet, and I. DeBourdeaudhuij, "Cyberbullying on social network sites: a pilot investigation."
[7] D. L. Hoff and S. N. Mitchell, "Cyberbullying: Causes, effects, and remedies," Journal of Educational Administration, 2009.
[8] S. Hinduja and J. W. Patchin, "Bullying, cyberbullying, and suicide," Archives of Suicide Research, vol. 14, no. 3, 2010.
[9] V. Balakrishnan, S. Khan, and H. R. Arabnia, "Improving cyberbullying detection using Twitter users' psychological features and machine learning," Computers & Security, vol. 90, p. 101710, 2020.
[10] S. Agrawal and A. Awekar, "Deep learning for detecting cyberbullying across multiple social media platforms," in European Conference on Information Retrieval. Springer, 2018, pp. 141–153.
[11] M. A. Al-Ajlan and M. Ykhlef, "Deep learning algorithm for cyberbullying detection," International Journal of Advanced Computer Science and Applications, vol. 9, no. 9, 2018.
[12] K. Wang, Q. Xiong, C. Wu, M. Gao, and Y. Yu, "Multi-modal cyberbullying detection on social networks," in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–8.
[13] T. A. Buan and R. Ramachandra, "Automated cyberbullying detection in social media using an SVM activated stacked convolution LSTM network," in Proceedings of the 2020 4th International Conference on Compute and Data Analysis, 2020, pp. 170–174.
[14] E. Raisi and B. Huang, "Weakly supervised cyberbullying detection using co-trained ensembles of embedding models," in 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2018, pp. 479–486.
[15] M. A. Al-garadi, K. D. Varathan, and S. D. Ravana, "Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network," Computers in Human Behavior, vol. 63, pp. 433–443, 2016.
[16] D. Perito, C. Castelluccia, M. A. Kaafar, and P. Manila, "How unique and traceable are usernames?" in Proc. 11th Int. Conf. Privacy Enhancing Technology, 2011, pp. 1–17.
