Detection of Cyberbullying On Social Media Using Machine Learning
ABSTRACT
EXISTING SYSTEM
The existing method selects profiles for study, collects their tweets, chooses
features from the profiles, and uses machine learning to identify the author of
each tweet. It was evaluated on 1,900 tweets belonging to 19 different profiles
and identified the author with an accuracy of 68%. It was later applied in a
case study at a school in Spain, where the real owner of a profile had to be
found among several students suspected of cyberbullying, and the method worked
in that case. The method still has some shortcomings.
For example, a trolling account may have no corresponding real account, which
fools such systems, or an expert may change writing style and behaviour so that
no patterns are found. Handling deliberately changed writing styles will require
more efficient algorithms.
Disadvantages
The vocabulary is not necessarily built from every word in the documents: it may
consist of all words (tokens) in all documents or only the most frequent tokens.
The TF-IDF method does not differ from the bag-of-words model in this respect,
since it creates its vocabulary in the same way to obtain its features.
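
The shared vocabulary step can be illustrated with a minimal sketch in plain Python. The function names (build_vocabulary, bow_vector, tfidf_vector), the sample documents, and the tf * log(N / df) weighting are assumptions chosen for illustration, not the project's actual code; libraries such as scikit-learn apply slightly different smoothing.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_vocabulary(docs, top_k=None):
    """Vocabulary of all tokens, or only the top_k most frequent ones."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ranked if top_k is None else ranked[:top_k]

def bow_vector(doc, vocab):
    """Bag of words: raw count of each vocabulary term in the document."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

def tfidf_vector(doc, docs, vocab):
    """Same vocabulary as bag of words, but counts are weighted by idf."""
    counts = Counter(tokenize(doc))
    vector = []
    for term in vocab:
        df = sum(1 for d in docs if term in tokenize(d))
        idf = math.log(len(docs) / df) if df else 0.0
        vector.append(counts[term] * idf)
    return vector

# Hypothetical sample documents.
docs = ["you are a loser", "you are a great friend", "great game today"]
vocab = build_vocabulary(docs)           # all tokens
top3 = build_vocabulary(docs, top_k=3)   # only the most frequent tokens
bow = bow_vector(docs[0], vocab)
tfidf = tfidf_vector(docs[0], docs, vocab)
```

Both vectors have one entry per vocabulary term; only the weighting differs, so a term that appears in every document would get a TF-IDF weight of zero under this formula.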
PROPOSED SYSTEM
Stemming: Stemming is the process of reducing a word to its root word, or stem.
For example, the three words ‘eating’, ‘eats’ and ‘eaten’ share the root ‘eat’.
Since all three derive from the root ‘eat’ and represent the same thing, they
should be recognized as similar. NLTK offers several stemmers, including Porter
Stemmer, Lancaster Stemmer, Snowball Stemmer and Regexp Stemmer. This project
uses PorterStemmer.
Stop word Removal: Stop words are words that add little meaning to a sentence,
e.g. some stop words for the English language are: what, is, at, a, etc. These
words are irrelevant and can be removed. NLTK contains a list of English stop
words which can be used to filter them out of all the tweets. Stop words are often
removed from the text data when training deep learning and machine learning
models, since the information they provide is irrelevant to the model and
removing them helps improve performance.
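
The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's code: the preprocess function, the sample tweet and the four-word stop list (taken from the examples in the text) are assumptions, and the fallback stemmer is a crude stand-in that is used only if NLTK is not installed.

```python
import re

try:
    from nltk.stem import PorterStemmer  # the stemmer named in the text
    stem = PorterStemmer().stem
except ImportError:
    # Assumption: crude suffix stripper so the sketch runs without NLTK.
    def stem(word):
        for suffix in ("ing", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

# Assumption: tiny stop list built from the examples in the text.
# NLTK's full English list is nltk.corpus.stopwords.words("english"),
# available after nltk.download("stopwords").
STOP_WORDS = {"what", "is", "at", "a"}

def preprocess(tweet):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    kept = [tok for tok in tokens if tok not in STOP_WORDS]
    return [stem(tok) for tok in kept]

# Hypothetical sample tweet.
result = preprocess("She eats what he is eating at lunch")
```

Here ‘eats’ and ‘eating’ both reduce to ‘eat’. Note that the Porter algorithm only strips suffixes by rule, so irregular forms are not guaranteed to map to the same stem.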
Advantages
The Continuous Bag of Words (CBOW) model takes one or more context words as
input and predicts the target word based on that context.
The CBOW model takes the mean of the input context words, yet two senses of a
single word can still be distinguished by their contexts. For example, Apple
the firm and apple the fruit appear in different contexts.
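
The averaging step described above can be sketched as a single CBOW forward pass in plain Python. This is a minimal, untrained illustration: the toy vocabulary, the embedding size, the random weights and the name cbow_forward are all assumptions, and a real model would learn W_in and W_out from data (for example with gensim's Word2Vec).

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabulary and embedding size.
vocab = ["apple", "fruit", "eat", "company", "phone"]
word_to_id = {word: i for i, word in enumerate(vocab)}
dim = 4

# Input and output embeddings, randomly initialised (untrained).
W_in = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]

def cbow_forward(context_words):
    """Return (hidden, probs) for one or more context words."""
    rows = [W_in[word_to_id[w]] for w in context_words]
    # Hidden layer: element-wise mean of the context word vectors.
    hidden = [sum(col) / len(rows) for col in zip(*rows)]
    # Score every vocabulary word against the averaged context.
    scores = [sum(h * o for h, o in zip(hidden, out)) for out in W_out]
    # Softmax turns the scores into a probability for each target word.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]

# Multiple context words, then a single context word.
hidden, probs = cbow_forward(["eat", "fruit"])
single_hidden, single_probs = cbow_forward(["phone"])
```

With a single context word the mean is simply that word's vector; with several, the prediction depends on their average, which is why different contexts lead to different predicted words.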
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
Front-End : Python.
Back-End : Django-ORM