Twitter Bot Detection (K2203674)
DETECTION USING
MACHINE LEARNING
Unveiling the automation in the Twittersphere
Introduction
Social Media's Integral Role:
Social media platforms such as Twitter have become essential for communication,
information sharing, and public discussion in today's digital era. However,
automated accounts (bots) can distort these conversations, so detecting and
mitigating them is vital for upholding the integrity and trustworthiness of
online discourse.
Presentation Focus:
This thesis explores the fusion of artificial intelligence and social media, specifically
the application of machine learning models to effectively identify and combat Twitter
bots.
The presentation will delve into research methods, findings, and implications,
providing insights into the intriguing world of Twitter bot detection.
Dataset
The "Twitter Human-Bots Dataset" is a comprehensive Kaggle dataset
meticulously crafted for the specific purpose of Twitter bot detection.
This dataset encompasses a substantial volume of data, featuring
37,438 rows and 23 columns, each column offering valuable insights
into the characteristics of Twitter accounts.
The "bot" column is the target label, taking values 0/1: 0 signifies a
human-operated account and 1 signifies a bot account.
This dataset was chosen because the literature review made it evident
that it is one of the most extensive datasets available.
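As a rough sketch of how such a dataset might be inspected, the snippet below builds a tiny synthetic stand-in with the same target convention (the real Kaggle file name and column set are assumptions, so the actual `read_csv` call is left commented out):

```python
import pandas as pd

# Hypothetical loading step for the Kaggle "Twitter Human-Bots Dataset";
# the file name below is an assumption, not necessarily the official one.
# df = pd.read_csv("twitter_human_bots_dataset.csv")

# Tiny synthetic stand-in mirroring the target convention: bot = 0 (human) / 1 (bot)
df = pd.DataFrame({
    "followers_count": [10, 5000, 3, 120],
    "friends_count": [50, 40, 2000, 300],
    "verified": [False, True, False, False],
    "bot": [0, 0, 1, 1],
})

# Class balance of the target column
counts = df["bot"].value_counts()
print(counts.to_dict())  # e.g. {0: 2, 1: 2}
```

Checking the class balance first matters because an imbalanced target motivates the class-weight balancing step discussed later.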
Legal and ethical
considerations
Significant ethical and legal factors must be taken into account
while creating and implementing Twitter bot detection
programmes, namely:
• Maintaining user-privacy
• Unbiased models
• Transparency and access to information
• Getting informed consent
• Maintaining user trust and upholding user rights
System flowchart
(Figure: system flowchart showing the pipeline stages — plotting metrics, hyperparameter tuning, class-weight balancing after hyperparameter tuning, and ensemble learning)
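The pipeline stages above can be sketched end to end. This is a minimal illustration, not the thesis's actual implementation: synthetic data stands in for the account features, and the specific models and settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the account features (imbalanced, like many bot datasets)
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Class-weight balancing step: weight classes inversely to their frequency
rf = RandomForestClassifier(class_weight="balanced", random_state=0)
lr = LogisticRegression(class_weight="balanced", max_iter=1000)

# Ensemble learning step: soft voting over the base models
ensemble = VotingClassifier([("rf", rf), ("lr", lr)], voting="soft")
ensemble.fit(X_tr, y_tr)
print(round(accuracy_score(y_te, ensemble.predict(X_te)), 3))
```

Soft voting averages the predicted probabilities of the base models, which typically smooths out individual models' errors.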
Evaluation Metrics
In the context of a classification problem, such as
Twitter bot detection, we rely on six key
performance metrics to assess the model's quality
and its ability to make accurate predictions.
• Accuracy
• Precision
• Recall
• F1 score
• AUC-ROC
• Confusion Matrix
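All six metrics listed above are available in scikit-learn. The toy labels below (1 = bot, 0 = human) are illustrative, not taken from the thesis:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true  = [0, 0, 0, 1, 1, 1, 1, 0]                   # ground truth (1 = bot)
y_pred  = [0, 0, 1, 1, 1, 0, 1, 0]                   # hard predictions
y_score = [0.1, 0.2, 0.6, 0.9, 0.8, 0.4, 0.7, 0.3]   # predicted bot probabilities

print("accuracy :", accuracy_score(y_true, y_pred))   # 0.75
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("auc      :", roc_auc_score(y_true, y_score))   # 0.9375
print("confusion:\n", confusion_matrix(y_true, y_pred))
```

Note that AUC-ROC is computed from the probability scores, while the other metrics use the thresholded hard predictions.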
Key Findings
Before Tuning:
•Models' performance (accuracy, precision, recall, F1-Score)
was initially acceptable (around 0.85).
•Random Forest and XGBoost outperformed others.
•Logistic Regression had the lowest performance, while
Gaussian Naive Bayes showed high accuracy but low recall.
After Tuning:
•Notable improvements observed post hyperparameter
tuning.
•Decision Tree, Random Forest, and KNN continued to
perform well (accuracy between 0.84 and 0.86).
•Logistic Regression demonstrated some improvement.
•Gaussian Naive Bayes exhibited decreased recall and F1-score.
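The hyperparameter tuning behind these improvements can be sketched with a grid search. The grid below is a hypothetical example; the thesis's actual search space is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the account features
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Hypothetical search space for Random Forest
grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      grid, cv=3, scoring="f1")
search.fit(X_tr, y_tr)
print(search.best_params_, round(search.score(X_te, y_te), 3))
```

Scoring on F1 rather than accuracy is a common choice for imbalanced targets like bot labels, since it balances precision and recall.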
ROC curves
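An ROC curve plots the true-positive rate against the false-positive rate across thresholds. A minimal sketch with toy scores (not the thesis's results) follows; the plotting lines are commented out so the computation runs without a display:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true  = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.4, 0.8, 0.45, 0.5, 0.9])  # predicted bot probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))

# To visualise (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.plot(fpr, tpr); plt.xlabel("FPR"); plt.ylabel("TPR"); plt.show()
```

The area under this curve (AUC) summarises how well the scores rank bots above humans, independent of any single threshold.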
Conclusion
Effective Twitter bot detection depends on selecting
suitable models, optimizing parameters, and considering trade-offs.
As bots evolve, the future requires advanced
machine learning, real-time systems, ethical
awareness, and collaboration. This research lays the
foundation for future progress, emphasizing
adaptability and innovation as essential elements in
tackling these evolving challenges.
THANK YOU