Tarp Da 3
CSE1901
Assignment 3
Fake Social Media Account Detection using ML
In partial fulfilment for the award of the degree of B. Tech in Computer Science and
Engineering
Submitted by:
Pramit Karki (20BCE2896)
Anurag Karki (20BCE2907)
Shreya Karki (20BCE2899)
1. Proposed Methodology:
a) Data Collection:
Current social media datasets are collected using an Instagram scraper and from
Kaggle.
b) Data Preprocessing:
Process of cleaning up the unnecessary data, filling in the empty rows and columns,
and fixing or eliminating inaccurate, incomplete, or duplicate data from a dataset.
c) Feature Extraction:
Extraction of relevant features from the pre-processed data.
d) Training:
Train our machine learning algorithms on the extracted features.
e) Feature Importance Analysis:
Analyse the importance of each feature in the ML model.
f) Fine-Tuning:
The model is experimented with different hyperparameters to improve its
performance.
g) Threshold Selection:
Develop adaptive thresholding strategies informed by anomaly detection and real-
time monitoring.
h) Model Deployment:
Deploy the trained model to detect fake accounts in real-time or batch processing.
i) Model Evaluation:
Performance is evaluated on a held-out test set.
j) Monitoring and Maintenance:
Create a continuous learning system that updates with fresh data and detects model
drift.
[Architecture diagram: Data Collection → Social Media Dataset → Reduction → Feature Extraction → AdaBoost / SVM (Training) → Evaluation on test datasets → AdaBoost / SVM (Classification)]
2. Data preprocessing:
In this step we handle missing data and outliers, normalize or scale the feature values
as necessary, and split the dataset into training and testing sets to evaluate the model's
performance.
We are only interested in data normalization, as our data is already clean and has no
missing values.
• Data normalization: The processing time and memory required per training
iteration depend on the magnitude of the values in the dataset. Normalization
rescales the features to a common range, reducing their order of magnitude.
3. Feature extraction:
The next step is to extract relevant features from the pre-processed data. This could
involve using techniques like dimensionality reduction or feature selection to identify
the most informative features. These features could include: Profile picture analysis
(e.g., image quality, face detection), Number of followers and followings, Engagement
metrics (likes, comments, posts), Account age, Bio and caption text analysis, Frequency
and timing of posts, User activity (e.g., frequency of logins), Hashtag usage, etc.
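A minimal sketch of turning a raw profile into a numeric feature vector; the field names and the `extract_features` helper are hypothetical, not the actual scraper output:

```python
# Hypothetical raw profile record (field names are illustrative).
profile = {
    "followers": 42,
    "followings": 1800,
    "posts": 3,
    "bio": "",
    "has_profile_pic": False,
    "account_age_days": 20,
}

def extract_features(p):
    """Turn a raw profile dict into a numeric feature vector."""
    return [
        p["followers"],
        p["followings"],
        # Followers-to-followings ratio: fake accounts often follow
        # many users while attracting few followers.
        p["followers"] / max(p["followings"], 1),
        p["posts"],
        len(p["bio"]),                  # bio length as a cheap text proxy
        int(p["has_profile_pic"]),
        p["account_age_days"],
    ]

features = extract_features(profile)
print(features)
```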
4. Training:
The next step is to train our machine learning algorithms on the extracted features. For
instance, the AdaBoost classifier is a sequence of weak classifiers, and each weak
classifier is trained on a subset of the data. The weights assigned to misclassified
samples are adjusted at each iteration to improve the performance of the next weak
classifier.
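The training step can be sketched with scikit-learn's `AdaBoostClassifier`, shown here on synthetic data standing in for the extracted profile features (the real data comes from the scraper and Kaggle):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the extracted profile features.
X, y = make_classification(n_samples=500, n_features=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The default base estimator is a depth-1 decision tree ("stump"),
# the classic weak learner; each round re-weights misclassified samples.
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(acc)
```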
5. Feature Importance Analysis:
The next step is to analyse the importance of each feature in the AdaBoost model. This
can help us understand which features are most informative for detecting fake accounts.
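With scikit-learn, the fitted model exposes `feature_importances_`, a weighted average of the importances assigned by each weak learner (synthetic data is used for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, n_features=5,
                           n_informative=3, random_state=0)
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Importances are normalized across features; higher means the
# ensemble relied on that feature more often when splitting.
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```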
6. Fine-Tuning:
Then the model is experimented with different hyperparameters of the AdaBoost
algorithm to improve its performance. This might include adjusting the number of
weak classifiers (base estimators) or the learning rate.
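A grid search over these two hyperparameters could look like the following sketch (synthetic data; the grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=7, random_state=0)

# Cross-validated search over the two hyperparameters named above.
param_grid = {
    "n_estimators": [25, 50, 100],
    "learning_rate": [0.5, 1.0],
}
search = GridSearchCV(AdaBoostClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```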
7. Threshold Selection:
Then an appropriate threshold is determined for classifying profiles as fake or real,
for example one that balances precision and recall.
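One way to pick such a threshold is to sweep the precision-recall curve and keep the threshold with the best F1 score; the sketch below uses synthetic data in place of real profiles:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=7, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = AdaBoostClassifier(random_state=1).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]  # estimated P(fake) per profile

# F1 at each candidate threshold; keep the threshold maximizing it.
prec, rec, thresholds = precision_recall_curve(y_te, scores)
f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
best = thresholds[f1[:-1].argmax()]  # f1[:-1] aligns with thresholds

print(f"chosen threshold: {best:.3f}")
```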
8. Model Deployment:
The next step is to deploy the trained AdaBoost model to detect fake Instagram accounts
in real-time or batch processing. Implement mechanisms for periodic model updates to
adapt to changing fake account patterns.
9. Evaluation:
After deploying the model, the next step is to evaluate its performance on a held-out
test set. The AdaBoost model is evaluated using appropriate metrics such as accuracy,
precision, recall, F1-score, and ROC AUC on the testing dataset.
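These metrics can be computed with scikit-learn; the labels and scores below are a small hand-made example (1 = fake, 0 = real), not real evaluation results:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hand-made toy labels: 1 = fake, 0 = real.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]   # ground truth
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]   # thresholded predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]  # predicted P(fake)

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))  # uses scores, not labels
```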
Then the model's performance is continuously monitored and adapted to emerging fake
account strategies. The training dataset is regularly updated to include new examples of
fake and real accounts.
Overall, this architecture provides a framework to detect fake Instagram accounts. By
comparing multiple classifiers, we can use the model with the best performance and
identify fake accounts more accurately.
The algorithm we use to train the model is briefly explained below:
AdaBoost
Adaptive Boosting (AdaBoost) is a popular ensemble learning algorithm that can be
used for classification tasks. It works by combining multiple weak classifiers to form a
strong classifier. Each weak classifier is trained on a subset of the data and assigns
weights to misclassified samples to improve the performance of the next weak
classifier. Unlike bagging, AdaBoost does not use bootstrapping. Classifiers with
higher accuracy are assigned higher weights in computing the final output.
The final strong classifier is:

H(x) = sign( Σt αt ht(x) )

where,
ht(x) = weak classifier t's output for x
αt = weight allotted to classifier t.
αt is computed from the classifier's error rate E as: αt = 1/2 * ln((1 - E)/E).
Initially, every training sample is given an equal weight.
The component classifiers gain the most from boosting when they are weak; applying
the boosting technique to already robust classifiers yields little additional gain.
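A quick numeric check of the weight formula: a weak classifier with 20% error receives a positive weight, one at chance level (E = 0.5) receives zero, and one worse than chance would receive a negative weight:

```python
import math

def classifier_weight(error):
    """AdaBoost classifier weight: alpha_t = 1/2 * ln((1 - E) / E)."""
    return 0.5 * math.log((1 - error) / error)

print(classifier_weight(0.2))  # ~0.693 (better than chance: positive weight)
print(classifier_weight(0.5))  # 0.0    (chance level: no say in the vote)
```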
Platform Used:
(a) Hardware:
The evaluation of the suggested model is performed on a laptop PC with the following
hardware configuration:
1) an Intel Core i5-10300H processor (8 logical cores, 2.50 GHz base clock);
2) 16 GB of physical memory;
3) an NVIDIA GeForce GTX 1650 Ti graphics card.
(b) Software:
Programming Language:
Python: Python is a high-level programming language widely used for data analysis,
machine learning, web development, and more.
Libraries:
1) Scikit-learn: Scikit-learn is a powerful Python library for machine learning that
provides tools for classification, regression, clustering, and more.
2) NumPy: NumPy is a fundamental Python library for scientific computing that
enables numerical operations on multi-dimensional arrays and matrices.
3) Matplotlib: Matplotlib is a Python library for data visualization that allows
creating charts, graphs, and other graphical representations of data.
4) Instaloader: Instaloader is a Python package that enables downloading pictures,
videos, and other media from Instagram.
IDE:
1) Jupyter Notebook: Jupyter Notebook is an interactive web-based environment
that allows writing, executing, and sharing code in various programming
languages, including Python.
2) VS Code: Visual Studio Code is a free source code editor developed by Microsoft
that supports many programming languages, debugging, Git integration, and more.