
Data Analytics and Visualization of TwitterSpam Dataset

The document analyzes a Twitter spam dataset using data visualization. Density plots show that non-spam accounts tend to have fewer tweets while spam accounts have more. Scatterplots visualize the relationship between account age and followers, showing that spam accounts cluster together while non-spam accounts are more dispersed. Regression lines added to the scatterplot suggest that older non-spam accounts gain followers over time, while follower counts vary more among younger spam accounts.

Uploaded by

Gloria Auma


Introduction:

In this post, we analyze the TwitterSpam dataset using data analytics and visualization. The dataset
includes 'no_tweets', 'account_age', 'no_follower', and a binary 'spam' indicator, among other attributes.
Our main goals are to use density plots to compare the tweet counts of spam and non-spam accounts,
scatterplots to show the relationship between 'account_age' and 'no_follower', and regression lines to
make these associations easier to see.

1. Density Plot for 'no_tweets':

To start our analysis, we load the TwitterSpam dataset into RStudio and use the ggplot() function from the
ggplot2 package to create a density plot of the 'no_tweets' column, comparing spam and non-spam accounts.

R Script:

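# Load required libraries
library(ggplot2)

# Load the dataset (read.csv() assumes comma-separated values;
# adjust the sep argument if TwitterSpam.txt uses another delimiter)
twitter_spam <- read.csv("TwitterSpam.txt")

# Inspect the column names to confirm the expected columns are present
colnames(twitter_spam)

# Create density plot; factor() keeps the fill discrete even if
# label is stored as 0/1
ggplot(twitter_spam, aes(x = no_tweets, fill = factor(label))) +
  geom_density(alpha = 0.6) +
  labs(title = "Density Plot of no_tweets by Spam Status",
       x = "Number of Tweets",
       y = "Density",
       fill = "Spam Status") +
  theme_minimal()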

Here is the output:

[Figure: Density Plot of no_tweets by Spam Status]

Observation from Density Plot:

The density plot displays the distribution of 'no_tweets' for spam and non-spam accounts. The density for
non-spam accounts is concentrated at the lower end of the 'no_tweets' axis, showing that non-spam
accounts usually post fewer tweets. Toward the upper end of the axis, the density for spam accounts is
higher, indicating that spam accounts typically post more tweets than normal accounts. This difference in
distribution helps us understand how spam and non-spam accounts differ in the number of tweets they
post.
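As a numeric complement to the density plot, the group medians of 'no_tweets' can be compared directly. The sketch below is a minimal base-R example; the data frame is a hypothetical toy stand-in for TwitterSpam.txt, and the label values "non-spammer" and "spammer" are assumptions, not values confirmed from the dataset.

```r
# Hypothetical toy data, invented for illustration (not from TwitterSpam.txt);
# the column names mirror the R script above, the label values are assumed
twitter_spam <- data.frame(
  no_tweets = c(120, 340, 95, 410, 5200, 8100, 6400, 7300),
  label     = rep(c("non-spammer", "spammer"), each = 4)
)

# Median and interquartile range of no_tweets within each spam-status group
tweet_medians <- tapply(twitter_spam$no_tweets, twitter_spam$label, median)
tweet_iqrs    <- tapply(twitter_spam$no_tweets, twitter_spam$label, IQR)

print(tweet_medians)
print(tweet_iqrs)
```

Medians and IQRs are used here instead of means and standard deviations because tweet counts are often right-skewed, so a few very active accounts would pull a mean upward.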

2. Scatterplots for 'account_age' and 'no_follower':

Next, we create a scatterplot to visualize the relationship between 'account_age' and 'no_follower', with
spammer and non-spammer accounts shown in different colors.

R Script:

# Scatterplot of account age vs. number of followers, colored by spam status
ggplot(twitter_spam, aes(x = account_age, y = no_follower, color = factor(label))) +
  geom_point() +
  labs(title = "Scatterplot of Account Age vs. Number of Followers",
       x = "Account Age",
       y = "Number of Followers",
       color = "Spam Status") +
  theme_minimal()

Observation from Scatterplots:

The scatterplot shows the relationship between 'account_age' and 'no_follower'. Spam accounts tend to
cluster together, whereas the points for non-spam accounts are more dispersed. This suggests that spam
accounts share similar account ages and follower counts, while non-spam accounts span a wide range of
both. Coloring the points by spam status makes the two groups easy to tell apart.
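To put a number on "clustered" versus "dispersed", the standard deviation of each variable can be computed within each group. This minimal base-R sketch uses a hypothetical toy data frame whose values are invented for illustration; only the column names come from the script above, and the label values are assumed.

```r
# Hypothetical toy data (invented values, assumed label values)
twitter_spam <- data.frame(
  account_age = c(30, 400, 1200, 2500, 20, 25, 35, 40),
  no_follower = c(15, 300, 900, 4000, 50, 60, 55, 70),
  label       = rep(c("non-spammer", "spammer"), each = 4)
)

# Split the two numeric columns by spam status
by_group <- split(twitter_spam[, c("account_age", "no_follower")],
                  twitter_spam$label)

# Standard deviation of each column within each group:
# rows = variables, columns = spam-status groups
spread <- sapply(by_group, function(d) sapply(d, sd))
print(spread)
```

Because follower counts are often heavily skewed, a robust measure such as IQR() or mad() may be a better choice on the real data.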

3. Adding Regression Lines to the Scatterplot:

To gain further insight into the relationships shown in the scatterplot, we add a separate regression line
for spammer and non-spammer accounts. Because color is mapped to the (discrete) spam label,
geom_smooth() fits one line per group.

R Script:

# Add a linear regression line for each spam-status group
ggplot(twitter_spam, aes(x = account_age, y = no_follower, color = factor(label))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Scatterplot with Regression Lines of Account Age vs. Number of Followers",
       x = "Account Age",
       y = "Number of Followers",
       color = "Spam Status") +
  theme_minimal()
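With method = "lm", geom_smooth() fits an ordinary least-squares line to each color group. The sketch below reproduces that by calling lm() separately per group, so the slopes can be read off as numbers instead of estimated by eye. The data frame and its label values are hypothetical toy data, not the TwitterSpam dataset.

```r
# Hypothetical toy data (invented values, assumed label values)
twitter_spam <- data.frame(
  account_age = c(100, 500, 1000, 2000, 50, 100, 150, 200),
  no_follower = c(50, 260, 490, 1010, 400, 900, 300, 1200),
  label       = rep(c("non-spammer", "spammer"), each = 4)
)

# Fit one straight line per spam-status group, as geom_smooth(method = "lm") does
fits <- lapply(split(twitter_spam, twitter_spam$label),
               function(d) lm(no_follower ~ account_age, data = d))

# Slope = estimated change in followers per unit of account age, per group
slopes <- sapply(fits, function(f) coef(f)[["account_age"]])
print(slopes)
```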

Observation from Regression Lines:

The regression lines summarize the overall trend within each group. The regression line for non-spam
accounts has a positive slope, indicating that the number of followers rises with account age. This is
consistent with the hypothesis that older accounts have had more time to acquire followers naturally. The
regression line for spam accounts also has a positive slope, although some young spam accounts appear
to have gained a sizable following. The data points for spam accounts are more widely dispersed around
their regression line, indicating greater variation in follower counts among newer spam accounts.
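The observation that spam accounts scatter more widely around their line can be checked with the residual standard error of each group's fit, which stats::sigma() returns for an lm object. This is a minimal sketch on hypothetical toy data with assumed label values; on the real dataset the numbers would differ.

```r
# Hypothetical toy data (invented values, assumed label values)
twitter_spam <- data.frame(
  account_age = c(100, 500, 1000, 2000, 50, 100, 150, 200),
  no_follower = c(50, 260, 490, 1010, 400, 900, 300, 1200),
  label       = rep(c("non-spammer", "spammer"), each = 4)
)

# One linear fit per spam-status group
fits <- lapply(split(twitter_spam, twitter_spam$label),
               function(d) lm(no_follower ~ account_age, data = d))

# Residual standard error: typical distance of points from their group's line
resid_sd <- sapply(fits, sigma)
print(resid_sd)
```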

Advantages of Adding Regression Lines:

1. Visual Representation of Trends: A regression line provides a clear visual representation of the overall
trend between two variables, helping readers identify the general direction of the relationship: positive,
negative, or no correlation.

2. Identifying Outliers: The regression line allows us to identify potential outliers more easily. Points that
deviate significantly from the regression line might indicate unusual data points that need further
investigation.

3. Quantifying Relationships: The slope of the regression line quantifies the direction and magnitude of
the relationship: it estimates how much the dependent variable changes for each one-unit increase in the
independent variable. The strength of the relationship is better summarized by the correlation coefficient
or R-squared.
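Points 2 and 3 can both be made concrete with a single fitted line. The minimal sketch below, on hypothetical toy data, flags observations whose standardized residual (rstandard()) exceeds 2 in absolute value, which is a common rule of thumb rather than a fixed standard, and reports the slope alongside R-squared to show that a positive slope can coexist with a weak fit.

```r
# Hypothetical toy data with one unusually large follower count (row 5)
twitter_spam <- data.frame(
  account_age = c(100, 200, 300, 400, 500, 600, 700, 800),
  no_follower = c(55, 98, 160, 190, 2000, 310, 345, 410)
)

fit <- lm(no_follower ~ account_age, data = twitter_spam)

# Flag potential outliers: standardized residuals beyond +/- 2
std_res  <- rstandard(fit)
outliers <- which(abs(std_res) > 2)

# Slope gives direction and rate of change; R-squared gives strength of fit
slope     <- coef(fit)[["account_age"]]
r_squared <- summary(fit)$r.squared

print(outliers)
cat("slope:", round(slope, 3), " R-squared:", round(r_squared, 3), "\n")
```

In this toy example the single extreme point is flagged and is largely responsible for the low R-squared; refitting without it shows how strongly one point can shape a regression line.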
Conclusion:
Through data analytics and visualization, we compared spam and non-spam accounts using density plots,
showed the relationship between 'account_age' and 'no_follower' using scatterplots, and added regression
lines to better understand the trends. The density plot highlighted differences in the number of tweets
posted by spam and non-spam accounts, while the scatterplot and regression lines shed light on the
relationship between 'account_age' and 'no_follower' for both groups.
