Gathering Data from Twitter
Marcello Tomasini – mtomasini@my.fit.edu
Marcos Oliveira – moliveirajun2013@my.fit.edu
Adapted from 2013 Hugo Serrano’s and 2014 Diogo Pacheco’s Lectures
Introduction
▪ What is Twitter
▪ Twitter is an online social network that allows users
to send and read messages of up to 140
characters, known as tweets.
▪ https://twitter.com
Introduction
▪ Twitter
▪ Online Social Network
▪ Users can follow and be followed by other users.
▪ A Real Time Information Network
▪ Tweets can be replied to, retweeted, and favorited.
Twitter APIs
▪ REST APIs
▪ Access core primitives of Twitter
▪ Provides programmatic access to read and write Twitter data
▪ Author a new Tweet, retweet a Tweet, read author profile and follower data,
and more
▪ Search API
▪ Finding a set of Tweets, Tweets from a particular user, or Tweets with
specific keywords/hashtags
▪ Focused on relevance, not completeness
▪ Streaming APIs
▪ Continuously deliver new responses to REST API queries
▪ Useful for data mining & analytics research
▪ Useful when an application is rate-limited for over-polling the REST APIs
▪ Useful when tracking a large number of keywords
▪ Ads APIs
▪ https://dev.twitter.com/overview/documentation
REST APIs
▪ Communicate over HTTP using standard HTTP
verbs (GET, POST)
▪ Twitter limits the number of requests per
15-minute interval
▪ https://dev.twitter.com/rest/public/rate-limiting
▪ https://dev.twitter.com/rest/public/rate-limits
▪ Detailed reference for the Search API endpoint
▪ https://dev.twitter.com/rest/public/search
▪ https://dev.twitter.com/rest/reference/get/search/tweets
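The 15-minute window suggests two small calculations: how far apart to space requests, and how long to wait once a window is used up. The following is a minimal sketch, assuming responses carry `x-rate-limit-remaining` and `x-rate-limit-reset` headers as in API v1.1. The function names are illustrative and not part of any library.

```python
WINDOW_SECONDS = 15 * 60  # length of one rate-limit window


def min_spacing(limit_per_window):
    """Seconds to leave between requests to stay under a per-window limit."""
    return WINDOW_SECONDS / limit_per_window


def seconds_until_reset(headers, now):
    """Return how long to sleep, given response headers and the current epoch time.

    Assumes header names x-rate-limit-remaining and x-rate-limit-reset;
    returns 0 while requests remain in the current window.
    """
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(headers["x-rate-limit-reset"])  # epoch seconds
    return max(0, reset_at - now)
```

The actual limit differs per endpoint, so the value passed to `min_spacing` should come from the rate-limits page linked above.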
Streaming APIs
▪ Low latency access to Twitter’s global stream of Tweet
data.
▪ Twitter offers several streaming endpoints
▪ Public Streams
▪ https://dev.twitter.com/streaming/public
▪ Streams of the public data flowing through Twitter.
▪ Suitable for following specific users or topics, and data mining
▪ User Streams
▪ https://dev.twitter.com/streaming/userstreams
▪ Contain roughly all of the data corresponding with a single user’s view of Twitter.
▪ Site Streams
▪ https://dev.twitter.com/streaming/sitestreams
▪ Multi-user version of user streams. Site streams are intended for servers which must
connect to Twitter on behalf of many users.
▪ Require a long-lived HTTP connection
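On a long-lived connection the client reads messages as they arrive instead of issuing one request per result. The sketch below assumes the stream delivers one JSON message per line and sends blank lines as keep-alive signals. It reads from any iterable of lines, so a list of bytes stands in for the open HTTP response here.

```python
import json


def iter_stream_messages(lines):
    """Yield decoded JSON messages from an iterable of raw stream lines.

    Blank lines are treated as keep-alive signals and skipped.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line:
            continue  # keep-alive newline, no payload
        yield json.loads(line)


# Simulated chunk of a stream; a real client would iterate over the open response.
sample = [b'{"id": 1, "text": "hello"}\r\n', b"\r\n", b'{"id": 2, "text": "world"}\r\n']
messages = list(iter_stream_messages(sample))
```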
API Objects
▪ The most frequently observed objects in the Twitter
ecosystem:
▪ Tweets
▪ Users
▪ Entities
▪ Places
API Object: Tweets
▪ Basic building blocks
▪ Also known as “status updates”
▪ Fields such as
▪ created_at: UTC time when this Tweet was created.
▪ coordinates: the geographic location of this Tweet.
▪ in_reply_to_screen_name: the screen name of the original Tweet's
author.
▪ in_reply_to_status_id: the ID of the original Tweet.
▪ retweeted_status: a representation of the original Tweet that
was retweeted.
▪ https://dev.twitter.com/overview/api/tweets
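These fields are enough to timestamp a Tweet and to tell replies and retweets apart. The sketch below assumes the Tweet has already been decoded from JSON into a dict and that `created_at` uses the v1.1 text format shown in the comment. The sample Tweet is made up for illustration.

```python
from datetime import datetime

CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # e.g. "Wed Aug 27 13:08:45 +0000 2008"


def parse_created_at(tweet):
    """Convert the created_at string into a timezone-aware datetime."""
    return datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)


def tweet_kind(tweet):
    """Classify a Tweet as 'retweet', 'reply', or 'original' from its fields."""
    if "retweeted_status" in tweet:
        return "retweet"
    if tweet.get("in_reply_to_status_id") is not None:
        return "reply"
    return "original"


# Made-up Tweet carrying only the fields discussed above.
sample_tweet = {
    "created_at": "Wed Aug 27 13:08:45 +0000 2008",
    "coordinates": None,
    "in_reply_to_screen_name": "some_user",
    "in_reply_to_status_id": 12345,
}
```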
API Object: Users
▪ They tweet, follow, retweet, are mentioned etc.
▪ Fields such as:
▪ id: the unique identification of a user
▪ screen_name: the screen name, handle, or alias that this
user identifies themselves with
▪ followers_count: the number of followers this account
currently has
▪ https://dev.twitter.com/overview/api/users
API Object: Entities
▪ Provide metadata and additional information about
content posted on Twitter:
▪ media
▪ urls
▪ hashtags
▪ user_mentions
▪ https://dev.twitter.com/overview/api/entities
▪ https://dev.twitter.com/overview/api/entities-in-twitter-objects
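Entities are a convenient starting point for network construction, because mentions and hashtags arrive already parsed. The sketch below assumes a decoded Tweet dict with an `entities` key holding `hashtags` (each with a `text` field) and `user_mentions` (each with a `screen_name` field). The sample data is invented.

```python
def hashtags_of(tweet):
    """Return the lowercase hashtag texts attached to a Tweet."""
    entities = tweet.get("entities", {})
    return [tag["text"].lower() for tag in entities.get("hashtags", [])]


def mention_edges(tweet):
    """Return (author, mentioned_user) pairs, one per user mention."""
    author = tweet["user"]["screen_name"]
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    return [(author, mention["screen_name"]) for mention in mentions]


# Invented Tweet for illustration.
sample_tweet = {
    "user": {"screen_name": "author_a"},
    "entities": {
        "hashtags": [{"text": "NetSci"}, {"text": "cse5656"}],
        "user_mentions": [{"screen_name": "user_b"}, {"screen_name": "user_c"}],
        "urls": [],
    },
}
```

Each pair returned by `mention_edges` can be read as a directed edge in a mention network.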
API Objects: Places
▪ Specific, named locations with corresponding geo
coordinates
▪ Tweets associated with places are not necessarily
issued from that location but could also potentially
be about that location
▪ Fields such as
▪ country
▪ bounding_box
▪ full_name
▪ place_type
▪ https://dev.twitter.com/overview/api/places
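When a Tweet carries a place but no exact coordinates, the bounding box can be reduced to a single representative point. The sketch below assumes `bounding_box` is a polygon whose `coordinates` hold one ring of `[longitude, latitude]` pairs. The sample place is fictional.

```python
def bounding_box_center(place):
    """Return (longitude, latitude) of the center of a place's bounding box."""
    ring = place["bounding_box"]["coordinates"][0]
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return ((min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2)


# Fictional place; coordinates are [longitude, latitude] pairs.
sample_place = {
    "full_name": "Example Town, EX",
    "place_type": "city",
    "country": "Exampleland",
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [[[-10.0, 40.0], [-8.0, 40.0], [-8.0, 42.0], [-10.0, 42.0]]],
    },
}
```

As noted above, such a point describes the place, not necessarily where the Tweet was issued.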
Twitter APIs: OAuth Authentication
▪ Create a Twitter profile
▪ Create an application (you need a phone number)
▪ Create Access Tokens
▪ Use the OAuth tool to get your keys, tokens and
secrets
▪ https://apps.twitter.com
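Once the keys, tokens, and secrets exist, it helps to keep them out of the source code. The sketch below reads them from environment variables and fails early if any is missing. The variable names are hypothetical and were chosen for this example only.

```python
import os

# Hypothetical variable names; use whatever naming your project prefers.
REQUIRED = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=None):
    """Return the four OAuth values as a dict, or raise KeyError naming the missing ones."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED}
```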
Twitter Libraries
▪ For some libraries (e.g., in Python), the Twitter API website doubles as
the developer documentation, since calls can mirror the REST endpoint names
▪ https://dev.twitter.com/overview/api/twitter-libraries
-> api.search.tweets(q="#HowToComplexNetworks")
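The call above reads like the endpoint path search/tweets. The following toy class illustrates how attribute access can be turned into such a path. It is a sketch of the idea only, not the code of any actual Twitter library, and `EndpointPath` is a made-up name.

```python
class EndpointPath:
    """Toy object that records attribute access as REST path segments."""

    def __init__(self, parts=()):
        self._parts = tuple(parts)

    def __getattr__(self, name):
        # Called only for attributes that do not exist; treat them as path segments.
        if name.startswith("_"):
            raise AttributeError(name)
        return EndpointPath(self._parts + (name,))

    def __call__(self, **params):
        # A real client would send an HTTP request here; this sketch just reports it.
        return "/".join(self._parts), params


api = EndpointPath()
path, params = api.search.tweets(q="#HowToComplexNetworks")
```

Here `path` is the string "search/tweets", which is the name to look up in the REST reference.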
Useful Hints
▪ Get accustomed to the Twitter app
Useful Hints
▪ Try the Twitter Console first, before coding
https://dev.twitter.com/rest/tools/console
Useful Hints
▪ Check rate limits in your code (try/catch loops)
▪ Do you need a database? A relational one?
▪ Remember the 2-step process:
▪ Collect data
▪ Construct the network
▪ For intensive traffic/volume, try MongoDB
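The try/catch hint can be sketched as a small retry loop that waits out the rate-limit window and tries again. `RateLimitError` is a hypothetical exception defined here for illustration, since each client library raises its own error type. The `sleep` parameter exists only so the loop can be exercised without actually waiting.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error; substitute the exception your client library raises."""


def call_with_retry(request, max_attempts=3, wait_seconds=15 * 60, sleep=time.sleep):
    """Call request(); on a rate-limit error, wait and retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            sleep(wait_seconds)
```

Waiting a full 15 minutes is the conservative choice. A tighter wait is possible when the API reports when the window resets.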
Useful Hints?
▪ 2014 FIFA World Cup Dataset
▪ 50+ million tweets
▪ 1+ million geo-tagged tweets
▪ 1-month collection window
▪ Database optimization
▪ Memory consumption
▪ Program crashes
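One way to limit memory use and crash damage during collection is to append each Tweet to disk as it arrives and read the file back one record at a time. This is not the approach described in the slides, which point to MongoDB. It is a simpler stand-in using a JSON Lines file, with made-up function names.

```python
import json


def append_tweets(path, tweets):
    """Append tweets to a JSON Lines file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as handle:
        for tweet in tweets:
            handle.write(json.dumps(tweet) + "\n")


def iter_tweets(path):
    """Yield tweets one at a time so the whole dataset never sits in memory."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```

Because the file is opened in append mode, a crashed collector can be restarted without losing what was already written.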
Twitter API Resources
Twitter API - Resources
▪ Timelines
▪ Search
▪ Tweets
▪ Streaming
▪ Direct Messages
▪ Friends & Followers
▪ Users
▪ Suggested Users
▪ Favorites
▪ Lists
▪ Saved Searches
▪ Places & Geo
▪ Trends
▪ Spam Reporting
▪ OAuth
▪ Help
Twitter API - Resources
▪ Search
▪ Let’s search for the 5 most recent tweets containing
“Florida Tech”
▪ Let’s search for the 5 most recent tweets within 1 mi
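Both searches come down to choosing query parameters. The sketch below builds the request URL without sending it, assuming the v1.1 `search/tweets` endpoint and its `q`, `count`, `result_type`, and `geocode` parameters. The center point is left to the caller, since the slide does not state one.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"  # assumed v1.1 endpoint


def build_search_url(query, count=5, center=None, radius_mi=1):
    """Build a search URL for the most recent tweets, optionally near a point.

    center is a (latitude, longitude) pair supplied by the caller.
    """
    params = {"q": query, "count": count, "result_type": "recent"}
    if center is not None:
        lat, lon = center
        params["geocode"] = f"{lat},{lon},{radius_mi}mi"
    return SEARCH_URL + "?" + urlencode(params)


# Quoting the phrase asks for an exact match.
phrase_url = build_search_url('"Florida Tech"')
```

A real request also needs OAuth credentials, which this sketch leaves out.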
Twitter API - Resources
▪ Friends & Followers
▪ Let’s query for the friends of a particular user
▪ Let’s query for the first 100 followers of a particular
user
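Follower lists translate directly into a directed network. The sketch below assumes the follower lists have already been collected into a dict mapping each user to their followers. It then builds edges pointing from follower to followed user. The user names are invented.

```python
def follower_edges(followers_by_user):
    """Turn {user: [follower, ...]} into directed (follower, user) edges."""
    return [
        (follower, user)
        for user, followers in followers_by_user.items()
        for follower in followers
    ]


def adjacency(edges):
    """Build {node: set of nodes it follows}, including nodes that follow no one."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    return graph


# Invented data standing in for collected follower lists.
collected = {"alice": ["bob", "carol"], "bob": ["carol"]}
graph = adjacency(follower_edges(collected))
```

The edge direction is a modeling choice. Reversing each pair gives the "is followed by" network instead.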
Twitter API - Resources
▪ Users
▪ Let’s query for the usernames of the first 100
followers of a particular user
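Follower queries typically return numeric IDs, so getting usernames takes a second lookup step. The sketch below assumes a lookup endpoint that accepts at most 100 IDs per request, as `users/lookup` did in API v1.1. It shows the batching and the extraction of `screen_name` from the returned user objects.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def screen_names(user_objects):
    """Extract the screen_name field from a list of decoded user objects."""
    return [user["screen_name"] for user in user_objects]
```

For the first 100 followers a single batch suffices. The helper matters once the follower list grows beyond that.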
© Ronaldo Menezes, Florida Tech
Twitter API - Resources
▪ Streaming
▪ Let’s track tweets with the hashtags #cse5656 and
#netsci
▪ Let’s track the tweets in Brevard County
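Both tracking tasks map onto parameters of the public filter stream. The sketch below assumes the v1.1 `statuses/filter` conventions: `track` is a comma-separated list of terms, and `locations` is a bounding box given as south-west longitude, south-west latitude, north-east longitude, north-east latitude. The caller supplies the box, since the slides do not give coordinates for the county.

```python
def filter_params(hashtags=(), bounding_box=None):
    """Build the parameter dict for a filtered stream request.

    bounding_box is (sw_lon, sw_lat, ne_lon, ne_lat), supplied by the caller.
    """
    params = {}
    if hashtags:
        params["track"] = ",".join(hashtags)
    if bounding_box is not None:
        sw_lon, sw_lat, ne_lon, ne_lat = bounding_box
        params["locations"] = f"{sw_lon},{sw_lat},{ne_lon},{ne_lat}"
    return params


hashtag_params = filter_params(["#cse5656", "#netsci"])
```

Commas inside `track` act as a logical OR, so a Tweet with either hashtag matches.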
Questions?
Marcello Tomasini – mtomasini@my.fit.edu
Marcos Oliveira – moliveirajun2013@my.fit.edu
http://my.fit.edu/~mtomasini/classes/ComplexNetworks.html

CSE5656 Complex Networks - Gathering Data from Twitter
