
Twitter API

API – a method of accessing a service's data programmatically


Twitter APIs – the Search, Ads and Streaming APIs
REST API
- Used to search for existing tweets.
- Often these APIs limit the number of tweets that can be retrieved.
Streaming API
- Looks into the future; used to access tweets in real time.
- By keeping the HTTP connection open, one can retrieve all the tweets that match the filter criteria as they are published.
- It has 2 endpoints through which one can fetch the data:
a. Filter endpoint – filter by keyword, user id and location
b. Sample endpoint – extract a random sample of the whole Twitter stream

To summarize, the REST APIs are useful when we want to search for tweets authored by a specific user or to access our own timeline, while the Streaming API is useful when we want to filter on a particular keyword and download a massive number of tweets about it.

To use any of the APIs provided by Twitter, first collect a set of Twitter API keys, which are used to connect to the API:
1. Create a Twitter developer account.
2. Create an app and fill in the form to register the application.
3. Go to the “Keys and Tokens” tab to collect the tokens.
4. Create an Access Token and Access Token Secret.
Tweepy is a Python library used to connect to the Twitter API.
To use Twitter’s REST API:
Create a Python application to authenticate with Twitter.
1. auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
2. auth.set_access_token(access_token, access_token_secret)
3. api = tweepy.API(auth)
4. Create a Cursor instance to extract tweets. It takes the search query, the tweet mode, etc.

To use Twitter’s Streaming API:
Create a Python application to authenticate with Twitter.
1. Create the class that will handle the tweet stream, with StreamListener as its base class.
2. In the class, define on_data() and on_error() methods, taking (self, data) and (self, status) as arguments respectively.
3. Create an object of the class.
4. Create an OAuthHandler instance, passing into it the consumer key and consumer secret (this registers the client application with Twitter).
5. Set the access token with the set_access_token() method, passing access_token and access_token_secret as arguments.
6. Create a Stream object, passing auth and the listener object created above as arguments.
7. Begin collecting data with Stream.filter().
8. Save the data to a txt file.
9. Open the file and append each tweet to a list.
10. Convert the tweet data into a pandas DataFrame to simplify data manipulation.

Using the Streaming API -
Tweepy requires an object called ‘SListener’ which tells it how to handle incoming data.
The SListener object inherits from the general ‘StreamListener’ class included with tweepy.
It opens a new timestamped file in which to store tweets, and takes an optional API argument.

OAuth, the authentication protocol which the Twitter API uses, requires four tokens, which we obtain from the Twitter developer site.
We pass the ‘OAuthHandler’ our consumer key and consumer secret. Then the access token and the access token secret are set. Finally, the auth object is passed to the tweepy API object.

Now, collecting data with tweepy -
To take a random sample of all of Twitter, we use the sample endpoint.
First, instantiate the SListener object.
Then instantiate the Stream object.
Lastly, call the sample method to begin collecting data.
Contents of a tweet’s JSON
- How many retweets and favorites it has
- What language is used
- Which tweet it is a reply to
- Which user it is a reply to
- created_at
- Unique id
- Text, etc.
The tweet JSON contains several important child JSON objects, like –
- user: contains all the useful info about the user who tweeted – their name, Twitter handle, Twitter bio, location and whether they’re verified
- place: info about the geolocation of the tweet
- extended_tweet: contains the full text of tweets over 140 characters
- retweeted_status and quoted_status: contain all the tweet information of retweets and quoted tweets
Accessing the JSON
The open() and read() methods are used to load the JSON file into a string.
Then the json package and its loads method are used to convert the JSON into a Python dictionary.
Lastly, the value of interest is accessed by using its appropriate key.

Child tweet JSON objects – can be accessed as nested dictionaries.
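For example (the payload below is a trimmed, made-up v1.1-style tweet, inlined so the sketch is self-contained; the notes read the JSON back from a saved file instead):

```python
import json

# A minimal made-up tweet payload with a nested "user" child object
raw = '''{
  "created_at": "Mon Jan 01 00:00:00 +0000 2024",
  "id": 1,
  "lang": "en",
  "text": "hello world",
  "user": {"screen_name": "example", "location": "Internet", "verified": false}
}'''

tweet = json.loads(raw)                 # JSON string -> Python dictionary
print(tweet["text"])                    # top-level value by its key -> hello world
print(tweet["user"]["screen_name"])     # child object as a nested dictionary -> example
```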


Flattening of JSON –
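The notes leave this section empty; a minimal sketch of what flattening usually means here – copying nested child objects up to prefixed top-level keys so each tweet fits a single flat row (the "user-" key naming is an assumption of mine, not prescribed by the notes):

```python
def flatten_tweet(tweet):
    """Return a copy of a tweet dict with child JSON objects flattened."""
    flat = dict(tweet)
    # Promote every field of the user child object to a prefixed top-level key
    for key, value in flat.pop("user", {}).items():
        flat["user-" + key] = value
    # extended_tweet carries the full text of tweets over 140 characters
    ext = flat.pop("extended_tweet", None)
    if ext is not None:
        flat["extended_tweet-full_text"] = ext.get("full_text")
    return flat

flat = flatten_tweet({"text": "hi", "user": {"screen_name": "example"}})
```

pandas.json_normalize performs this kind of flattening automatically (with dots rather than hyphens as the default separator).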

Basic text analysis:
Counting words – the number of times a word appears in the text, and how many times it appears compared to other keywords in the text.
The str.contains method (a pandas Series string method) is used to find words; it returns a Boolean Series; for a case-insensitive search, pass case = False.
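A short sketch of the counting described above (the "text" column name and the sample rows are assumptions of mine):

```python
import pandas as pd

df = pd.DataFrame({"text": [
    "I love Python",
    "pandas is great",
    "python and pandas together",
]})

# str.contains returns a Boolean Series; case=False makes it case-insensitive,
# and summing the Series counts the matching rows
has_python = df["text"].str.contains("python", case=False)
print(has_python.sum())                                       # -> 2
print(df["text"].str.contains("pandas", case=False).sum())    # -> 2
```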

Using Selenium to scrape data from Instagram:
1. Install the Chrome driver.
2. Create a new driver with the help of the webdriver module of Selenium, which will result in a new browser window popping up and navigating to the provided page.

3. Automate the login and search process
a. Open the inspect tool and search for the attributes of the input fields you want to access.
E.g., to access the username input field, first inspect it in the code and search for the name attribute, which identifies the field that takes the value entered by the user.
Similarly, automate the process for handling the other items.


4. Search for a keyword
To search, we need to access the search input field and then press the Enter key: target the input field whose placeholder equals “Search”, then pass the keyword and the ENTER key to the target field.
