Good morning!

Enjoy your coffee and install
PuTTY and NotepadPlus via "Software Maintenance/Application
Catalogue". And the Pattern package (see my e-mail). Thanks.
Big Data? What are we talking about?

The process: collect, store, analyze

Python

Questions?

Hands-on-Workshop
Big (Twitter) Data
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Afdeling Communicatiewetenschap
Universiteit van Amsterdam

30 January 2014
9.30
#bigdata

Damian Trilling
Analyzing social media with Python and other tools (1/4)

The next one and a half days
You’ll hear about
• Collecting social media data via APIs, RSS and scraping (and the tools for it)
• Technical infrastructure (via SURFsara)
• Python
• Sentiment analysis
• Automated coding
• Frequencies and other statistics
• Social network analysis with Gephi
• ...

In this session (1/4):
1 Big Data? What are we talking about?
  Exploring the field
  Some examples
2 The process: collect, store, analyze
  A scheme
  Our implementation
3 Python
  What it is
  When to use it
  When not to use it
4 Questions?

Exploring the field

What’s big data?
What are we talking about?

Today, it’s a hands-on workshop, so let’s keep this important (!)
discussion for later.

So, no definition, but some brief thoughts
• Existing data (≠ experiments or surveys)
• Too big to code manually
• Too big to handle with normal tools
• New research questions
• Call to revisit the relationship between theory and empirical research

Today, ...
• we are not going to talk about REALLY BIG data,
• but we will have some exercises on datasets a normal computer can handle

Tomorrow, ...
• we will also learn about scaling up these techniques
• SURFsara provides infrastructure for this

Some sources
• Social Network Sites
• RSS-feeds
• Databases
• Scraping text from the web
• ...

It’s out there!
You only have to collect it.

But why should we care?
We can answer new questions
• Find needles in haystacks
• Identify networks, co-word analysis, linguistic analysis, ...
• Verify our theories in larger datasets

It makes sense
• There are things that computers are simply better at than humans, e.g. counting things
• Having human coders look for words in texts is like calculating a regression analysis by hand

Some examples

A recent master thesis

The needle in the haystack
Imagine you want to analyze some very rare content.
Normal sampling won’t work, that’s for sure.

So you’d better collect everything first

Getting all news coverage from Dutch news sites
1 Collect all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles.
2 Filter articles containing specific keywords.
3 The 292 articles that remained were then manually coded.

Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
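
The keyword filter in step 2 could look roughly like the sketch below. It is only an illustration, not the script used in the thesis: the function name `filter_articles`, the keywords and the three example articles are all made up.

```python
def filter_articles(articles, keywords):
    """Return the names of articles that mention at least one keyword.

    articles: dict mapping a file name to the article text (hypothetical data)
    keywords: list of search terms, matched case-insensitively
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for name, text in articles.items():
        text_lower = text.lower()
        if any(k in text_lower for k in lowered):
            hits.append(name)
    return hits


# Made-up example data, for illustration only
articles = {
    "a1.txt": "The minister announced the plan on Twitter.",
    "a2.txt": "Heavy rain is expected tomorrow.",
    "a3.txt": "According to a Facebook post, the match is cancelled.",
}
print(filter_articles(articles, ["twitter", "facebook"]))
```

Only the articles that pass such a filter need to be read by a human coder.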

It’s just one line of code!

urls.txt
http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
...

wget command
wget -i urls.txt
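
For comparison, roughly the same job can be done from Python with the standard library. This is a minimal sketch, not part of the original workflow: the function name `download_all` and the numbered file names are assumptions, and there is no error handling or politeness delay.

```python
import os
import urllib.request


def download_all(url_file, target_dir):
    """Download every URL listed in url_file (one per line) into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for number, url in enumerate(urls, start=1):
        # Hypothetical naming scheme: number the files in the order of the list
        target = os.path.join(target_dir, "page%04d.html" % number)
        with urllib.request.urlopen(url) as response:
            data = response.read()
        with open(target, "wb") as out:
            out.write(data)
        saved.append(target)
    return saved
```

Calling `download_all("urls.txt", "downloads")` would then fetch each listed page in turn. For a plain list of URLs, wget remains the shorter option.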

A recent bachelor thesis

Tone in tweets
Imagine you want to know something about someone’s behavior on
Twitter, or how a specific topic is discussed on Twitter.
Do you really want to go through thousands of tweets by hand?

So you’d better think about automating your coding
Finding out how negative or positive politicians are towards their opponents
The student took lists of positive and negative words and made additional lists with the names of each politician’s opponents.
She used a Python script to check which type of words was used to refer to opponents.
For further analysis, the results were imported into SPSS.

Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende
factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en
politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
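
A word-list approach of this kind can be sketched as follows. This is not the student's script: the function name `tone_towards_opponents`, the tiny word lists and the opponent name are invented for the example, and real word lists would be much longer.

```python
import re


def tone_towards_opponents(tweet, positive, negative, opponents):
    """Count positive and negative words in a tweet that mentions an opponent.

    Returns (n_positive, n_negative), or None if no opponent is mentioned.
    """
    words = re.findall(r"\w+", tweet.lower())
    if not any(w in opponents for w in words):
        return None
    n_pos = sum(1 for w in words if w in positive)
    n_neg = sum(1 for w in words if w in negative)
    return n_pos, n_neg


# Tiny made-up word lists, for illustration only
positive = {"good", "great", "strong"}
negative = {"bad", "weak", "failed"}
opponents = {"smith"}

print(tone_towards_opponents("Smith failed us, a weak plan", positive, negative, opponents))
print(tone_towards_opponents("Great weather today", positive, negative, opponents))
```

The counts per tweet can be written to a table and analyzed further in a statistics package.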

Frame adoption on Twitter

Which phrases used by Merkel and Steinbrück on TV make it
to the #tvduell discussion on Twitter?
Identify frequently used words in the transcript of the debate and
in tweets.
Find co-occurrences.
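
One simple reading of these steps, sketched in Python: take the most frequent words in each source and look at the overlap. The function name `top_words`, the minimum word length (a crude stand-in for a stop-word list) and the two example texts are assumptions, not the procedure used in the study.

```python
import re
from collections import Counter


def top_words(text, n, min_length=4):
    """Return the set of the n most frequent words with at least min_length letters."""
    words = [w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_length]
    return {word for word, count in Counter(words).most_common(n)}


# Made-up stand-ins for the debate transcript and the collected tweets
transcript = "taxes taxes taxes europe europe pension pension growth"
tweets = "europe europe europe taxes taxes football football weather"

shared = top_words(transcript, 3) & top_words(tweets, 3)
print(sorted(shared))
```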

The process: collect, store, analyze
A scheme

Our implementation

datacollection.followthenews-uva.cloudlet.sara.nl
yourTwapperkeeper
Continuously calls the Twitter API and saves all tweets containing specific hashtags to a MySQL database.

rsshond
Calls the RSS feeds of news sites 1x/hour, saves title, time, header, and teaser of all new articles into a CSV table, follows the link to the full text and downloads it.

snapshot
Visits some URLs 4x/day and downloads them.
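
To give an idea of what turning an RSS feed into a CSV table involves, here is a minimal sketch using only the standard library. It is not the rsshond code: the function name `rss_items` and the example feed are made up, and the hourly scheduling and the download of the full text are left out.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rss_items(rss_xml):
    """Extract title, publication time, link and teaser from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    rows = []
    for item in root.iter("item"):
        rows.append([
            item.findtext("title", default=""),
            item.findtext("pubDate", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ])
    return rows


# Made-up feed, for illustration only
feed = """<rss version="2.0"><channel><title>Example news</title>
<item><title>First story</title><pubDate>Thu, 30 Jan 2014 09:30:00 +0100</pubDate>
<link>http://example.com/1</link><description>Short teaser</description></item>
</channel></rss>"""

table = io.StringIO()  # stands in for a CSV file on disk
csv.writer(table).writerows(rss_items(feed))
print(table.getvalue())
```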

How to access the collected data?
Apache web server
Download the data from http://datacollection.followthenews-uva.cloudlet.sara.nl.

SSH (scp)
Transfer data directly to your computer or another server (like speeltuin.followthenews-uva.cloudlet.sara.nl)

BeeHub
Connect the server to BeeHub, which can be mounted like the P: drive ("p-schijf") or accessed online.

Python

One tool to rule them all?

Of course there are ready-made tools for some of the questions we
want to answer. But for many, there aren’t. Python offers us the
possibility to build exactly the tool we need.

And it’s fun!

What is Python?
It is a programming language
• It is flexible. You can use it for (in principle) any kind of data
• There are virtually no limits regarding the amount of data to process
• You can run it on every platform
• And yet it is easy to learn!

It is widely used for content analysis
• Many online resources and toolkits
• Books about NLP and Web Scraping with Python

You do not have to become a
programmer. If you know how to
write SPSS or STATA syntax, you
will understand Python.
(But if you have ever had contact with any programming language, it helps.)

It’s enough if you can read and
modify the code.

Think of the following task

RQ: What are the differences in terms of actors mentioned
between Israeli and Palestinian news coverage?
1 The data structure: You have a folder with articles
2 The desired output: You want a table with the file names and a column per actor, counting how often they are mentioned
3 A typical task for a short Python script!

You need something like this:
for every file in folder:
    read the file
    count actors
    add new row to table with filename and actor counts
save table
(such a notation is called pseudo-code)

import csv
import re
from os import listdir
from os.path import isfile, join

# Folder with one article per file (adjust to your own path)
mypath = r"C:\Users\Ricarda\Documents\Artikelen"
# "Israel" followed on the same line by minister, politician... or [Aa]uthorit...
regex54 = re.compile(r"Israel.*(?:minister|politician.*|[Aa]uthorit)")
filename_list = []
matchcount54_list = []
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
for f in onlyfiles:
    matchcount54 = 0  # reset the counter for each article
    with open(join(mypath, f), "r") as artikel:
        for line in artikel:
            matchcount54 += len(regex54.findall(line))
    filename_list.append(f)
    matchcount54_list.append(matchcount54)
# One row per file: file name and number of matches
output = zip(filename_list, matchcount54_list)
with open("overzichtstabel.csv", "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(output)

This is not too different from a script Jelle uses for his dissertation.
The main difference: He doesn’t code regular expressions, but
calculates document similarity.
slides-jelle.pdf

When to use Python

1st group of tasks

Highly repetitive tasks
Simple tasks (counting things, comparing texts, ...) that can be
described in a formalized way. Saves time even with few cases, but
there is virtually no size limit.
Example: Retweets start with RT, optionally followed by a space,
and some letters. So it is very easy to identify them automatically.
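
As a sketch, such a rule fits in one regular expression. The pattern below is one possible formalization, not a complete retweet detector: it additionally requires an "@" before the user name, which is an assumption on top of the description above, and the example tweets are made up.

```python
import re

# "RT", an optional space, then "@" and a user name.
# The "@" is an assumption added here; the description above only says "some letters".
RETWEET = re.compile(r"^RT ?@\w+")


def is_retweet(tweet):
    """Return True if the tweet text starts like a retweet."""
    return RETWEET.match(tweet) is not None


tweets = [
    "RT @damian0604: workshop today",
    "RT@someone hello",
    "RTL news tonight",
    "Nice talk! RT please",
]
print([is_retweet(t) for t in tweets])
```

Requiring the "@" keeps words that merely start with "RT" (such as "RTL") from being counted as retweets.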

2nd group of tasks

Tasks for which specific Python modules exist
There are thousands of modules suitable for text analysis. You
basically only have to write code for data input and output.
Example: Sentiment analysis

3rd group of tasks

APIs, RSS, web scraping ...
You can use Python if you want to collect and store information.
Example: Collecting bios of Twitter users, scraping the web (data
journalism!), downloading Facebook data

When not to use Python

Maybe you do not need to write a Python script ...

... when there are already suitable tools available.
Sometimes, the perfect ready-made tool already exists.
Example: Axel Bruns’ awk scripts for Twitter analysis
(www.mappingonlinepublics.net). If I had to write such a tool, I’d do it in
Python, but hey, he did it already with awk and it works.
But still, sometimes it is more efficient to write something that does exactly
what you want.

And, let’s face it, ...

... we are not programmers.
So maybe, some tasks are too complex for us to program ourselves.
But there is a huge online community that helps you.

Recap
1 Big Data? What are we talking about?
  Exploring the field
  Some examples
2 The process: collect, store, analyze
  A scheme
  Our implementation
3 Python
  What it is
  When to use it
  When not to use it
4 Questions?

After the break

Hands-on! Exploring a basic Python script

Questions or comments?

Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
#bigdata

Damian Trilling
Ad

More Related Content

What's hot (20)

BDACA - Lecture4
BDACA - Lecture4BDACA - Lecture4
BDACA - Lecture4
Department of Communication Science, University of Amsterdam
 
BDACA1617s2 - Lecture4
BDACA1617s2 - Lecture4BDACA1617s2 - Lecture4
BDACA1617s2 - Lecture4
Department of Communication Science, University of Amsterdam
 
BDACA1617s2 - Lecture3
BDACA1617s2 - Lecture3BDACA1617s2 - Lecture3
BDACA1617s2 - Lecture3
Department of Communication Science, University of Amsterdam
 
BDACA - Lecture2
BDACA - Lecture2BDACA - Lecture2
BDACA - Lecture2
Department of Communication Science, University of Amsterdam
 
BDACA1617s2 - Lecture7
BDACA1617s2 - Lecture7BDACA1617s2 - Lecture7
BDACA1617s2 - Lecture7
Department of Communication Science, University of Amsterdam
 
BDACA1617s2 - Lecture6
BDACA1617s2 - Lecture6BDACA1617s2 - Lecture6
BDACA1617s2 - Lecture6
Department of Communication Science, University of Amsterdam
 
BDACA - Lecture5
BDACA - Lecture5BDACA - Lecture5
BDACA - Lecture5
Department of Communication Science, University of Amsterdam
 
BDACA1617s2 - Lecture5
BDACA1617s2 - Lecture5BDACA1617s2 - Lecture5
BDACA1617s2 - Lecture5
Department of Communication Science, University of Amsterdam
 
BDACA - Lecture7
BDACA - Lecture7BDACA - Lecture7
BDACA - Lecture7
Department of Communication Science, University of Amsterdam
 
BDACA - Tutorial5
BDACA - Tutorial5BDACA - Tutorial5
BDACA - Tutorial5
Department of Communication Science, University of Amsterdam
 
BDACA1617s2 - Lecture 2
BDACA1617s2 - Lecture 2BDACA1617s2 - Lecture 2
BDACA1617s2 - Lecture 2
Department of Communication Science, University of Amsterdam
 
BD-ACA week5
BD-ACA week5BD-ACA week5
BD-ACA week5
Department of Communication Science, University of Amsterdam
 
BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1
Department of Communication Science, University of Amsterdam
 
BDACA - Lecture6
BDACA - Lecture6BDACA - Lecture6
BDACA - Lecture6
Department of Communication Science, University of Amsterdam
 
BDACA - Lecture8
BDACA - Lecture8BDACA - Lecture8
BDACA - Lecture8
Department of Communication Science, University of Amsterdam
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
Edureka!
 
Informatics is a natural science
Informatics is a natural scienceInformatics is a natural science
Informatics is a natural science
Frank van Harmelen
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Skillspeed
 
Kim Hammar Msc Thesis Defense - 2018
Kim Hammar Msc Thesis Defense - 2018Kim Hammar Msc Thesis Defense - 2018
Kim Hammar Msc Thesis Defense - 2018
Kim Hammar
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
Jonathan Stray
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
Edureka!
 
Informatics is a natural science
Informatics is a natural scienceInformatics is a natural science
Informatics is a natural science
Frank van Harmelen
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Skillspeed
 
Kim Hammar Msc Thesis Defense - 2018
Kim Hammar Msc Thesis Defense - 2018Kim Hammar Msc Thesis Defense - 2018
Kim Hammar Msc Thesis Defense - 2018
Kim Hammar
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
Jonathan Stray
 

Analyzing social media with Python and other tools (1/4)

  • 1. Good morning! Enjoy your coffee and install Putty and NotepadPlus via "Software Maintenance/Application Catalogue", as well as the Pattern package (see my e-mail). Thanks.
  • 2. Big Data? What are we talking about? The process: collect, store, analyze Python Questions? Hands-on-Workshop Big (Twitter) Data Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam 30 January 2014 9.30 #bigdata Damian Trilling
  • 4. The next one and a half days. You’ll hear about: • Collecting social media data via APIs, RSS and scraping (and the tools for it) • Technical infrastructure (via SurfSARA) • Python • Sentiment analysis • Automated coding • Frequencies and other statistics • Social network analysis with Gephi • ...
  • 5. In this session (1/4): 1 Big Data? What are we talking about? (Exploring the field; Some examples) 2 The process: collect, store, analyze (A scheme; Our implementation) 3 Python (What it is; When to use it; When not to use it) 4 Questions?
  • 6. Exploring the field: What’s big data? What are we talking about?
  • 7. Exploring the field: What are we talking about? Today, it’s a hands-on workshop, so let’s keep this important (!) discussion for later.
  • 8. Exploring the field: What are we talking about? So, no definition, but some brief thoughts: • Existing data (= experiments or surveys) • Too big to code manually • Too big to handle with normal tools • New research questions • Call to revisit the relationship between theory and empirical research
  • 10. Exploring the field: What are we talking about? Today, ... • we are not going to talk about REALLY BIG data, • but we will have some exercises on datasets a normal computer can handle. Tomorrow, ... • we will also learn about scaling up these techniques • SurfSARA provides infrastructure for this
  • 11. Exploring the field: What are we talking about? Some sources: • Social Network Sites • RSS feeds • Databases • Scraping text from the web • ...
  • 12. Exploring the field: It’s out there! You only have to collect it.
  • 14. Exploring the field: But why should we care? We can answer new questions: • Find needles in haystacks • Identify networks, co-word analysis, linguistic analysis, ... • Verify our theories in larger datasets. It makes sense: • There are things that computers are simply better at than humans, e.g. counting things • Having human coders look for words in texts is like calculating a regression analysis by hand
  • 17. Some examples
  • 20. Some examples: A recent master thesis. The needle in the haystack: Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.
  • 24. Some examples: So you’d better collect everything first. Getting all news coverage from Dutch news sites: 1 Collect all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles. 2 Filter articles containing specific keywords. 3 The resulting 292 articles were then coded manually. Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
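The filtering step (step 2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample articles, the keyword list, and the helper name `contains_keyword` are hypothetical and are not taken from the thesis.

```python
# Hypothetical sample data; the thesis's real articles and keywords are not given here.
articles = [
    {"id": 1, "text": "De minister reageerde via Twitter op de kritiek."},
    {"id": 2, "text": "Het weer blijft de komende dagen wisselvallig."},
    {"id": 3, "text": "Op Facebook circuleerde een video van het incident."},
]
keywords = ["twitter", "facebook"]  # assumed keywords, for illustration only


def contains_keyword(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


# Keep only the articles that mention at least one keyword.
selected = [article for article in articles if contains_keyword(article["text"], keywords)]
print([article["id"] for article in selected])
```

A plain substring match is the simplest possible choice; it also matches keywords inside longer words, so a real study might prefer whole-word matching.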
  • 26. Some examples: It’s just one line of code! urls.txt: http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food ... wget command: wget -i urls.txt
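For comparison, roughly the same download can be sketched in Python with the standard library only. This is a rough equivalent of `wget -i urls.txt`, not a replacement for it; the function names and the file-naming rule (last path segment of the URL) are assumptions made for this example.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_urls(path):
    """Read one URL per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def local_name(url):
    """Derive a file name from the last path segment of the URL (assumed naming rule)."""
    name = os.path.basename(urlparse(url).path)
    return name or "index.html"


def download_all(path, target_dir="."):
    """Fetch every URL listed in the file and save it under target_dir."""
    for url in read_urls(path):
        with urllib.request.urlopen(url) as response:
            data = response.read()
        with open(os.path.join(target_dir, local_name(url)), "wb") as out:
            out.write(data)


# download_all("urls.txt")  # uncomment to run; requires network access
```

The one-line wget call remains the simpler choice; a script like this only pays off when the download has to be combined with further processing.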
  • 29. Some examples: A recent bachelor thesis. Tone in tweets: Imagine you want to know something about someone’s behavior on Twitter. Or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?
  • 33. Some examples: So you’d better think about automating your coding. Finding out how negative or positive politicians are towards their opponents: The student took lists of positive and negative words and made additional ones with each politician’s opponents. She used a Python script to check which type of words was used to refer to opponents. For further analysis, the results were imported into SPSS. Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
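A word-list approach of this kind might look like the sketch below. It is not the student's script: the word lists, the opponent names, and the function `code_tweet` are hypothetical placeholders chosen for illustration.

```python
import re

# Hypothetical word lists and names; the thesis's actual lists are not reproduced here.
positive_words = {"great", "good", "strong"}
negative_words = {"bad", "weak", "failed"}
opponents = {"smith", "jones"}


def code_tweet(tweet):
    """Count positive and negative words in a tweet that mentions an opponent.

    Returns None when no opponent is mentioned.
    """
    tokens = re.findall(r"\w+", tweet.lower())
    if not opponents.intersection(tokens):
        return None
    return {
        "positive": sum(token in positive_words for token in tokens),
        "negative": sum(token in negative_words for token in tokens),
    }


print(code_tweet("Smith failed again, a weak plan"))
```

Counting words per tweet gives one row of numbers per tweet, which could then be written to a file and imported into a statistics package for further analysis.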
  • 36. Some examples: Frame adoption on Twitter. Which phrases used by Merkel and Steinbrück on TV make it to the #tvduell discussion on Twitter? Identify frequently used words in the transcript of the debate and in tweets. Find co-occurrences.
  • 37. Some examples: Frame adoption on Twitter
  • 38. The process: collect, store, analyze. A scheme
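One simple way to approach this is to count words in both sources and look at the overlap. The sketch below assumes a frequency threshold (`min_count`) and uses invented sample texts; neither is taken from the study.

```python
import re
from collections import Counter


def frequent_words(text, min_count=2):
    """Return the set of words that occur at least min_count times in the text."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return {word for word, count in counts.items() if count >= min_count}


# Invented sample texts, for illustration only.
debate = "Wir brauchen Wachstum. Wachstum schafft Arbeit. Arbeit schafft Wohlstand."
tweets = [
    "Wachstum, Wachstum, immer nur Wachstum #tvduell",
    "Arbeit für alle? #tvduell",
    "Mehr Arbeit, bitte #tvduell",
]

# Words that are frequent in the debate transcript and in the tweets.
shared = frequent_words(debate) & frequent_words(" ".join(tweets))

# For each shared word, the number of tweets in which it occurs.
tweets_per_word = {
    word: sum(word in re.findall(r"\w+", tweet.lower()) for tweet in tweets)
    for word in shared
}
print(sorted(shared), tweets_per_word)
```

In practice, very common function words would probably be filtered out with a stop word list before comparing the two sources.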
  • 48. Our implementation: datacollection.followthenews-uva.cloudlet.sara.nl. yourTwapperkeeper: Continuously calls the Twitter API and saves all tweets containing specific hashtags to a MySQL database. rsshond: Calls the RSS feeds of news sites 1x/hour, saves title, time, header, and teaser of all new articles into a CSV table, follows the link to the full text and downloads it. snapshot: Visits some URLs 4x/day and downloads them.
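The core of an RSS collector like rsshond can be sketched with the standard library. This is not the rsshond code: the sample feed, the chosen fields, and the function names are assumptions, and fetching the feed, detecting new articles, and hourly scheduling are left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample feed; a real collector would download the feed from a news site.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example news</title>
<item><title>First headline</title><link>http://example.org/1</link>
<pubDate>Thu, 30 Jan 2014 09:30:00 +0100</pubDate><description>Teaser one</description></item>
<item><title>Second headline</title><link>http://example.org/2</link>
<pubDate>Thu, 30 Jan 2014 10:30:00 +0100</pubDate><description>Teaser two</description></item>
</channel></rss>"""

FIELDS = ["title", "pubDate", "description", "link"]


def parse_items(feed_xml):
    """Extract the chosen fields from every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        {field: item.findtext(field, default="") for field in FIELDS}
        for item in root.iter("item")
    ]


def write_csv(rows, handle):
    """Write the parsed items as a CSV table with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = parse_items(SAMPLE_FEED)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

The `link` column is what a follow-up step would use to download the full text of each article.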
  • 52. Our implementation: How to access the collected data? Apache webserver: Download the data from http://datacollection.followthenews-uva.cloudlet.sara.nl. SSH (scp): Transfer data directly to your computer or another server (like speeltuin.followthenews-uva.cloudlet.sara.nl). Beehub: Connect the server to Beehub, which can be mounted like the "p-schijf" (the P: drive) or accessed online.
  • 53. Python
  • 56. What it is: One tool to rule them all? Of course there are ready-made tools for some of the questions we want to answer. But for many, there aren't. Python offers us the possibility to build exactly the tool we need. And it's fun!
  • 59. What it is: What is Python? It is a programming language: It is flexible, and you can use it for (in principle) any kind of data; there are virtually no limits regarding the amount of data to process; you can run it on every platform; and yet it is easy to learn! It is widely used for content analysis: many online resources and toolkits; books about NLP and web scraping with Python.
  • 63. What it is: You do not have to become a programmer. If you know how to write SPSS or STATA syntax, you will understand Python. (But if you have ever had contact with any programming language, it helps.) It's enough if you can read and modify the code.
  • 67. What it is: Think of the following task. RQ: What are the differences in terms of actors mentioned between Israeli and Palestinian news coverage? 1. The data structure: You have a folder with articles. 2. The desired output: You want a table with the file names and a column per actor, counting how often they are mentioned. 3. A typical task for a short Python script!
  • 68. What it is: You need something like this (such a notation is called pseudo-code):
      for every file in folder:
          read the file
          count actors
          add new row to table with filename and actor counts
      save table
  • 69. What it is:
      import csv
      import re
      from os import listdir
      from os.path import isfile, join

      # Folder with one article per file
      mypath = r"C:\Users\Ricarda\Documents\Artikelen"
      # Mentions of Israeli officials; the grouping is presumably what the slide's pattern intended
      regex54 = re.compile(r"Israel.*(?:minister|politician.*|[Aa]uthorit)")
      filename_list = []
      matchcount54_list = []
      onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
      for f in onlyfiles:
          matchcount54 = 0
          with open(join(mypath, f), "r") as artikel:
              for line in artikel:
                  # Count every match of the pattern on this line
                  matchcount54 += len(regex54.findall(line))
          filename_list.append(f)
          matchcount54_list.append(matchcount54)
      output = zip(filename_list, matchcount54_list)
      # Python 3: text mode with newline="" (the slide used Python 2's 'wb')
      with open("overzichtstabel.csv", "w", newline="") as csvfile:
          csv.writer(csvfile).writerows(output)
  • 70. What it is: This is not too different from a script Jelle uses for his dissertation. The main difference: He doesn't code regular expressions, but calculates document similarity. (slides-jelle.pdf)
  • 71. When to use Python
  • 72. When to use it: 1st group of tasks: Highly repetitive tasks. Simple tasks (counting things, comparing texts, ...) that can be described in a formalized way. Saves time even with few cases, but there is virtually no size limit. Example: Retweets start with RT, optionally followed by a space, and some letters. So it is very easy to identify them automatically.
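The retweet rule from the example above translates almost directly into a regular expression. The sketch below assumes that the "letters" after RT are an @username, which is the usual convention but is not stated on the slide; the names `RETWEET`, `is_retweet`, and `count_retweets` are made up for this illustration.

```python
import re

# "RT", an optional space, then an @username (the @ is an assumption, see above)
RETWEET = re.compile(r"^RT ?@\w+")


def is_retweet(text):
    """Return True if the tweet text starts like a retweet."""
    return RETWEET.match(text) is not None


def count_retweets(tweets):
    """Count how many tweets in a list look like retweets."""
    return sum(is_retweet(tweet) for tweet in tweets)
```

Requiring the @ avoids false positives such as a tweet that merely starts with "RTL"; without it, the pattern would follow the slide's wording more literally but match too much.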
  • 73. When to use it: 2nd group of tasks: Tasks for which specific Python modules exist. There are thousands of modules suitable for text analysis. You basically only have to write code for data input and output. Example: Sentiment analysis
  • 74. When to use it: 3rd group of tasks: APIs, RSS, web scraping ... You can use Python if you want to collect and store information. Example: Collecting bios of Twitter users, scraping the web (data journalism!), downloading Facebook data
  • 75. When not to use Python
  • 77. When not to use it: Maybe you do not need to write a Python script ... when there are already suitable tools available. Sometimes, the perfect ready-made tool already exists. Example: Axel Bruns' awk scripts for Twitter analysis (www.mappingonlinepublics.net). If I had to write such a tool, I'd do it in Python, but hey, he did it already with awk and it works. But still, sometimes it is more efficient to write something that does exactly what you want.
  • 79. When not to use it: And, let's face it ... we are not programmers. So maybe some tasks are too complex for us to program ourselves. But there is a huge online community that helps you.
  • 80. Recap: 1. Big Data? What are we talking about? (Exploring the field; Some examples) 2. The process: collect, store, analyze (A scheme; Our implementation) 3. Python (What it is; When to use it; When not to use it) 4. Questions?
  • 81. After the break: Hands-on! Exploring a basic Python script
  • 82. Questions or comments? Damian Trilling, d.c.trilling@uva.nl, @damian0604, www.damiantrilling.net