Data Driven Journalism
Adam Westbrook, author of ‘Next Generation Journalist’, source: Interview by EJC, 2010
“One of our big goals in the storytelling process is
to humanize the statistics. It’s hard for people to
care about numbers, especially large numbers. How
do you get your head around the death of 800,000
people in the Rwandan genocide? I think if you
meet the individuals - see and hear the stories of the
survivors - you can gain a better insight into the
tragedy.”
The idea is to provide both experienced journalists and newcomers with a well-
structured primer, while breaking down some of the barriers holding back coders
and non-coders alike from starting to experiment. We need better solutions, good
platforms and better reporting.
Data is not an entirely new field, nor is storytelling. But deep changes are afoot:
while journalism’s old business models are crumbling, working with data provides
new and attractive opportunities. Should you agree or disagree with anything in this
paper, please drop us a line. It can only improve.
Summary goals
The aim is to provide food for thought and practical information, similar to the
great papers of Mindy McAdams (Reporter’s Guide to Multimedia Proficiency) and
Adam Westbrook (6x6 series, Next Generation Journalist). To provide a reference
point for journalists adapting to new workflows, we present the most important
information in one place: a first overview of what data-driven journalism might
mean and how it can provide a new perspective for journalists.
Additional material
Important additions here are the chapters providing links to good articles on
data-driven journalism and a long list of tools that allow you to play with data and dig
deeper, even if you don’t know how to code.
Introduction: Why data matters for journalism
Ten, even five years ago, the use of data as a basis for reporting was difficult and
costly, requiring IT skills far beyond what is common in the media. Databases were
used mainly by investigative journalists. Editors and reporters usually relied on
information provided by outside sources.
Today there is a notable change. Collections of data are becoming available online,
often for free. There is a whole stack of tools for digging into ‘big data’. Open source
tools allow navigation and analysis of large amounts of data rather quickly. There
are online applications that allow us to share and visualize data.
Developing the know-how to use the available data more effectively, to understand
it, communicate and generate stories based on it, could be a huge opportunity to
breathe new life into journalism. Reporters can find new roles as ‘sense-makers’ by
digging deep into data, in turn making journalism more socially relevant. If done
well, delivering credible information and advice could even generate revenues,
opening up new business models beyond subscriptions and advertising.
In this context, the European Journalism Centre (EJC) in collaboration with the
University of Amsterdam organized the first roundtable on data-driven journalism.
The one-day event gathered specialists in fields that intersect with data-driven
journalism: data mining, data visualization and multimedia storytelling to discuss
the possibilities of this emerging field, examine and understand key tools and
workflows, while sharing their budding expertise in data-driven journalism. What
can we learn from current projects? How can we integrate existing tools into
journalistic workflows? What skills are needed to enter this field?
There is a lot of confusion about this, with more questions than answers. To move
towards the latter, the roundtable presented speakers from a variety of backgrounds
including writing, information architecture, and visual design. The following
chapters include a range of perspectives from various nations and disciplines, all
with a shared interest in making better use of data.
The roundtable was planned and chaired by Mirko Lorenz, DDJ Project Leader for
the EJC and a freelance information architect working on innovation projects at
Deutsche Welle. The event was organized by the EJC and partially funded by
the Dutch Ministry of Education, Culture and Science.
Websites
http://www.ejc.net
http://community.ejc.net/group/datadrivenjournalism
Mirko Lorenz: Status and Outlook for data-driven journalism
Yet another buzzword is making waves: data-driven journalism or DDJ for short. It
is based on the hope that journalism will find new structure and meaning by
bringing data into the reporting workflow. It is debatable whether this is something
really new or just a new flavour - but there is something in the air that hints that
this new trend might be bigger and last for some time.
Interest in this field has grown only recently. The goal of the roundtable in
Amsterdam was therefore to debate the status and outlook for journalists. At this
stage, very basic questions have to be answered: are we all talking about the same
thing? Where do we agree and (probably just as important) where do we disagree?
Is the use of data an opportunity, and if that is the case, what are the barriers
keeping us from actually doing it?
To do that, journalists will have to learn new tricks. They have to get used to working
with tools that will help them to make data flow. Quite a few commentators on
media and journalism think that this is a major obstacle because the average writer
is not good with numbers. Is this true?
What?
Data in the house
Journalists have always worked with data and technology. Facts are data in some
form or another, and facts are the basis for any story that can claim to be a
journalistic work. In the past, heavy number crunching was done elsewhere, in big
rooms with specialists feeding machines. Journalists used the end result of this
process, ranging from statistics, studies or stock quotes to new findings in science.
Very few journalists worked directly with the raw data at the start of this
process.
Sure, there were specialists practising ‘CAR’ (Computer Assisted Reporting). They
were trained and had technical skills. But CAR was and is primarily a technique, not
a process affecting the whole workflow of journalism in a fundamental way. This is
not about devaluing CAR. The use of computer searches in large databases remains
an extremely important skill for investigative journalism. But data-driven
journalism is rather different, making CAR one element in the chain of future
events.
Today data-driven journalism can be defined as a workflow where data is the basis
for analysis, visualization and - most importantly - storytelling. With cloud
computing, powerful desktop PCs and high bandwidth, we all have the
potential to filter and process data in ways that would have been a
sensation just 15, 10, maybe even 5 years ago.
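The workflow described above (data as the basis for analysis, visualization and storytelling) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the CSV columns, the regions and the amounts are placeholders rather than real data, and the plain-text bar chart only stands in for a proper charting tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spending records (placeholder values, not real data).
RAW_CSV = """region,category,amount
North,schools,120
North,roads,80
South,schools,60
South,roads,190
"""


def load(text):
    """Step 1, data: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def analyse(rows):
    """Step 2, analysis: total amount per region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += int(row["amount"])
    return dict(totals)


def visualise(totals, width=20):
    """Step 3, visualization: plain-text bars scaled to the largest total."""
    peak = max(totals.values())
    lines = []
    for region, total in sorted(totals.items()):
        bar = "#" * round(width * total / peak)
        lines.append(f"{region:<6} {bar} {total}")
    return "\n".join(lines)


def tell(totals):
    """Step 4, storytelling: turn the largest result into a sentence."""
    region, total = max(totals.items(), key=lambda kv: kv[1])
    share = 100 * total / sum(totals.values())
    return f"{region} accounts for {share:.0f}% of total spending."


totals = analyse(load(RAW_CSV))
print(visualise(totals))
print(tell(totals))
```

Each step is a separate function, so a real data source or a real charting library could be swapped in without touching the other steps.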
It is still unclear, however, what platforms are needed to achieve that. Today’s
content management systems are page-orientated: they help you put together an
article, but they largely ignore the data in the text. Facts that were collected
earlier lose their structure in this process. Another open question is business models: where
and how could journalists make a living based on data? What formats would create
new income streams, whether from subscriptions, advertising or plain selling of
‘information nuggets’?
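One way to picture the alternative: keep each fact as a structured record, with its source, next to the article, and generate the sentence from that record so the fact stays reusable after publication. This is a minimal sketch under that assumption; the `Fact` record, its fields and the example values are hypothetical, not any existing CMS's data model.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Fact:
    """A hypothetical structured fact stored alongside the article text."""
    subject: str
    measure: str
    value: float
    unit: str
    year: int
    source: str

    def as_sentence(self):
        # Render the fact as prose for the article page.
        return (f"In {self.year}, {self.subject} recorded {self.measure} "
                f"of {self.value:g} {self.unit} (source: {self.source}).")


# Placeholder fact; the numbers are illustrative, not real statistics.
fact = Fact("City X", "an unemployment rate", 7.5, "per cent", 2009,
            "statistical office")

article_text = fact.as_sentence()        # what a page-oriented CMS keeps
fact_record = json.dumps(asdict(fact))   # what a data-aware CMS could keep too
print(article_text)
print(fact_record)
```

Because the record survives as data, the same fact could later feed a chart, a comparison table or a follow-up story without being re-typed from the page.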
Brian Storm, a multimedia journalist who has influenced so many of us with his
superb storytelling on ‘Mediastorm’, boils it down to this: “Don’t be the noise in the
middle”. You can set out to create hilarious videos on YouTube with cats spinning
around at one end of the market. Or go in the other direction towards artful, truth-
orientated stories inquiring into the ‘state of the human condition’, as Storm
describes his goals.
Good writing is simply not enough; with millions of bloggers out there you will find
many voices from people who might not be journalists by trade or institution. But
people understand and relate to good information.
The core technology of journalism has long been the printing press. Until recently,
only media companies were able to print millions of copies overnight, distribute
them into the home and create awareness for issues (and advertising). The Internet
has ended this.
Understanding data might be an avenue to the future. But before setting out, we
should ask where this might take us. Skipping the questions would be
like hopping into a Ferrari or a Porsche, thinking that driving this machine will be
no different from a normal commuter ride, given that there is a steering wheel, a
gas pedal and a brake. Beware: once you get up to speed the physics change
dramatically - so the first road bump or curve might result in disaster.
Not asking any questions about what data-driven journalism might become would
be a repetition of what people did in the ‘New Economy’, hopping into companies
that by and large collapsed once the fever was over.
How can we regain relevancy with journalistic content?
A good example of what to look for comes from the ‘New York Times’: go to Google in
almost any country and punch in the search term ‘rent or buy’, and chances are high
that the first page, perhaps even the first listing, will contain a link to an interactive
graphic from the New York Times. The little tool allows you to quickly fill in some
personal data, such as the price of the house and the interest rate, and it will tell you
after how many years buying becomes better than renting.
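The idea behind such a calculator can be sketched in a few lines of Python. This is a deliberately simplified model, not the New York Times' actual method: it assumes fixed interest, rent-growth and price-growth rates plus one-off buying and selling costs, and it ignores taxes, maintenance and what the down payment could have earned elsewhere. All function names, parameters and default values are hypothetical.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment on a fixed-rate loan (standard annuity formula)."""
    r = annual_rate / 12
    n = years * 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def remaining_balance(principal, annual_rate, years, months_paid):
    """Outstanding loan balance after a number of monthly payments."""
    r = annual_rate / 12
    payment = monthly_payment(principal, annual_rate, years)
    if r == 0:
        return principal - payment * months_paid
    growth = (1 + r) ** months_paid
    return principal * growth - payment * (growth - 1) / r


def break_even_year(price, down, rate, years, rent, rent_growth=0.03,
                    price_growth=0.02, buy_costs=0.04, sell_costs=0.06,
                    horizon=30):
    """First year in which buying has cost less than renting, or None.

    Net cost of buying = down payment + loan payments made + one-off buying
    costs + selling costs - (home value - outstanding loan).
    """
    loan = price - down
    payment = monthly_payment(loan, rate, years)
    rent_paid = 0.0
    monthly_rent = rent
    for year in range(1, horizon + 1):
        rent_paid += 12 * monthly_rent
        monthly_rent *= 1 + rent_growth
        months = min(year, years) * 12
        value = price * (1 + price_growth) ** year
        equity = value - remaining_balance(loan, rate, years, months)
        buy_cost = (down + payment * months + price * buy_costs
                    + value * sell_costs - equity)
        if buy_cost < rent_paid:
            return year
    return None


# Hypothetical inputs: 300,000 home, 60,000 down, 4% over 30 years, rent 1,500.
print(break_even_year(300_000, 60_000, 0.04, 30, rent=1_500))
```

The one-off costs are what push the break-even point out by a few years; dropping them makes buying look better almost immediately, which is why a real calculator asks for many more inputs.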
This is an interesting tool. First, because it’s very usable. Second, because the New
York Times enters a new field here, using state of the art browser technologies to
help people make a decision on whether or not to buy. The reason why this is so
interesting is partly because buying (and keeping) a home is the biggest financial
deal most people will make in their lives. A journalistic publication by nature has
different interests when providing such a tool than, say, a real estate agent. Instead
of trying to sell right away, the medium works as an intermediary, a consultant, an
advisor. Could we build on such examples and develop new platforms? If you think
real estate is essentially boring, look at Curbed. The blog is a big success, letting
people peek into flats, condos and houses that are on the market. Connect this with
a growing number of interactive tools that provide very clear answers based on a
constant inflow of data, and you see one of the perspectives data-driven journalism
might provide. Instead of commenting and reporting on the side, media brands
could be the first place people go to inform themselves before signing a cheque for
anything.
There is a proverb in Spain: ‘before you jump high, make sure that you stand on
solid ground’. And this is why the questions asked might be puzzling to many, but
should be discussed in depth.
What is more telling is the subtitle, which reads: ‘Enron, intelligence and the perils
of too much information’. Starting from the trial of Jeffrey Skilling after the
Enron collapse, Malcolm Gladwell argues that there is a huge difference between
‘puzzles’ and ‘mysteries’.
“The distinction is not trivial,” says Gladwell. “If you consider September 11th to be
mainly a puzzle, for instance, then the logical response is to increase the collection
of intelligence, recruit more spies, add to the volume of information we have about
Al Qaeda. If you consider September 11th a mystery, though, you’d have to wonder
whether adding to the volume of information will only make things worse. You’d
want to improve the analysis within the intelligence community; you’d want more
thoughtful and sceptical people with the skills to look more closely at what we
already know.”1
A little further on, the difference between a puzzle and a mystery becomes clearer: “If things
go wrong with a puzzle, identifying the culprit is easy: it’s the person who withheld
information. Mysteries, though, are a lot murkier: sometimes the information we’ve
been given is inadequate, and sometimes we aren’t very smart about making sense
of what we’ve been given, and sometimes the question itself cannot be answered.
Puzzles come to satisfying conclusions. Mysteries often don’t.”
All this is important, as it defines how journalists should prepare for a data-driven
future. As Gladwell points out: “The principal elements of a puzzle...require the
application of energy and persistence...Mysteries demand experience and insight”.
This will be important when a current demand becomes reality: the opening of
large data vaults held by statistical offices and governments. Once all this information
is available (and presumably it will be) there will be a rapid shift from puzzles
(missing information) to mysteries (making sense of too much information).
Most people today are usually confronted with too much information. Whatever
product you look up, you can get it in many varieties. But is that aftermarket,
no-name replacement battery for a MacBook really as reliable as the one offered by the
company? How can it be that the price is 50 per cent less? People often make
decisions based on simple recipes: a low price leads them to make a quick
purchase. Only later do they find out that the bargain comes at a price.
1 Quote from: Malcolm Gladwell, “The Formula. Enron, intelligence, and the perils of too much information”,
The result is that - more often than not - people stick even more strongly to their
(wrongly held) beliefs. So journalists should expect data-driven journalism to take a
long time to become accepted by the public. Before
developing a sustainable model, there will be a build-up process in order to gain
trust.2
But some relief should come from the fact that making money by selling
information is neither new nor fading away. Just read a number of company stories
to learn more. Take, for example, Thomson Reuters. Why are they named Thomson
Reuters today? Arguably, because The Thomson Corporation, which started with a
single newspaper in Canada and grew into one of the largest media companies in
the world, divested early from newspaper holdings. Instead, Thomson entered the
field of specialized information, starting in the late 1970s. Most of the brands the
company owned were never of any interest to the public, but of high value to
professionals in need of reliable information. This is not so far from what data-
driven journalism promises.
2 Joe Keohane, ‘How facts backfire - Researches discover a surprising threat to democracy: our brains’,
There are quite a few more examples of stable and successful companies, ranging
from Bloomberg (which ‘packaged’ its offerings in the equivalent of a Porsche with
its Bloomberg terminals) to eMarketer (successfully selling marketing insights via
the web).
• Enable decisions
When we connect the dots from Gladwell’s findings to ‘how facts backfire’, the ability
to help people make a clear and easy decision might be another opportunity. Right
now, the process of searching and comparing anything (from bicycle tires to
insurance) has become an increasingly time-consuming and complex task. Many
service platforms offer to ‘compare insurance premiums’ and usually the first step
is harvesting the address of the prospective client. Media and journalists could enter
this market and simply be what everyone hopes they are: trustworthy.
About
Mirko Lorenz is a journalist/information architect based in Cologne, Germany. He
holds a Master’s degree in History and Economics from the University of Cologne. After
working for newspapers he founded an Internet strategy office in 1995, researching
or editing for clients like Sony, Handelsblatt and others. Since 2007 he has been a member
of the innovation projects team at Deutsche Welle. Work assignments include
development of future systems in areas such as P2P, semantics and cloud
computing.
Session 1: Data Production, Usage and Integration
Before journalists can work with data to find meaning, they first need to
learn how to access, structure and filter information. The first session
discussed experiences so far and prospects for the near future.
Participants:
• Jonathan Gray, Community Coordinator, The Open Knowledge Foundation
• Lorenz Matzat, freelance journalist, Medienkombinat Berlin
• Richard Rogers, Chair in New Media & Digital Culture, University of
Amsterdam
• Simon Rogers, Editor, Guardian Datablog and Datastore
• Tony Hirst, Lecturer in the department of Communication and Systems, The
Open University
Jonathan Gray: Open Data and Data Driven Journalism
Additional information:
Definitions
The Open Knowledge Definition (OKD) sets out principles to define ‘openness’ in
knowledge – that’s any kind of content or data ‘from sonnets to statistics, genes to
geodata’. The definition can be summed up in the statement: “A piece of
knowledge is open if you are free to use, reuse, and redistribute it —
subject only, at most, to the requirement to attribute and share-alike”.
Source: http://www.opendefinition.org/
The Open Knowledge Foundation organizes events like OKCon, runs projects like Open
Shakespeare, and develops tools like CKAN and KnowledgeForge to help people create,
find and share open material. A full list of projects and events can be found on its homepage.
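CKAN catalogues can also be searched by machine. In current CKAN versions (an assumption worth checking against the instance you use, since the API has changed since 2010), a dataset search is a GET request to `/api/3/action/package_search` with a `q` parameter, returning JSON with `success` and `result` keys. The sketch below builds such a request and extracts dataset titles; the base URL is hypothetical, and the sample response is a hand-written stand-in showing the shape, not real output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query, rows=10):
    """Build a CKAN action-API search URL (API version 3 assumed)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Pull dataset titles out of a decoded package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN reported an error")
    return [pkg.get("title") or pkg["name"]
            for pkg in response["result"]["results"]]


def search(base_url, query, rows=10):
    """Run the search over the network (not called below)."""
    with urlopen(search_url(base_url, query, rows)) as resp:
        return dataset_titles(json.load(resp))


# Hand-written stand-in for a response, to show the shape only.
sample = {"success": True,
          "result": {"count": 1,
                     "results": [{"name": "spending-2010",
                                  "title": "Spending 2010"}]}}

print(search_url("https://catalogue.example.org/", "spending", rows=5))
print(dataset_titles(sample))
```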
Contact:
[email protected]
http://twitter.com/jwyg
Lorenz Matzat: Weatherstations - Citizen-Apps, eParticipation and Data journalism
Gov 2.0/eParticipation
• eConsultation
• Citizen Budgets (Bürgerhaushalte)
Examples:
Links:
Guardian Data Blog
http://www.guardian.co.uk/news/datablog
Guardian Data Store
http://www.guardian.co.uk/data-store
About:
Simon Rogers edits the Guardian Datablog and Datastore - and is a news editor for
the Guardian.
Tony Hirst: How to make the Data Flow
In Amsterdam he showed how, using an array of free tools, data can be moved
from a table in Wikipedia into a Google Maps mashup, e.g. displaying
the locations of MPs in the UK.
The main message here: no (real) coding involved. Journalists and anybody else
interested in this simply have to know what they want to do. The tools to do that are
out there: Yahoo Pipes, Google Docs and Spreadsheets, Many Eyes, Fusion Tables.
And they are all free to use.
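The tools named above do this without programming. For readers curious about what such a chain does under the hood, here is a minimal Python sketch of its last step: turning spreadsheet rows with names and coordinates into a KML file, a format Google Maps and Google Earth can display. The column names and rows are hypothetical placeholders, and this is not Tony Hirst's own pipeline.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical spreadsheet export: one row per MP with a point location.
ROWS = """name,constituency,lat,lon
Example MP A,Constituency One,51.50,-0.12
Example MP B,Constituency Two,53.48,-2.24
"""


def rows_to_kml(text):
    """Convert CSV rows with lat/lon columns into a KML document string."""
    placemarks = []
    for row in csv.DictReader(io.StringIO(text)):
        lat, lon = float(row["lat"]), float(row["lon"])
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(row['name'])}</name>\n"
            f"    <description>{escape(row['constituency'])}</description>\n"
            # KML lists coordinates as longitude,latitude
            f"    <Point><coordinates>{lon},{lat}</coordinates></Point>\n"
            "  </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
        + "\n".join(placemarks)
        + "\n</Document>\n</kml>\n"
    )


print(rows_to_kml(ROWS))
```

The one detail that most often goes wrong in practice is the coordinate order: KML expects longitude first, while spreadsheets usually list latitude first.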
It would not do Tony justice to display just a shortened version of his talk here, so
please check out his presentation (and many others he gave on other occasions) on
Slideshare. His blog, OUseful.Info, comes up with other hints and tips almost every week.
Blog
http://blog.ouseful.info/
About
Tony Hirst is a lecturer at the Open University in the UK.
Session 2: Data Visualization
Stefan Fichtel: Data-driven visualization
But specialists working on the visualization must understand and work with the
data. They have to root their work deeply in data, not only in a journalist’s
instructions.
Examples of work:
http://www.kircher-burkhardt.com
About:
Stefan Fichtel is chief infographics designer at Kircher Burkhardt Consulting, Berlin.
Frank van Ham: How to use Data Visualization
ManyEyes
• ManyEyes has evolved into a platform where a wide variety of visualizations
can be produced, even with large data sets.
• Users can upload, visualize and publish their projects
Wordle
Produces word clouds from text in a matter of seconds
http://www.wordle.net/
Tableau
Commercial visualization software, simple to use
http://www.tableausoftware.com/
ManyEyes
Fernanda Viégas and Martin Wattenberg, the visualization experts who created
ManyEyes, have recently been hired by Google.
http://manyeyes.alphaworks.ibm.com/manyeyes/
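Word clouds of the Wordle kind rest on word frequencies. The sketch below shows only the counting step plus a simple size mapping; the stop-word list is a small illustrative sample, the linear font scaling is an assumption rather than Wordle's own formula, and the layout is left out entirely.

```python
import re
from collections import Counter

# A small, illustrative stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def word_frequencies(text, top=5):
    """Most common non-stop-words: the raw input of a word cloud."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


def font_sizes(freqs, min_pt=10, max_pt=48):
    """Map counts linearly onto font sizes between min_pt and max_pt."""
    hi = freqs[0][1]
    lo = freqs[-1][1]
    span = max(hi - lo, 1)
    return {w: round(min_pt + (c - lo) * (max_pt - min_pt) / span)
            for w, c in freqs}


sample = "Data is the basis. The data tells the story, and the story needs data."
freqs = word_frequencies(sample, top=3)
print(freqs)
print(font_sizes(freqs))
```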
Session 3: Storytelling with Data
Cynthia O'Murchu: Datastories
Oil and gas chief executives: are they worth it? (November 29, 2009)
http://www.ft.com/cms/s/0/190f9e7c-bd8d-11de-9f6a-00144feab49a.html
Currencies in context (October 12, 2009)
http://www.ft.com/cms/s/0/9a2fdf12-b725-11de-96f2-00144feab49a.html?ftcamp=rss
About
Cynthia O’Murchu has worked in the media for over a decade on a range of projects
in print, documentary film, radio and interactive. Until recently she was Deputy
Interactive Editor at the Financial Times where she researched and produced
multimedia features and data visualisations.
In her current position as investigative reporter, also at the FT, she uses her data
analysis and computer-assisted reporting skills to produce stories across a variety of
beats, both financial and non-financial.
Alan McLean: The data is dead, long live the data!
What seems so obvious and simple is far from the norm. Actually people from
newsrooms and developers don’t mix, at least not on any chart. But this one move
can make the difference, leading to a better understanding of how to get from data
to story faster and better.
Based on examples from The New York Times, Alan showed how integrating
data as a fundamental part of better and deeper stories is “almost too easy” when
considering that many of the constraints such big stories face in a pure print
environment are going away: there is an infinite number of stories, provided the
data makes the stories flowing from it fit for publication.
This entry is intentionally kept very short, as Alan’s presentation is available on
Slideshare and tells the story better. We have added a link to another presentation
that shares some of the main points from Amsterdam, but goes even deeper into
how the technology has changed, providing new opportunities.
Link:
http://www.slideshare.net/amclean/data-driven-journalism-telling-stories-online?from=ss_embed
Also by Alan McLean is a longer presentation: ‘Hacking the News’. It shares a few
slides with the one above but goes further into the need for new technology in
newsrooms.
http://www.slideshare.net/amclean/hacking-news
Eric Ulken: Building a data desk
The longer version of this advice can be found at Online Journalism Review,
http://www.ojr.org/ojr/people/eulken/200811/1581/
This approach has been put to the test recently with the Afghanistan War logs.
With a number of big media companies ahead of them, the team at OWNI made a
quick plan (‘look at the documents from a French angle’), gathered volunteers and
came up with an application very quickly.
This approach is clever and will hopefully find growing demand: instead of months
of development and complex IT-integration, the app-style development philosophy
is very close to how the Guardian solves the problem of suddenly having to deal
with a new data set, a new situation and the goal to find an elegant, quick and useful
view for that particular information.
http://owni.fr/2010/07/29/warlogs-european-collaborative-investigation-app/
About Nicolas Kayser-Bril
Nicolas is 24 years old and Head of Data at OWNI. In an interview with the Online
Journalism Blog, he described his goals for this new idea as follows: “We want to
enhance information with the power of computers and the web. Through software,
Nicolas already has an interesting track record: while still a student, he and others
produced a story about violence in French schools, partly by counting the number
of security cameras installed. He spoke about ‘The World according to newspapers’,
whose visualizations were a favourite of many visualization blogs in
2008, when the study was published. Together with Giles Bruno he created an
interactive map in which the relative size of the world’s regions changes according to
the number of articles published about each region in a Western country.
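The principle of such a map, where a region's size follows how often it is covered, can be sketched by computing each region's share of articles. The region names and counts are hypothetical, and scaling each region's outline by the square root of its share relative to the most-covered region (so that drawn area, not width, tracks coverage) is an assumption for illustration, not a description of the original project.

```python
from math import sqrt


def coverage_shares(counts):
    """Each region's share of all articles."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def linear_scales(shares):
    """Linear scale factor per region so that drawn area tracks share.

    The most-covered region keeps factor 1.0; the others shrink by the
    square root of their share relative to it.
    """
    top = max(shares.values())
    return {region: sqrt(s / top) for region, s in shares.items()}


# Hypothetical article counts per region (placeholders, not study data).
counts = {"Region A": 400, "Region B": 100, "Region C": 25}
shares = coverage_shares(counts)
scales = linear_scales(shares)
for region in counts:
    print(f"{region}: {shares[region]:.1%} of articles, "
          f"drawn at {scales[region]:.2f}x")
```

Scaling by the square root matters: a region with a quarter of the coverage is drawn at half the width and height, which gives it a quarter of the area.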
As for the data: the sets are usually published right away; only then does the
examination start. What started out as an experiment remains something like a
hobby. But the approach has clearly caused a stir, judging by quite a
number of positive remarks from other journalists and academics. As Gavin
describes on the site: “The Story is dedicated to sharing documents, combing and
combining data and promoting transparency in public life: an experiment in journalism
and crowdsourcing hoping to shed light on the government. If you’re spending the Irish
taxpayers’ money, you’re on the radar.”
This approach - getting things moving in reality rather than planning for ages - is impressive.
There are many, many data sets collected with taxpayers’ money around the world; some
are published, many are not. And even the data that you can get is often presented in a
way that makes it extremely hard to get a clue what it means.
The Story.ie is a new approach: first crash the gates, then start to ask questions. The
answers will be found over time, don’t you think?
Speakers:
• Stijn Debrouwere, freelance journalist and programmer
• Burt Herman, Founder, Hacks/Hackers
• Andrew Lyons, Commercial Director, Ultra Knowledge
• Julian Burgess, Editorial Developer, The Times
Stijn Debrouwere: Baking a better cake
This problem - Content Management Systems that are not designed to structure
data - is usually beyond the control of a journalist. Still, without switching to better
systems, publishers are more or less powerless and in need of help from others,
such as Google and even Facebook, to structure, filter and distribute stories in an
efficient way.
Stijn was invited to the conference based on his excellent text titled ‘We are in the
information business’ (see links below). He noted the main points of his talk in
Amsterdam in another text published on his website.
The article is part of a series covering ‘Information architecture for news websites’.
The other installments can be found here:
http://stdout.be/2010/information-architecture-for-news-websites/
About
Stijn Debrouwere is an information architect mainly working in media projects,
based in Belgium.
Burt Herman: Storify - making sense of the world
Founder, Hacks/Hackers
Links:
Hacks/Hackers
http://hackshackers.com/
This site aims to bring together journalists and technologists at casual face-to-face
gatherings to trade ideas and find potential collaborators. The founders of the site
and network are Aron Pilhofer of the New York Times, Rich Gordon from
Northwestern University’s Medill School of Journalism and Burt Herman, Knight
Journalism Fellow at Stanford University.
About
Burt Herman graduated from Stanford University and started his journalism career
in 1996, as a reporter for the Associated Press. Other positions for AP included
being the editor of the International Desk in New York, followed by postings to
Berlin, Moscow, and Uzbekistan. He has reported on events such as the Afghanistan
War, the Beslan school massacre in Russia, the U.S. invasion of Iraq and the
Republic of Georgia’s ‘rose revolution’. From 2004 he was bureau chief in Korea. He
is a John S. Knight Fellow.
Julian Burgess: The Role of Analytics
The way this is done is usually through APIs (Application Programming Interfaces),
allowing the user to extract and look at data from big platforms such as Twitter.
In many newsrooms, however, the use of analytics is not a part of the reporting
process, as blogs are usually analyzed elsewhere in the organization. But analytics
is moving rapidly towards real-time analysis (instead of a chart being sent
once a month or once a week).
Closing the gap on this is another avenue to be explored in the future. Journalists
should know when a story breaks and how readers/users react to published stories,
and should be enabled to act on such information.
The possible uses of such data range from being a tool in reporting to optimizing
the pages of a news website to make information easier to find.
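Real-time analytics, as opposed to a monthly chart, can start with something as simple as counting page views per story in a sliding time window and flagging stories whose recent traffic crosses a threshold. The sketch below is a minimal in-memory illustration; the window length, the threshold and the event format (a story id plus a timestamp in seconds, recorded in time order) are hypothetical choices, not any analytics product's API.

```python
from collections import defaultdict, deque


class SlidingCounter:
    """Count page views per story over the last `window` seconds.

    Assumes events are recorded in time order, so the oldest timestamp
    is always at the left end of each queue.
    """

    def __init__(self, window=300):
        self.window = window
        self.events = defaultdict(deque)  # story id -> deque of timestamps

    def record(self, story, ts):
        self.events[story].append(ts)

    def count(self, story, now):
        q = self.events[story]
        # Drop views that have fallen out of the window (ts <= now - window).
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)


def trending(counter, stories, now, threshold):
    """Stories with at least `threshold` views inside the current window."""
    return [s for s in stories if counter.count(s, now) >= threshold]


counter = SlidingCounter(window=60)
for t in range(1, 61, 2):      # 30 views of story-a within the last minute
    counter.record("story-a", t)
for t in (10, 50):             # 2 views of story-b
    counter.record("story-b", t)
print(counter.count("story-a", now=60), counter.count("story-b", now=60))
print(trending(counter, ["story-a", "story-b"], now=60, threshold=20))
```

Keeping only timestamps inside the window means memory grows with recent traffic, not with the full history, which is what makes a per-minute view cheap enough to show in the newsroom.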
About
Julian is a developer who became the first programmer to be hired specifically by
The Times’ editorial team. He works with Jonathan Richards on data and
visualisation projects.
Below is additional material on this particular issue that has been discussed widely
in the journalism space:
Online Metrics Report (September 2010) - The Journalism School, Columbia University
http://www.journalism.columbia.edu/cs/ContentServer?pagename=JRN/Render/DocURL&binaryid=1212612808691
Future perspectives for DDJ
What came out of the roundtable was that there will be a wide range of activities to
be understood: from data scraping to programming new platforms to handle
content better, from filtering and visualizing to storytelling with HD cameras.
For the future we need journalists who are sympathetic to data and welcome every
developer or self-taught data-digger. Future reporters should be specialists in one of
the areas of application and have a good understanding of the problems and
opportunities of other crafts. But in general it will be better to have ‘real’, trained
programmers (who may also be journalists) and ‘real’, trained photographers,
cutters, producers, etc. If data-driven journalism grows beyond today’s
beginnings, specialization will become more important.
What is there to learn? You may be tired of this much repeated question. But this is
so important to the future success of data-driven journalism that we ask it yet again.
For a start: we need really good examples of data-driven journalism, big stories like
the MP expenses scandal or the Afghanistan War Logs. We need them more often,
in steadily improving quality - because this will open up a market for data-driven journalists.
The one resource that is scarce today is an intangible one: trust. The Internet is a
dangerous place and so is the physical world. Just one example: how many billions
in euros, dollars, etc. are wrenched from consumers around the world by banks not
telling them about hidden fees and kick-backs for sales of mortgages and mutual funds?
How much money is lost because we as humans are not very good at calculating
compound interest (the interest that adds up on a debt over time and that
defines the final sum you actually pay)? Differences of half a per cent on a mortgage
translate into thousands of euros or dollars, but most people don’t know or care, and
as a result this ignorance is exploited day by day, year by year.
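A worked example shows how half a per cent adds up. It uses the standard fixed-rate annuity formula, where M is the monthly payment, P the loan, i the annual interest rate and n the number of monthly payments; the loan of 200,000 over 30 years (n = 360) and the two rates are hypothetical illustrations, and figures are rounded.

```latex
\begin{align*}
M &= P\,\frac{r}{1-(1+r)^{-n}}, \qquad r = \frac{i}{12}, \quad n = 12 \times \text{term in years}\\[4pt]
i = 4.0\%:\quad M &\approx 954.83, \qquad 360\,M \approx 343\,739\\
i = 4.5\%:\quad M &\approx 1\,013.37, \qquad 360\,M \approx 364\,813\\
\Delta &\approx 364\,813 - 343\,739 = 21\,074
\end{align*}
```

On these assumptions, half a percentage point costs roughly 21,000 over the life of the loan, about 59 a month that is easy to overlook when the offer is signed.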
If we further develop applications like the ‘rent or buy calculator’, make them more
versatile, clear and easy to use, my guess is that data-driven journalism can turn
into a new revenue stream. As journalists have a strong motivation to be
trustworthy, we go to great lengths to get a story right and check the facts.
Let’s extend this thought of helping people to make decisions with data. Nobody
wants to pay for online news. But will the picture change if we offer tools and
databases to help people make clear and informed decisions? How to finance a
house and keep it? How to find the best deal for your next car? Yes, it will be
difficult to make people pay for such calculators. But how about offering to print out
their personal finance plan towards their dream house as a full colour A3 poster, so
that it can be posted on the wall? How about becoming the first place to visit?
There is still a lot of work to be done: we need to reduce the time needed to find an
answer. We should create destination points where Google can (and will) send
people to get unbiased, trustworthy advice. We could build on the experience that
people usually do not pay for information (this feels like paying for talking to
someone), but that they do pay for ‘souvenirs’, as the app markets, the games
business and other markets demonstrate.
Future formats represent another avenue for data-driven journalism. Did you see
the little experiment on redesigning boarding passes? It is just one example of
what can be done. To take a clear model: research and information collections
are today delivered in page formats, often as bulky PDFs. Go into any office and
you will see droves of people spending their day ‘re-formatting’ information that is
already there but does not look good: they copy sentences and pictures from PDFs
and paste them into word processors and presentation charts. Day by day, hour
by hour.
What if future news organizations understood that the individual items
they produce as articles could be turned into meaningful analysis,
ideally delivered as PowerPoint presentations for large corporate clients? This might
create a specialized information market worth thousands of subscription fees, year
after year. There are examples of this, so the thought is not that new. Have a look at
companies like eMarketer, The Real Story (reporting on CMS/IT systems), Statista
and others. If you don’t believe in anything proposed here, please read about the
transformation of The Thomson Corporation from a single Canadian newspaper into a
media empire and then into a powerhouse for specialized information. It has been
done before!
The last word on this is a quote from Adam Westbrook, which puts into perspective
what to expect and how to find new and better ideas to make this
happen:
We are looking for people to advance or challenge these ideas. Can you contribute
to that story?
How to start working with data: A brief checklist for data-driven journalism
For anyone interested in entering this emerging field of journalism, here are five
easy steps that can help you get to grips with data, visualization and storytelling
based on data.
1. Watch these two videos: Hans Rosling (2006) and David McCandless (2010)
http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
Websites
Data
Data.gov (US), http://www.data.gov/
Data.gov.uk (UK), http://data.gov.uk/
World Bank, Data, http://data.worldbank.org/
ScraperWiki, http://scraperwiki.com/
Open Knowledge Foundation, http://okfn.org/
Visualization
Visual Complexity, http://www.visualcomplexity.com/vc/
Flowing Data, http://flowingdata.com/
Well-formed Data, http://well-formed-data.net/
Information Aesthetics, http://infosthetics.com/
Good Magazine, http://www.good.is/
University of Amsterdam, http://www.digitalmethods.net/
Simple Complexity, http://simplecomplexity.net/
A Beautiful WWW, http://abeautifulwww.com/
Infografistas, http://infografistas.blogspot.com/
Visual Editors, http://www.coolinfographics.com/
Cool Infographics, http://www.coolinfographics.com/
Datenjournalist, German blog on data-driven journalism, http://www.datenjournalist.de/
Data Tools
DocumentCloud, http://www.documentcloud.org/home
Google Docs and Spreadsheets (find on the web)
Google Fusion Tables, with access to many open data sets,
http://tables.googlelabs.com/Home
Google Code Playground: helps you explore Google data and tools,
http://code.google.com/apis/ajax/playground/
Zemanta, http://www.zemanta.com/
Open Calais, http://www.opencalais.com/
ScraperWiki, http://scraperwiki.com/
API Playground: helps journalists understand API data,
http://apiplayground.org/
Data Converter (Shan Carter): simple tool that converts CSV into web-friendly
formats, including JSON and XML,
http://www.shancarter.com/data_converter/index.html
Gapminder Desktop, http://www.gapminder.org/desktop/
Yahoo Pipes, http://pipes.yahoo.com/pipes/
Tableau, http://www.tableausoftware.com/public/
ManyEyes, http://manyeyes.alphaworks.ibm.com/manyeyes/
Open Knowledge Foundation, CKAN, http://www.ckan.net/
Python/Django, http://www.djangoproject.com/
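The Data Converter listed above runs in the browser. The same basic step, turning a spreadsheet export into JSON for a web page, can be sketched in a few lines of Python using only the standard library’s csv and json modules. This is not the Data Converter’s own code; the function name csv_to_json and the sample data are hypothetical, for illustration only.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    # DictReader uses the header row as the keys for every following row
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    return json.dumps(rows, indent=2)


# Hypothetical sample data, e.g. exported from a spreadsheet
sample = "country,year,value\nA,2009,1.5\nB,2009,2.0\n"
print(csv_to_json(sample))
```

Note that every value stays a string; converting numbers to numeric types is a separate, deliberate step, since guessing types automatically can silently damage data such as postcodes with leading zeros.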
Multimedia/Storytelling
DSLR News Shooter: examples of and information on storytelling, HD video technology,
tools and gear, http://www.dslrnewsshooter.com/
Multimediashooter: collects videos and often points to interesting new productions,
http://www.multimediashooter.com/
MediaStorm: the place to see some prime examples of storytelling. Starting tips:
Sandwich Generation, The Marlboro Marine, Take Care.
http://www.mediastorm.com/
Advancing the Story, http://www.advancingthestory.com/
Finding the Frame: interesting site where you can submit multimedia pieces for
review, http://www.findingtheframe.com/
Spill the Beans: German platform presenting examples of multimedia storytelling,
http://www.spillthebeans.de/
Deutsche Welle Lab: German blog, edited by Steffen Leidel and Marcus Bösch.
Provides regular insights into online journalism.
http://training.dw-world.de/ausbildung/blogs/lab/
Innovative Interactivity: the site, created and edited by Tracy Boyer, is a ‘watering
hole for multimedia enthusiasts’, http://www.innovativeinteractivity.com/
Data People (2010)
Our intention was to create a short, manageable list. It is slightly unfair to single out
just a few people, as most of them work in teams. This is not meant to exclude
anybody; the aim is to filter down to a short list of innovative
people working on new ideas for data-driven journalism, better information
filtering and storytelling. If you think somebody should clearly be on this list,
again, please send us an e-mail.
Florence Nightingale: Polar chart. A perfect example of how data and visualization can
uncover the truth and make change happen.
http://understandinguncertainty.org/coxcombs
BBC: Super Power: Visualising the Internet. A multi-layered treemap based on the top 100
websites, http://news.bbc.co.uk/2/hi/8562801.stm
MediaStorm: Never Coming Home. Turns the data points of US Army casualties
into personal stories, by Zac Barr, Andrew Lichtenstein and Tim Klimowicz,
http://www.mediastorm.com/publication/never-coming-home
The New York Times: Afghanistan War Logs. Structured and extended overview of
a complex issue, http://www.nytimes.com/interactive/world/war-logs.html
The New York Times: Rent or buy calculator. Interactive service that helps people
decide whether it is better to rent or to buy.
http://www.nytimes.com/interactive/business/buy-rent-calculator.html
Where Does My Money Go? Great visualization of the UK budget. The team received calls from
other government departments that did not have that kind of overview before and
therefore wanted a poster. http://www.wheredoesmymoneygo.org/dashboard/
Gapminder website: changing the view on complex issues with a very smart
platform. Be sure to watch Hans Rosling’s presentations on TED.com if you haven’t
already, http://www.gapminder.org/
CityCrawlers Berlin: How far is it from here to there? This and other questions are answered
with city data and attractive visualization. http://citycrawlers.eu/berlin/
The New York Times: Obama’s 2011 Budget Proposal,
http://www.nytimes.com/interactive/2010/02/01/us/budget.html
Eric Ulken: Building the data-desk: lessons from the L.A. Times, The Online
Journalism Review, Nov. 21, 2008,
http://www.ojr.org/ojr/people/eulken/200811/1581/
Rich Gordon: What Will Journalist-Programmers Do?, MediaShift Idealab, Nov. 18,
2007, http://www.pbs.org/idealab/2007/11/what-will-journalist--programmers-do005.html
Rich Gordon: Data as journalism, journalism as data, Readership Institute, Nov. 14,
2007, http://getsmart.readership.org/2007/11/data-as-journalism-journalism-as-data.html
Data Books
Nick Davies, ‘Flat Earth News’, 2008. A critique of ‘churnalism’ in newspapers, with some
very revealing back-stories on how the truth is sometimes distorted by journalists. The
most revealing story might be the crusade against heroin, which may have
unintentionally helped to create a drug market. If the journalists had known the
data, would they have argued differently?
Dan Roam, ‘The Back of the Napkin’. Helps you develop new skills in visualizing
by hand, and can be a good creative source for future work.
Ian Ayres, ‘Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart’,
2007. Sums up and describes techniques for data mining and introduces examples
of how big data can help to make predictions about the future.
Malcolm Gladwell, ‘Outliers: The Story of Success’. Most of these stories are
essentially data-driven; the findings are sometimes amazing and fun to read, e.g.
the 10,000-hour rule, which may help explain the successful careers of
programmers and musicians such as the Beatles.
Data Companies
This is not meant as promotion. Instead, the idea is to point you to outfits and
start-ups that work with data and may be able to help if you are looking for someone
to support a project, or for a job in this field.
Mailing List
Additionally, you can subscribe to the data-driven journalism mailing list, which is
run by the Open Data Network (English): http://wiki.opendata-network.org/DDJ-Mailinglist.
The EJC will also use this list for future announcements.
Twitter #ddj
Acknowledgements
The first roundtable on data-driven journalism was made possible by the European
Journalism Centre (EJC), a non-profit organization made up of ‘journalists working
for journalists’.
Anna Lena Schiller, a graphic artist based in Berlin, did what should be one goal of
data-driven journalism in the future: boiling down each presentation into a single
picture, and so filtering what is really relevant out of an overload of information.
Her hand drawings from the event are part of each chapter on the following pages.
The visuals are available for download via Flickr:
http://wiki.opendata-network.org/Data_Driven_Journalism
Contributors
Mirko Lorenz, DDJ project lead for EJC/Information Architect Deutsche Welle
Texts, excerpts from talks and links
Liliana Bounegru
Organization of roundtable for EJC
Wilfried Rütten
Director EJC
Imprint
The material collected is published by the European Journalism Centre. You are
free to distribute and share this material, but are kindly asked to give credit when
using it elsewhere.