Guide to Automated Journalism
Andreas Graefe
A Tow/Knight Guide
Tow Center for Digital Journalism
Funded by the Tow Foundation and the John S. and James L. Knight Foundation
Acknowledgments
This work was funded by the Tow Foundation and the John S. and James
L. Knight Foundation. Thanks to interviewees Saim Alkan, Reginald Chua,
Lou Ferrara, Tom Kent, and James Kotecki. Thanks also to Peter Brown,
Arjen van Dalen, Nick Diakopoulos, Konstantin Dörr, Mario Haim, Noam
Lemelshtrich Latar, and Claire Wardle for providing comments and sugges-
tions.
January 2016
Contents
Executive Summary
  Status Quo
  Key Questions and Implications
Introduction
Further Reading
Citations
Executive Summary
In recent years, the use of algorithms to automatically generate news
from structured data has shaken up the journalism industry—most es-
pecially since the Associated Press, one of the world’s largest and most
well-established news organizations, has started to automate the produc-
tion of its quarterly corporate earnings reports. Once developed, not only can algorithms create thousands of news stories for a particular topic, they can also do so more quickly, cheaply, and potentially with fewer errors than any
human journalist. Unsurprisingly, then, this development has fueled jour-
nalists’ fears that automated content production will eventually eliminate
newsroom jobs, while at the same time scholars and practitioners see the
technology’s potential to improve news quality. This guide summarizes re-
cent research on the topic and thereby provides an overview of the current
state of automated journalism, discusses key questions and potential impli-
cations of its adoption, and suggests avenues for future research. Some of
the key points can be summarized as follows.
Status Quo
Market phase
Potential
• Algorithms are able to generate news faster, at a larger scale, and poten-
tially with fewer errors than human journalists.
• Algorithms can use the same data to tell stories in multiple languages
and from different angles, thus personalizing them to an individual
reader’s preferences.
• Algorithms have the potential to generate news on demand by creating
stories in response to users’ questions about the data.
Limitations
For society
• Automated journalism will substantially increase the amount of available
news, which will further increase people’s burden to find content that is
most relevant to them.
• An increase in automated—and, in particular, personalized—news is
likely to reemphasize concerns about potential fragmentation of public
opinion.
• Little is known about potential implications for democracy if algorithms
are to take over part of journalism’s role as a watchdog for government.
Introduction

by software or, more precisely, an algorithm. Granted, the piece may sound
a bit technical and boring, but it provides all the facts a journalist is likely
to cover and in which an investor is likely to be interested.
This technological innovation, known as automated journalism, is a
relatively new phenomenon in the area of computational journalism. Au-
tomated journalism refers to the process of using software or algorithms to
automatically generate news stories without human intervention—after the
initial programming of the algorithm, of course. Thus, once the algorithm
is developed, it allows for automating each step of the news production
process, from the collection and analysis of data, to the actual creation and
publication of news. Automated journalism—also referred to as algorith-
mic1 or, somewhat misleadingly, robot journalism2—works for fact-based
stories for which clean, structured, and reliable data are available. In such
situations, algorithms can create content on a large scale, personalizing it
to the needs of an individual reader, quicker, cheaper, and potentially with
fewer errors than any human journalist.
While computation has long assisted journalists in different phases of the
news production process—as in the collection, organization, and analysis of
data, as well as the communication and dissemination of news—journalists
have remained the authority for actually creating the news. This division of
labor is changing, which, not surprisingly, has shaken up journalism in re-
cent years. The World Editors Forum listed automated journalism as a top
2015 newsroom trend,3 and both researchers and practitioners are debating
the implications of this development.4 For example, while some observers
see potential for automating routine tasks to increase news quality, journalists’ fears that the technology will eventually eliminate newsroom jobs often dominate the public debate.5
In any case, opinions run strong on the use of automated journalism,
which is why the technology has attracted so much attention. Popular
media coverage includes NPR’s Planet Money podcast, which had one
of its most experienced reporters compete with an algorithm to write a
news story,6 and The New York Times’s quiz that allows readers to guess
whether a human or an algorithm wrote a particular story.7 Even The
Daily Show’s humorous coverage of the topic sheds light on potentials and
concerns of increased usage.8
First, the software collects the structured data and, second, identifies interesting events in the data. Those may include unusual events, a player’s extraor-
dinary performance, or the decisive moment for the outcome of a game.
Third, the software classifies and prioritizes the identified insights by im-
portance and, fourth, arranges the newsworthy elements by following pre-
defined rules to generate a narrative. Finally, the story can be uploaded to
the publisher’s content management system, which could publish it auto-
matically.
During this process, the software relies on a set of predefined rules that
are specific to the problem at hand and which are usually derived from
collaboration between engineers, journalists, and computer linguists. For
example, within the domain of baseball, the software has to know that
the team with the most runs—but not necessarily the most hits—wins the
game. Furthermore, domain experts are necessary to define criteria of news-
worthiness, according to which the algorithm looks for interesting events
and ranks them by importance. Finally, computer linguists use sample
texts to identify the underlying, semantic logic and translate them into
a rule-based system that is capable of constructing sentences. If no such
sample texts are available, trained journalists pre-write text modules and
sample stories with the appropriate frames and language and adjust them
to the official style guide of the publishing outlet.
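To make this division of labor concrete, here is a minimal sketch in Python of such a rule-based pipeline. The game data, newsworthiness scores, and phrasings are invented for illustration and are far simpler than commercial systems such as Wordsmith or Quill.

    # Minimal sketch of a rule-based story generator (data and rules invented).
    game = {
        "home": "Cubs", "away": "Mets",
        "home_runs": 5, "away_runs": 4,
        "decisive_play": "a walk-off single in the ninth inning",
    }

    def find_insights(g):
        """Identify interesting events in the structured data (step two)."""
        winner = g["home"] if g["home_runs"] > g["away_runs"] else g["away"]
        loser = g["away"] if winner == g["home"] else g["home"]
        score = f"{max(g['home_runs'], g['away_runs'])}-{min(g['home_runs'], g['away_runs'])}"
        insights = [(5, f"The {winner} beat the {loser} {score}.")]
        # Domain rule supplied by experts: one-run games are newsworthy.
        if abs(g["home_runs"] - g["away_runs"]) == 1:
            insights.append((3, f"The game was decided by {g['decisive_play']}."))
        return insights

    def write_story(g):
        """Rank insights by importance and arrange them into a narrative (steps three and four)."""
        ranked = sorted(find_insights(g), key=lambda pair: pair[0], reverse=True)
        return " ".join(sentence for _, sentence in ranked)

    print(write_story(game))
    # The Cubs beat the Mets 5-4. The game was decided by a walk-off single in the ninth inning.

Everything the sketch knows, from what counts as a win to what counts as newsworthy, had to be supplied in advance, which is exactly why the collaboration between journalists, domain experts, and linguists matters.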
One domain with a long history of automation is weather forecasting, where automatically generated forecasts have freed meteorologists for more important work (see Chapter 3): “The more routine tasks can
be handled by a computer, thereby freeing the meteorologist for the more
challenging roles of meteorological consultant and specialist on high-impact
weather situations.”12
Another domain in which organizations have long used automation is
financial news, where the speed in which information can be provided is the
key value proposition. For example, companies such as Thomson Reuters
and Bloomberg extract key figures from press releases and insert them
into pre-written templates to automatically create news alerts for their
clients. In this business, automation is not about freeing up time. It is a
necessity. Reginald Chua, executive editor for editorial operations, data,
and innovation at Thomson Reuters, told me: “You can’t compete if you
don’t automate.”
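The speed advantage comes from how little such an alert pipeline has to do. The following sketch shows the basic extract-and-fill pattern with an invented press-release snippet and template; it is not how Thomson Reuters or Bloomberg implement their systems, which handle far more varied wording.

    import re

    # Hypothetical press-release snippet and alert template, for illustration only.
    release = ("Acme Corp. reported third-quarter net income of "
               "$1.25 billion, or $2.10 per share.")
    TEMPLATE = "ALERT: {company} Q3 net income {income}, EPS {eps}"

    def alert_from_release(text):
        """Extract key figures from the text and insert them into a pre-written template."""
        company = re.match(r"[A-Z][\w.]*(?: [A-Z][\w.]*)*", text).group(0)
        income = re.search(r"net income of (\$[\d.]+ \w+)", text).group(1)
        eps = re.search(r"(\$[\d.]+) per share", text).group(1)
        return TEMPLATE.format(company=company, income=income, eps=eps)

    print(alert_from_release(release))
    # ALERT: Acme Corp. Q3 net income $1.25 billion, EPS $2.10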
In more recent years, automated journalism also found its way into news-
rooms to address other types of problems, often in the form of custom-
made, in-house solutions. A prominent example is the work at the Los
Angeles Times on automating homicide and earthquake reporting de-
scribed in case studies 1 and 2. When asked to describe the algorithms,
Ken Schwencke, who developed them (and now works for The New York
Times), noted that the underlying code is “embarrassingly simple,” as it
merely extracts numbers from a database and composes basic news sto-
ries from pre-written text modules.13 Despite—or perhaps because of—its
simplicity, Schwencke’s work marks an important step in the era of auto-
mated journalism, demonstrating how simple in-house solutions can help to
increase both the speed and breadth of news coverage.
Many newsrooms, however, lack the necessary resources and skills to de-
velop automated journalism solutions in-house. Media organizations have
thus started to collaborate with companies that specialize in developing
natural language generation technology to automatically generate stories
from data for a variety of domains. In 2012, for example, Forbes.com an-
nounced its use of Narrative Science’s Quill platform to automatically cre-
ate company earnings previews.14 A year later, ProPublica used the same
technology to automatically generate descriptions for each of the more than
52,000 schools for its Opportunity Gap news application.15 In 2014, auto-
mated journalism made its way into the public’s focus when the Associated
Press, one of the world’s major news organizations, began automating its
quarterly company earnings reports using Automated Insights’ Wordsmith
platform. As described in Case Study 3, the project was a success and, as a
result, the AP recently announced the expansion of its automated coverage
to sports.16
Mary Lynn Young and Alfred Hermida describe the evolution of the
Los Angeles Times’s Homicide Report as an early example of au-
tomated journalism.17 Before the project’s launch in January of 2007,
the Times’s print edition covered only about ten percent of the nearly
1,000 annual homicides in L.A. County. Moreover, the coverage typi-
cally focused on the most newsworthy cases, which were often the most
sensational ones and therefore did not provide a representative picture
of what was really happening. The goal of the Homicide Report was
to address this bias in the media coverage by providing comprehensive
coverage of all annual homicides. The project originally started as a
blog that posted basic information about each homicide, such as the
victim’s race and gender or where the body was found. A few months
later, an interactive map was added to visualize the information. Soon,
however, it became clear that the project was too ambitious. Due to
limited newsroom resources, as well as technical and data issues, it was
impossible to report every homicide. The project was put on hold in
November 2008. When the Homicide Report was relaunched in January
2010, it relied on structured data from the L.A. County coroner’s office,
which includes information such as the date, location, time, race or eth-
nicity, age, jurisdiction, and neighborhood of all homicides in the area.
The revised Homicide Report used these data to automatically produce
short news snippets and publish them on the blog. While these news
reports were simple, providing only the most rudimentary informa-
tion, they accomplished the project’s original goal to cover every single
homicide and were able to do so in a quick and efficient manner.
Potentials
In automating traditional journalistic tasks, such as data collection and
analysis, as well as the actual writing and publication of news stories, there
are two obvious economic benefits: increasing the speed and scale of news
coverage. Advocates further argue that automated journalism could poten-
tially improve the accuracy and objectivity of news coverage. Finally, the
future of automated journalism will potentially allow for producing news
on demand and writing stories geared toward the needs of the individual
reader.
Speed
Automation allows for producing news in nearly real time, or at the ear-
liest point that the underlying data are available. For example, the AP’s
quarterly earnings report on Apple (see Chapter 1) was published only min-
utes after the company released its figures. Another example is the Los Angeles Times’s Quakebot, which first broke the news about an earthquake in the
Los Angeles area in 2014 (see Case Study 2).
Scale
Automation allows for expanding the quantity of news by producing stories
that were previously not covered due to limited resources. For example,
both the Los Angeles Times (for homicide reports; case study 1) and the
Associated Press (for company earnings reports; case study 3) reported that
automation increased the number of published stories by more than ten
times. Similarly, while human journalists have traditionally only covered
earthquakes that exceeded a certain magnitude or left significant damage,
Quakebot provides comprehensive coverage of all earthquakes detected by
seismographic sensors in Southern California (case study 2). While any one of these articles may attract only a few hits because it targets a small audience, the added coverage increases total traffic, with positive effects on advertising revenues.
Accuracy
Algorithms do not get tired or distracted, and—assuming that they are pro-
grammed correctly and the underlying data are accurate—they do not make
simple mistakes like misspellings, calculation errors, or overlooking facts.
Advocates thus argue that algorithms are less error-prone than human
journalists. For example, Lou Ferrara, former vice president and managing
editor for entertainment, sports, and interactive media at the Associated
Press, reports that automation has decreased the rate of errors in AP’s
company earnings reports from about seven percent to only about one per-
cent, mostly by eliminating typos or transposed digits. “The automated
reports almost never have grammatical or misspelling errors,” he told me,
“and the errors that do remain are due to mistakes in the source data.”
Yet, Googling “generated by automated insights correction” lists thousands of results.

For example, editors could take the number of tweets about an earthquake into account when deciding whether or not to publish the news. Or, even better,
Quakebot could be updated so that its algorithm accounts for this in-
formation and automatically publishes a story if the number of tweets
in a respective area is above a certain threshold.
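A minimal sketch of that gating rule might look as follows; the tweet-counting helper is a stub, and the threshold value is invented.

    TWEET_THRESHOLD = 50  # invented cutoff; a real value would need tuning

    def count_recent_quake_tweets(region, minutes=5):
        """Stub: a real system would query a social media API for recent
        posts mentioning shaking in the affected region."""
        return 73

    def should_publish(report):
        """Publish automatically only if eyewitness chatter corroborates the sensor data."""
        return count_recent_quake_tweets(report["region"]) >= TWEET_THRESHOLD

    report = {"region": "Los Angeles", "magnitude": 4.4}
    if should_publish(report):
        print(f"Publishing: magnitude {report['magnitude']} quake near {report['region']}.")
    else:
        print("Holding story for human review.")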
Objectivity
Algorithms strictly follow predefined rules for analyzing data and convert-
ing the results into written stories. Advocates argue that automated news
provides an unbiased account of facts. This argument of course assumes
that the underlying data are correct and the algorithms are programmed
without bias, a view that, as discussed in the next chapter, is false or too
optimistic at best.26 That said, experimental evidence available to date sug-
gests that readers perceive automated news as more credible than human-
written news (see Textbox I).
Personalization
Automation allows for providing relevant information for very small audi-
ences and in multiple languages. In the most extreme case, automation can
even create news for an audience of one. For instance, Automated Insights
generates personalized match day reports (a total of more than three hun-
dred million in 2014) for each player of Yahoo Fantasy Football, a popular
online game in which people can create teams of football players and com-
pete against each other in virtual leagues. Similarly, one of Narrative Sci-
ence’s core businesses is to automatically generate financial market reports
for individual customers. It is easy to imagine similar applications for other
areas. For example, algorithms could create recaps of a sports event that
focus on the performance of a particular player that interests the reader
most (e.g., grandparents interested in the performance of their grandchild).
Furthermore, as shown with Automated Insights’ Fantasy Football match
day reports, the algorithms could even tell the same story in a different
tone depending on the reader’s needs. For example, the recap of a sporting
event could be written in an enthusiastic tone for supporters of the winning
team and in a sympathetic tone for supporters of the losing one.
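A sketch of how such tone selection could work, assuming hand-written templates for each tone:

    # Invented templates: the same result rendered in two tones.
    TEMPLATES = {
        "enthusiastic": "What a win! The {winner} triumphed {score} over the {loser}.",
        "sympathetic": "A tough day for {loser} fans: a narrow {score} loss to the {winner}.",
    }

    def recap(result, reader_team):
        """Pick the tone based on the reader's allegiance, then fill the template."""
        tone = "enthusiastic" if reader_team == result["winner"] else "sympathetic"
        return TEMPLATES[tone].format(**result)

    result = {"winner": "Cubs", "loser": "Mets", "score": "5-4"}
    print(recap(result, reader_team="Cubs"))  # enthusiastic version
    print(recap(result, reader_team="Mets"))  # sympathetic version

The underlying facts stay identical; only the framing changes with the reader.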
News on demand
The ability to personalize stories and analyze data from different angles
also provides opportunities for generating news on demand. For example,
algorithms could generate stories that answer specific questions by com-
paring the historical performance of different baseball players. Algorithms
could also answer what-if scenarios, such as how well a portfolio would have
performed if a trader had bought stock X as compared to stock Y. While
algorithms for generating news on demand are not yet available,
they will likely be the future of automated journalism. In October 2015,
Automated Insights announced a new beta version of its Wordsmith plat-
form, which enables users to upload their own data, pre-write article tem-
plates, and automatically create narratives from the data.27 The German
company AX Semantics provides similar functionality with its ATML3
programming language.
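The what-if scenario described above is, at its core, a simple computation over historical data. A sketch, with invented price histories standing in for a market-data feed:

    # Invented price histories; a real system would query a market-data feed.
    prices = {
        "X": [100.0, 104.0, 110.0],
        "Y": [100.0, 98.0, 103.0],
    }

    def what_if(stock, invested=10_000):
        """How much would the position be worth had the trader bought this stock?"""
        series = prices[stock]
        return invested * series[-1] / series[0]

    for stock in ("X", "Y"):
        print(f"$10,000 in stock {stock} would now be worth ${what_if(stock):,.2f}.")

The hard part of news on demand is not this arithmetic but parsing the user’s question and rendering the answer as readable prose.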
Limitations
Algorithms for generating automated news follow a set of predefined rules
and thus cannot innovate. Therefore, their application is limited to pro-
viding answers to clearly defined problems for which data are available.
Furthermore, at least at the current stage, the quality of writing is limited.
was more complicated than expected due to issues with the underlying
data. Since the data are often entered by coaches and do not undergo strict
verification procedures, they can be messy and contain errors.
Validation
Algorithms can add value by generating insights from data analysis. In ap-
plying statistical methods to identify outliers or correlations between mul-
tiple variables, algorithms could find interesting events and relationships,
which in turn could lead to new stories. However, algorithms that analyze
correlations cannot establish causality or add meaning. That is, while al-
gorithms can provide accounts of what is happening, they cannot explain
why things are happening.28 As a result, findings derived from statistical
analysis—regardless of their statistical significance—can be completely
meaningless (see www.tylervigen.com for examples of statistically signifi-
cant but completely spurious correlations). Humans still need to validate
the findings by applying logic and reasoning.29
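The point is easy to demonstrate: two series that merely trend in the same direction will correlate strongly even when neither has anything to do with the other. The numbers below are invented.

    # Two invented, causally unrelated series that both grow over time.
    cheese_consumption = [29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1]
    engineering_phds = [480, 501, 540, 552, 547, 622, 655, 701]

    def pearson(xs, ys):
        """Pearson correlation coefficient, computed from scratch."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Close to 1.0, yet the correlation is spurious: an algorithm cannot
    # tell this apart from a meaningful relationship on its own.
    print(f"r = {pearson(cheese_consumption, engineering_phds):.2f}")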
Ingenuity
Once the findings have been validated, algorithms can contribute knowl-
edge. Yet, this contribution is limited to providing answers to prewritten
questions by analyzing given data. Algorithms cannot use the knowledge to
ask new questions, detect needs, recognize threats, solve problems, or pro-
vide opinions and interpretation on, for example, matters regarding social
and policy change. In other words, algorithms lack ingenuity and cannot in-
novate. As a result, automated journalism is limited in its ability to observe
society and fulfill journalistic tasks, such as orientation and public opinion
formation.30
Writing quality
Another often mentioned limitation of automated news is the quality of the
writing. Current algorithms are limited in understanding and producing nu-
ances of human language, like humor, sarcasm, and metaphors. Automated
news can sound technical and boring, and experimental evidence shows that
people prefer reading human-written to automated news (see Textbox I).
The AP was not the first major news organization to use natural
language generation for writing company earnings stories. Since 2012,
Forbes.com has been cooperating with Narrative Science
to automatically create company earnings previews. The goal of this
project was to provide cost-effective, broad, and deep market coverage
for its readers. Similar to the experience at the AP, Forbes’s automation has allowed for generating more stories while freeing up resources.
As a result of the additional coverage, Forbes’s audience has broadened,
and site traffic and advertising revenues have increased.35
Relevance
The number of media organizations that automated journalism providers
currently report as customers is small. Few providers offer actual journal-
istic products, and most products available to date are limited to routine
topics, such as sports and finance, for which reliable and structured data
are available. Automated journalism is thus still in an experimental or, at
best, early-market expansion phase.36
This may change quickly, however. Apart from ongoing advances in
computing power, big data analytics, and natural language generation
technology, the most important driver of automated journalism is the ever-
increasing availability of structured and machine-readable data provided by
organizations, sensors, or the general public. First, in an attempt to make
government more transparent and accountable, many countries are launch-
ing open data initiatives to make data publicly available. Second, our world
is increasingly equipped with sensors that automatically generate and col-
lect data. Currently, sensors constantly track changes in an environment’s
temperature, seismological activity, or air pollution. Sensors are also in-
creasingly used to provide fine-grained data on real world events. The NFL
now uses sensors to track each player’s field position, speed, distance trav-
eled, acceleration, and even the direction he is facing—which provides many
new opportunities for data-driven reporting. Third, users themselves are generating an increasing amount of data, whether on social networks or, for example, when parents enter scores from local youth sporting events into databases.
Furthermore, automated journalism fits into the broader trend within the news industry toward data-driven, computational forms of news production.
For Journalists
Since automated journalism is often perceived as a threat to the livelihood
of classic journalism, it is not surprising that it has attracted a lot of atten-
tion from journalists. In particular, journalists have focused on the question
of how the technology will alter their own roles and required skillsets. Two
studies analyzing the content of news articles and blog posts about auto-
mated journalism provide insight into journalists’ expectations. The first
study analyzed sixty-eight articles published in 2010, which covered Stat-
sheet (the predecessor of Automated Insights), a service that automatically
created match reports and previews of all three hundred forty-five NCAA
Division 1 college basketball teams.41 The second study analyzed sixty-
three articles that reported on Narrative Science’s technology and discussed
its impact on journalism.42 The articles were published from 2010 to early
2014 and thus cover a longer and more recent period of journalists’ expo-
sure to automated news.
Both studies found that journalists expected automation to change the
way they work, although the extent to which automation technology will
replace or complement human journalists will depend on the task and the
skills of the journalist. For routine and repetitive tasks, such as sports re-
caps or company earnings reports—merely a conversion of raw data into
standard writing—there was a consensus among journalists that they will
not be able to compete with the speed and scale of automated content.
Their reaction to this development usually fit either an optimistic or pes-
simistic frame.
According to the optimistic “machine liberates man” frame, the ability
to automate routine tasks may offer opportunities to improve journalistic
quality. The argument is that automation frees up journalists from rou-
tine tasks and thus allows them to spend more time on providing in-depth
analysis, commentary, and investigative work, skills that will in turn become more important. This appears to be the case at the Associated
Press, which reports that the resources freed up as a result of automation
have been used to improve reporting in other areas (see Case Study 3).
According to the pessimistic “machine versus man” frame, automated
journalism competes with human journalists. That is, automated journal-
ism is portrayed as yet another way to cut costs and replace those journal-
ists who merely cover routine tasks with software. Indeed, if an increasing share of news is eventually automated, the logical consequence is that journalists who used to cover such content will need to either produce a better product or focus on tasks and skills for which humans outperform algorithms.
Transparency
For critical and controversial topics, as in automated stories that use
polling data to write about a candidate’s chance of winning an election, the demand for transparency is likely to be higher. It is unclear, however, whether such demands reflect what audiences actually think. In fact, there may not even be a demand for algorithmic transparency on the user side, as probably only a few
people are even aware of the major role that algorithms play in journalism.
This, of course, may change quickly once automated news becomes more
widespread, and especially when errors occur. For example, imagine a situ-
ation in which an algorithm generates a large number of erroneous stories,
either due to a programming error or because it was hacked. Such an event
would immediately lead to calls for algorithmic transparency.
In his summary of the workshop results, Nicholas Diakopoulos points
to two areas that would be most fruitful for future research on algorith-
mic transparency.54 First, we need to better understand users’ demands
around algorithmic transparency, as well as how the disclosed informa-
tion could be used in the public interest. Second, we need to find ways for
how to best disclose information without disturbing the user experience, in
particular, for those who are not interested in such information. The New
York Times offers an example for how to achieve the latter in its “Best and
Worst Places to Grow Up,” which provides automated stories about how
children’s economic future is affected by where they are raised.55 When
users click on a different county, the parts of the story that change are
highlighted for a short period of time.
Source data
News organizations need to ensure that, first, they have the legal right to
modify and publish the source data and, second, the data are accurate.
Data provided by governments and companies are probably more reliable
and less error-prone than user-generated data like scores from local youth sporting events entered into a database by coaches or the players’ parents. That
said, as demonstrated in the case of earthquake reporting (see Case Study
2), even government data may contain errors or false information. Data
problems may also arise if the structure of the source data changes, a com-
mon problem for data scraped from websites. Thus, news organizations
need to implement data management and verification procedures, which
could be either performed automatically or by a human editor.
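Such verification procedures can be as simple as automated sanity checks that run before any story is generated. The sketch below uses invented field names and plausibility bounds:

    REQUIRED_FIELDS = {"date", "location", "magnitude"}

    def validate_record(record):
        """Return a list of problems; an empty list means the record looks publishable."""
        problems = []
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            problems.append(f"missing fields: {sorted(missing)}")
        magnitude = record.get("magnitude")
        if magnitude is not None and not 0.0 <= magnitude <= 10.0:
            problems.append(f"implausible magnitude: {magnitude}")
        return problems

    record = {"date": "2015-05-30", "location": "Northern California", "magnitude": 51.0}
    issues = validate_record(record)
    if issues:
        print("Route to human editor:", "; ".join(issues))  # do not auto-publish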
Data processing
If the underlying data or the algorithms that process them contain errors,
automation may quickly generate large numbers of erroneous stories, which
could have disastrous consequences for a publisher’s reputation. News or-
ganizations therefore need to engage in thorough testing before initial pub-
lication of automated news. When publication starts, Kent recommends
having human editors check each story before it goes live, although, as
demonstrated by the Quakebot (Case Study 2), this so-called “hand brake”
solution is not error-free either. Once the error rate is down to an accept-
able level, the publication process can be fully automated, with occasional
spot checks. The latter is the approach the AP currently uses for its com-
pany earnings reports.
Output
Regarding the final output, Kent recommends that the writing match the
official style guide of the publishing organization and be capable of us-
ing varied phrasing for different stories. Furthermore, news organizations
should be aware of legal and ethical issues that may arise when the text is
automatically enhanced with videos or images without proper checking. For
such content, publishing rights may not be available or the content may vi-
olate standards of taste. News organizations must also provide a minimum
level of transparency by disclosing that the story was generated automati-
cally, for example, by adding information about the source of the data and
how the content was generated. The AP adds the following information at
the end of its fully automated company earnings reports:
This story was generated by Automated Insights () using data from
Zacks Investment Research. Access a Zacks stock report on ACN at .
Of course, news consumers may be unfamiliar with these companies and
their technologies, and therefore unaware that the content is provided by
an algorithm. It remains unclear whether readers actually understand the
meaning of such bylines. Further research on how they are perceived would
be useful. Also, since more and more stories are the result of collaboration
between algorithms and humans, the question arises of how to properly
disclose when certain parts of a story were automated. The AP currently
deals with such cases by modifying the first sentence in the above state-
ment to “Elements of this story were generated by Automated Insights.”58
That said, Kent noted that the discussion about how to properly byline
automated news may be a temporary one. Once automated news becomes
standard practice, some publishers may choose not to reveal which parts of
a story were automatically generated.
Accountability
Automation advocates argue that algorithms allow for an unbiased account
of facts. This view, however, assumes that the underlying data are com-
plete and correct and, more importantly, the algorithms are programmed
correctly and without bias. Like any other model, algorithms for generating
automated news rely on data and assumptions, both of which are subject to
biases and errors.59 First, the underlying data may be wrong, biased, or in-
complete. Second, the assumptions built into the algorithms may be wrong
or reflect the (conscious or unconscious) biases of those who developed or
commissioned them. As a result, algorithms could produce outcomes that
were unexpected and unintended, and the resulting stories could contain
information that is inaccurate or simply false.60
In such situations, it is not enough to state that an article was generated
by software, in particular when covering critical or controversial topics for
which readers’ requirements of transparency and accountability may be
higher. When errors occur, news organizations may come under pressure
to publish the source code behind the automation. At the very least, they
should be able to explain how a story was generated, rather than simply
stating that “the computer did it.”61 From a legal standpoint, algorithms
cannot be held accountable for errors. The liability is with a natural per-
son, which could be the publisher or the person who made a mistake when
feeding the algorithm with data.62
While providers of automated news could—and in some cases proba-
bly should—be transparent about many details of their algorithms, there
was consensus among experts at the Tow workshop on algorithmic trans-
parency that most organizations are unlikely to voluntarily provide full
transparency, especially without a clear value proposition. However, if news
organizations and software developers do not fully disclose their algorithms,
it remains unclear how to evaluate the quality of the algorithms and the
content produced, in particular, its sensitivity to changes in the underly-
ing data. A promising yet complex approach might be reverse engineering,
which aims at decoding an algorithm’s set of rules by varying certain input
parameters and assessing the effects on the outcome.63 Another important
question for future research is whether, and if so to what extent, users of
automated content ultimately care about transparency, in which case the
provision of such information could be a competitive advantage by increas-
ing a publisher’s credibility and legitimacy.64
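To illustrate the reverse-engineering idea, the sketch below treats a generator as a black box and sweeps one input to locate a hidden decision threshold; the generator and its rule are invented stand-ins, not any vendor’s actual system.

    def black_box_generator(data):
        """Stand-in for an unseen system; the auditor does not know its rules."""
        surprise = data["actual_eps"] - data["expected_eps"]
        if abs(surprise) > 0.10:
            return "BIG MISS" if surprise < 0 else "BIG BEAT"
        return "in line with expectations"

    baseline = {"actual_eps": 2.00, "expected_eps": 2.00}
    print("baseline ->", black_box_generator(baseline))

    # Vary one input parameter at a time and watch for jumps in the output.
    for actual in (1.85, 1.95, 2.05, 2.15):
        probe = dict(baseline, actual_eps=actual)
        print(f"actual_eps={actual} ->", black_box_generator(probe))
    # The change in output between 1.95 and 1.85 reveals the hidden 0.10 threshold.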
For Society
Due to its ability to create content quickly, cheaply, at large scale, and
potentially personalized to the needs of individual readers, automated jour-
nalism is expected to substantially increase the amount of available news.
While this development might be helpful in meeting people’s demand for
information, it could also further increase people’s burden to find content
that is most relevant to them. To cope with the resulting information over-
load, the importance of search engines and personalized news aggregators,
such as Google News, is likely to increase further.
Search engine providers claim to analyze individual user data (e.g., lo-
cation and historical search behavior) to provide news consumers with the
content that most interests them. In doing so, different news consumers
might receive different results for the same keyword searches, which would
bear the risk of partial information blindness, the so-called “filter bubble”
hypothesis.65 According to this idea, personalization will lead individuals
to consume more and more of the same information, as algorithms provide
only content that users like to read or agree with. Consequently, people
would be less likely to encounter information that challenges their views
or contradicts their interests, which could carry risks for the formation of
public opinion in a democratic society.
The filter bubble hypothesis has become widely popular among aca-
demics, as well as the general public. Eli Pariser’s 2011 book, The Filter
Bubble: How the New Personalized Web Is Changing What We Read and
How We Think,66 has not only become a New York Times bestseller but
has attracted more than 1,000 citations on Google Scholar through October
2015. However, despite the theory’s popularity and appeal, empirical evi-
dence available to date does not support the existence of the filter bubble:
Most studies find either no, or only very small, effects of personalization
on search results.67 Of course, this may change as the amount of available
content—and thus the need for personalization—increases and algorithms
for personalizing content continue to improve. The study of potential effects
from personalization, whether positive or negative, remains an important
area of research.
More generally, future research should examine how the increasing and more sophisticated use of automated news evolves over time. In particular, it might consider how people’s
expectations toward and perceptions of such content change—especially for
controversial and critical topics, such as election campaign coverage, which
are not merely fact-based and involve uncertainty.
15. Scott Klein, “How To Edit 52,000 Stories at Once,” ProPublica, 2013, https://
www.propublica.org/nerds/item/how-to-edit-52000-stories-at-once.
16. “AP, NCAA to Grow College Sports Coverage With Automated Game Stories.”
17. Hermida and Young, “From Mr. and Mrs. Outlier to Central Tendencies.”
18. Ibid.
19. Ibid.
20. Diakopoulos, “Towards a Standard for Algorithmic Transparency in the Media.”
21. “Netflix Misses Street 2Q Forecasts,” Associated Press, 15 July 2015, http://
finance.yahoo.com/news/netflix-misses-street-2q-forecasts-202216117.html.
22. Celeste Lecompte, “Automation in the Newsroom,” Nieman Foundation, 1 Septem-
ber 2015, http://niemanreports.org/articles/automation-in-the-newsroom/.
23. Joanna Plucinska, “How an Algorithm Helped the LAT Scoop Monday’s Quake,”
Columbia Journalism Review, 18 March 2014, http://www.cjr.org/united_states_
project/how_an_algorithm_helped_the_lat_scoop_mondays_quake.php.
24. Brandon Mercer, “Two Powerful Earthquakes Did Not Hit Northern California,
Automated Quake Alerts Fail USGS, LA Times After Deep Japan Quake,” CBS SF
Bay Area, 30 May 2015, http://sanfrancisco.cbslocal.com/2015/05/30/4-8-and-5-
5-magnitude-earthquakes-did-not-hit-northern-california-automated-quake-alerts-
fail-usgs-la-times-a-2nd-and-3rd-time/.
25. Daniel C. Bowden, Paul S. Earle, and Michelle Guy, “Twitter Earthquake De-
tection: Earthquake Monitoring in a Social World,” Annals of Geophysics, no. 6 (2011):
708–715.
26. David Lazer et al., “The Parable of Google Flu: Traps in Big Data Analysis,”
Science, no. 6176 (2014): 1203–1205.
27. James Kotecki, “New Data-driven Writing Platform Enables Professionals to Cre-
ate Personalized Content at Unprecedented Scale,” Automated Insights, 20 October
2015, http://www.prweb.com/releases/2015/10/prweb13029986.htm.
28. Lazer et al., “The Parable of Google Flu: Traps in Big Data Analysis.”
29. Noam Lemelshtrich Latar, “The Robot Journalist in the Age of Social Physics:
The End of Human Journalism?” In The New World of Transitioned Media, ed. Gail
Einav (New York: Springer, 2015), 65–80, http://link.springer.com/chapter/10.1007/978-3-319-09009-2_6.
30. Ibid.
31. Kurt Schlegel, “Hype Cycle for Business Intelligence and Analytics, 2015,” 4 Au-
gust 2015, https://www.gartner.com/doc/3106118/hype-cycle-business-intelligence-
analytics.
32. Latar, “The Robot Journalist in the Age of Social Physics: The End of Human
Journalism?”
33. Erin Madigan White, “Automated Earnings Stories Multiply,” Associated Press,
29 January 2015, https://blog.ap.org/announcements/automated-earnings-stories-
multiply.
34. Ibid.
35. “Case Study: Forbes,” Narrative Science, 19 May 2015, http://resources.narrativescience.com/h/i/83535927-case-study-forbes.
36. Dörr, “Mapping the Field of Algorithmic Journalism.”
37. Nicole S. Cohen, “From Pink Lips to Pink Slime: Transforming Media Labor in
a Digital Age,” The Communication Review, no. 2 (2015): 98–122.
38. Alexander Siebert, “Roboterjournalismus im Jahre 2020—Acht Thesen,” The Huffington Post, 8 August 2014, http://www.huffingtonpost.de/alexander-siebert/roboterjournalismus-im-jahre-2020---acht-thesen_b_5655061.html.
39. Levy, “Can an Algorithm Write a Better News Story Than a Human Reporter?”
40. Dörr, “Mapping the Field of Algorithmic Journalism.”
41. Dalen, “The Algorithms Behind the Headlines.”
42. Carlson, “The Robotic Reporter.”
43. Siegfried Weischenberg, Maja Malik, and Armin Scholl, “Journalism in Germany
in the 21st Century,” in The Global Journalist in the 21st Century, ed. David Weaver
and Lars Willnat (New York: Routledge, 2012), 205–219.
44. Hermida and Young, “From Mr. and Mrs. Outlier to Central Tendencies.”
45. Christer Clerwall, “Enter the Robot Journalist,” Journalism Practice, no. 5 (2014):
519–531; Hille van der Kaa and Emiel Krahmer, “Journalist Versus News Consumer:
The Perceived Credibility of Machine Written News,” Computation Journalism Con-
ference, Columbia University, New York, 2014; Andreas Graefe et al., “Readers’ Per-
ception of Computer-Written News: Credibility, Expertise, and Readability,” Dubrovnik
Media Days Conference, University of Dubrovnik, 2015.
46. Clerwall, “Enter the Robot Journalist.”
47. Lance Ulanoff, “Need to Write 5 Million Stories a Week? Robot Reporters to the
Rescue,” Mashable, 1 July 2014, http://mashable.com/2014/07/01/robot-reporters-
add-data-to-the-five-ws/#jlMMJqbFtSq4.
48. Kaa and Krahmer, “Journalist Versus News Consumer: The Perceived Credi-
bility of Machine Written News.”
49. Graefe et al., “Readers’ Perception of Computer-Written News: Credibility, Ex-
pertise, and Readability.”
50. Kaa and Krahmer, “Journalist Versus News Consumer: The Perceived Credi-
bility of Machine Written News.”
51. Graefe et al., “Readers’ Perception of Computer-Written News: Credibility, Ex-
pertise, and Readability.”
52. Ibid.
53. Diakopoulos, “Towards a Standard for Algorithmic Transparency in the Media.”
54. Ibid.
55. Gregor Aisch et al., “The Best and Worst Places to Grow Up: How Your Area
Compares,” The New York Times, 3 May 2015, http://www.nytimes.com/interactive/2015/05/03/upshot/the-best-and-worst-places-to-grow-up-how-your-area-compares.html?_r=0.
56. Tetyana Lokot and Nicholas Diakopoulos, “News Bots: Automating News and Information Dissemination on Twitter,” Digital Journalism, 15 September 2015, http://dx.doi.org/10.1080/21670811.2015.1081822.
57. Tom Kent, “An Ethical Checklist for Robot Journalism,” Medium, 24 February 2015, https://medium.com/@tjrkent/an-ethical-checklist-for-robot-journalism-1f41dcbd7be2.
58. David Koenig, “Exxon 3Q Profit Falls by Nearly Half Amid Low Oil Prices,” As-
sociated Press, 30 October 2015, http://www.salon.com/2015/10/30/exxon_3q_
profit_falls_by_nearly_half_amid_low_oil_prices/.
59. Lazer et al., “The Parable of Google Flu: Traps in Big Data Analysis.”
60. Diakopoulos, “Towards a Standard for Algorithmic Transparency in the Media”;
Nicholas Diakopoulos, “Algorithmic Accountability: Journalistic Investigation of Com-
putational Power Structures,” Digital Journalism, no. 3 (2015): 398–415.
61. Kent, “An Ethical Checklist for Robot Journalism.”
62. Lin Weeks, “Media Law and Copyright Implications of Automated Journalism,”
Journal of Intellectual Property and Entertainment Law, no. 1 (2014): 67–94; Pieter-
Jan Ombelet, Aleksandra Kuczerawy, and Peggy Valcke, “Supervising Automated Jour-
nalists in the Newsroom: Liability for Algorithmically Produced News Stories,” Dubrovnik
Media Days: Artificial Intelligence, Robots and the Media Conference, University of
Dubrovnik, 2015.
63. Diakopoulos, “Algorithmic Accountability: Journalistic Investigation of Computational Power Structures.”
64. Diakopoulos, “Towards a Standard for Algorithmic Transparency in the Media.”
65. Eli Pariser, The Filter Bubble: What the Internet Is Hiding From You (New York:
Penguin Press, 2011).
66. Ibid.
67. Lada Adamic, Eytan Bakshy, and Solomon Messing, “Exposure to Ideologically
Diverse News and Opinion on Facebook,” Science, no. 6239 (2015): 1130–1132; Seth
Flaxman, Sharad Goel, and Justin Rao, “Filter Bubbles, Echo Chambers, and On-
line News Consumption,” 2015, https://5harad.com/papers/bubbles.pdf; Florian
Arendt, Mario Haim, and Sebastian Scherr, “Abyss or Shelter? On the Relevance of
Web Search Engines’ Search Results When People Google for Suicide,” Health Com-
munication, 2015; Martin Feuz, Matthew Fuller, and Felix Stalder, “Personal Web Search-
ing in the Age of Semantic Capitalism: Diagnosing the Mechanisms of Personalisa-
tion,” First Monday, no. 2 (2011), http://firstmonday.org/ojs/index.php/fm/article/
view/3344/2766.