Purpose – Every security system will eventually fail; there is no such thing as a 100 per cent secure system.
Design/methodology/approach – AI Safety can be improved based on ideas developed by
cybersecurity experts. For narrow AI Safety, failures are at the same, moderate level of criticality as in
cybersecurity; however, for general AI, failures have a fundamentally different impact. A single failure of a
superintelligent system may cause a catastrophic event without a chance for recovery.
Findings – In this paper, the authors present and analyze reported failures of artificially intelligent
systems and extrapolate their analysis to future AIs. The authors suggest that both the frequency and the
seriousness of future AI failures will steadily increase.
Originality/value – This is a first attempt to assemble a public data set of AI failures and is extremely
valuable to AI Safety researchers.
Keywords Cybersecurity, Failures
Paper type Research paper
1. Introduction
About 10,000 scientists[1] around the world work on different aspects of creating intelligent
machines, with the main goal of making such machines as capable as possible. With
amazing progress made in the field of artificial intelligence (AI) over the past decade, it is
more important than ever to make sure that the technology we are developing has a
beneficial impact on humanity. With the appearance of robotic financial advisors, self-
driving cars and personal digital assistants come many unresolved problems. We have
already experienced market crashes caused by intelligent trading software[2], accidents
caused by self-driving cars[3] and embarrassment from chat-bots[4], which turned racist
and engaged in hate speech. We predict that both the frequency and seriousness of such
events will steadily increase as AIs become more capable. The failures of today’s narrow
domain AIs are just a warning: once we develop artificial general intelligence (AGI) capable
of cross-domain performance, hurt feelings will be the least of our concerns.
In a recent publication, Yampolskiy proposed a Taxonomy of Pathways to Dangerous AI
(Yampolskiy, 2016b), which was motivated as follows: “In order to properly handle a
potentially dangerous artificially intelligent system it is important to understand how the
system came to be in such a state. In popular culture (science fiction movies/books) AIs/
Robots became self-aware and as a result rebel against humanity and decide to destroy it.
While it is one possible scenario, it is probably the least likely path to appearance of
dangerous AI.” Yampolskiy suggested that much more likely reasons include deliberate
actions of not-so-ethical people (“on purpose”) (Brundage et al., 2018) and side effects of poor design
(Bostrom, 2014). The paper implied that, if an AI Safety mechanism is not designed to resist
attacks by malevolent human actors, it cannot be considered a functional safety mechanism
(Pistono and Yampolskiy, 2016)!
2. AI failures
Those who cannot learn from history are doomed to repeat it. Unfortunately, very few
papers have been published on failures and errors made in development of intelligent
systems (Rychtyckyj and Turski, 2008). The importance of learning from “What Went Wrong and
Why” has been recognized by the AI community (Abecker et al., 2006; Shapiro and Goker,
2008). Such research includes the study of how, why and when failures happen (Abecker et al.,
2006; Shapiro and Goker, 2008) and how to improve future AI systems based on such
information (Marling and Chelberg, 2008; Shalev-Shwartz et al., 2017).
The millennia-long history of humanity contains millions of examples of attempts to develop
technological and logistical solutions to increase safety and security, yet not a single
example exists that has not eventually failed. Signatures have been faked, locks have
been picked, supermax prisons have had escapes, guarded leaders have been assassinated,
bank vaults have been cleaned out, laws have been bypassed, fraud has been committed
against our voting process, police officers have been bribed, judges have been
blackmailed, forgeries have been falsely authenticated, money has been counterfeited,
passwords have been brute-forced, networks have been penetrated, computers have been
hacked, biometric systems have been spoofed, credit cards have been cloned,
cryptocurrencies have been double spent, airplanes have been hijacked, completely
automated public Turing tests to tell computers and humans apart (CAPTCHAs) have been cracked,
cryptographic protocols have been broken, and even academic peer-review has been
bypassed with tragic consequences.
Accidents, including deadly ones, caused by software or industrial robots can be traced to
the early days of such technology[5], but they are not a direct consequence of particulars of
intelligence available in such systems. AI failures, on the other hand, are directly related to
the mistakes produced by the intelligence such systems are designed to exhibit. We can
broadly classify such failures into mistakes made during the learning phase and mistakes made during
the performance phase. The system can fail to learn what its human designers want it to learn
and instead learn a different, but correlated function (Amodei et al., 2016). A frequently
cited example is a computer vision system which was supposed to classify pictures of tanks
but instead learned to distinguish backgrounds of such images (Yudkowsky, 2008). Other
examples[6] include problems caused by poorly designed utility functions rewarding only
partially desirable behaviors of agents, such as riding a bicycle in circles around the target
(Randløv and Alstrøm, 1998), pausing a game to avoid losing (Murphy, 2013), or repeatedly
touching a soccer ball to get credit for possession (Ng et al., 1999). During the performance
phase, the system may succumb to a number of causes (Pistono and Yampolskiy, 2016;
Scharre, 2016; Yampolskiy, 2016b), all leading to an AI Failure.
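To make the learning-phase failure mode concrete, the following is a minimal, hypothetical sketch (ours, not the paper's, and not the original tank experiment): a linear classifier is trained on synthetic two-feature “images” in which the label is confounded with background brightness, so it learns the background rather than the object and collapses as soon as that spurious correlation disappears at deployment. All feature names and numbers are invented for illustration.

```python
# Hypothetical illustration only: a classifier "learns tanks" by learning
# background brightness, because background and label are confounded in training.
import numpy as np

rng = np.random.default_rng(0)

def make_images(n, tank, dark_background):
    """Toy two-feature 'images': [background brightness, weak object signature]."""
    background = rng.normal(0.2 if dark_background else 0.8, 0.05, n)
    object_sig = rng.normal(0.6 if tank else 0.4, 0.2, n)  # weak, noisy object cue
    return np.column_stack([background, object_sig])

# Training set: every tank photo happens to have a dark background.
X_train = np.vstack([make_images(500, tank=True, dark_background=True),
                     make_images(500, tank=False, dark_background=False)])
y_train = np.array([1] * 500 + [0] * 500)

# Least-squares linear classifier: predict sign of w . [x, 1].
A = np.column_stack([X_train, np.ones(len(X_train))])
w, *_ = np.linalg.lstsq(A, 2.0 * y_train - 1.0, rcond=None)

def predict(X):
    return (np.column_stack([X, np.ones(len(X))]) @ w > 0).astype(int)

print("training accuracy:", (predict(X_train) == y_train).mean())  # close to 1.0

# Deployment: the confound flips -- tanks now appear on bright backgrounds.
X_test = np.vstack([make_images(500, tank=True, dark_background=False),
                    make_images(500, tank=False, dark_background=True)])
y_test = np.array([1] * 500 + [0] * 500)
print("deployed accuracy:", (predict(X_test) == y_test).mean())  # far below chance
```

The same pattern, a learned proxy that matches the training data but not the designers' intent, underlies several of the historical failures listed below.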
Media reports are full of examples of AI failure, but most of these examples can be
attributed to other causes on closer examination, such as bugs in code or mistakes in
design. The list below is curated to only mention failures of intended intelligence, not
general software faults. Additionally, the examples below include only the first occurrence of
a particular failure, but the same problems are frequently observed again in later years; for
example, self-driving cars have been reported to have multiple deadly accidents. Under the
label of failure, we include any occasion/instance where an AI does not reach an acceptable
level of performance. Finally, the list does not include AI failures because
of hacking or other intentional causes. Still, the timeline of AI failures shows an exponential
trend, while gaps in the record implicitly mark historical events such as the “AI Winter.”
In its most extreme interpretation, any software with as much as an “if statement” can be
considered a form of narrow artificial intelligence (NAI), and all of its bugs are thus
examples of AI failure[7]:
1958 Advice software deduced inconsistent sentences using logical programming
(Hewitt, 1958).
1959 AI designed to be a General Problem Solver failed to solve real-world
problems[8].
1977 Story writing software with limited common sense produced “wrong” stories
(Meehan, 1977).
1982 Software designed to make discoveries, discovered how to cheat instead[9].
1983 Nuclear attack early warning system falsely claimed that an attack was taking place[10].
1984 The National Resident Match program was biased in placement of married
couples (Friedman and Nissenbaum, 1996).
1988 Admissions software discriminated against women and minorities (Lowry and
Macpherson, 1988).
1994 Agents learned to “walk” quickly by becoming taller and falling over (Sims, 1994).
2005 Personal assistant AI rescheduled a meeting 50 times, each time by 5 min
(Tambe, 2008).
2006 Insider threat detection system classified normal activities as outliers (Liu et al.,
2006).
2006 Investment advising software lost money when deployed to real trading
(Gunderson and Gunderson, 2006).
2010 Complex AI stock trading software caused a trillion dollar flash crash[11].
2011 E-Assistant told to “call me an ambulance” began to refer to the user as
Ambulance[12].
2013 Object recognition neural networks saw phantom objects in particular noise
images (Szegedy et al., 2013).
2013 Google software engaged in name-based discrimination in online ad delivery
(Sweeney, 2013).
2014 Search engine autocomplete made bigoted associations about groups of users
(Diakopoulos, 2013).
2014 Smart fire alarm failed to sound alarm during fire[13].
2015 Automated e-mail reply generator created inappropriate responses[14].
2015 A robot for grabbing auto parts grabbed and killed a man[15].
2015 Image tagging software classified black people as gorillas[16].
2015 Medical expert AI classified patients with asthma as lower risk (Caruana et al.,
2015).
2015 Adult content filtering software failed to remove inappropriate content[17].
2015 Amazon’s Echo responded to commands from TV voices[18].
2016 LinkedIn’s name lookup suggested male names in place of female ones[19].
2016 AI designed to predict recidivism acted racist[20].
2016 AI agent exploited reward signal to win without completing the game course[21].
2016 Passport picture checking system flagged Asian user as having closed eyes[22].
2017 Alexa turned on loud music at night without being prompted to do so[41].
2017 AI for writing Christmas carols produced nonsense[42].
2017 Apple’s face recognition system failed to distinguish Asian users[43].
2017 Facebook’s translation software changed Yampolskiy to Polanski, see Figure 1.
2018 Google Assistant created bizarre merged photo[44].
2018 Robot store assistant was not helpful with responses like “cheese is in the fridges”[45].
Spam filters block important e-mails, GPS provides faulty directions, machine translation
corrupts meaning of phrases, autocorrect replaces a desired word with a wrong one,
biometric systems misrecognize people and transcription software fails to capture what is
being said; overall, it is harder to find examples of AIs that never fail. Depending on what we
consider for inclusion as a problem with intelligent software, the list of examples
could be extended almost indefinitely.
Analyzing the list of narrow AI failures, from the inception of the field to modern day
systems, we can arrive at a simple generalization: an AI designed to do X will eventually fail
to do X. Although it may seem trivial, it is a powerful generalization tool, which can be used
to predict future failures of NAIs. For example, looking at cutting-edge current and future
NAIs, we can expect each to eventually fail at the very task it was designed to perform.
Others have given the following examples of possible accidents with A(G)I/
superintelligence:
Housekeeping robot cooks family pet for dinner[46].
A mathematician AGI converts all matter into computing elements to solve problems[47].
An AGI running simulations of humanity creates conscious beings who suffer (Armstrong
et al., 2012).
Figure 1. While translating from Polish to English, Facebook’s software changed Roman
“Yampolskiy” to Roman “Polanski” because of the statistically higher frequency of the
latter name in sample texts
Paperclip manufacturing AGI fails to stop and converts universe into raw materials
(Bostrom, 2003).
A scientist AGI performs experiments with significant negative impact on biosphere
(Taylor et al., 2016).
Drug design AGI develops a time-delayed poison that kills everyone in order to defeat
cancer[48].
Future superintelligence optimizes away all consciousness[49].
AGI kills humanity and converts universe into materials for improved handwriting[50].
AGI designed to maximize human happiness tiles universe with tiny smiley faces
(Yudkowsky, 2011).
AGI instructed to maximize pleasure consigns humanity to a dopamine drip (Marcus,
2012).
Superintelligence may rewire human brains to increase their perceived satisfaction
(Yudkowsky, 2011).
Denning and Denning made some similar error extrapolations in their humorous paper on
“artificial stupidity,” which illustrates a similar trend (Denning and Denning, 2004). Common
causes of the AI failures surveyed above include the following (a short sketch after the list
illustrates the case of user-controlled learning):
non-representative training data;
discrepancy between training and testing data;
rule overgeneralization or application of population statistics to individuals;
inability to handle noise or statistical outliers;
not testing for rare or extreme conditions;
not realizing that an alternative solution method can produce the same results, but with side
effects;
letting users control data or learning process;
no security mechanism to prevent adversarial meddling;
no cultural competence/common sense;
limited access to information/sensors;
mistakes in design and inadequate testing;
limited ability for language disambiguation; and
inability to adapt to changes in the environment.
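As a minimal illustration of one cause from the list above, letting users control the data or learning process with no defense against adversarial meddling, the following invented sketch shows an online word-score classifier that keeps learning from user feedback and is flipped by a small coordinated group. The model, vocabulary and counts are hypothetical and only loosely echo the chatbot incidents cited earlier.

```python
# Invented sketch: an online classifier that trusts user-supplied labels can be
# flipped by a coordinated group of users (compare the chatbot incidents above).
from collections import defaultdict

word_score = defaultdict(float)  # >0 leans "benign", <0 leans "offensive"

def update(text, label):
    """Online update from user feedback: label is +1 (benign) or -1 (offensive)."""
    for word in text.lower().split():
        word_score[word] += label

def classify(text):
    score = sum(word_score[w] for w in text.lower().split())
    return "benign" if score >= 0 else "offensive"

# Honest users teach the system something reasonable.
for _ in range(20):
    update("have a great day", +1)
    update("you are an idiot", -1)
print(classify("great day"))         # "benign", as intended

# A small adversarial group floods the feedback channel with mislabeled text.
for _ in range(200):
    update("great", -1)
print(classify("have a great day"))  # now "offensive": the model has been flipped
```

Nothing in the sketch is specific to chatbots; any system that keeps learning from unvetted user input inherits the same failure mode.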
With bias being a common current cause of failure, it is helpful to analyze particular types of
algorithmic bias. Friedman and Nissenbaum (1996) proposed the following framework for
analyzing bias in computer systems, subdividing its causes into three categories – preexisting
bias, technical bias and emergent bias (toy illustrations of each category follow the list):
Preexisting bias reflects bias in society and social institutions, practices and attitudes.
The system simply preserves an existing state of the world and automates the application of
bias as it currently exists.
Technical bias appears because of hardware or software limitations of the system itself.
Emergent bias emerges after the system is deployed because of changing societal
standards.
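The toy snippets below, referenced above, are invented purely to make the three categories concrete; the groups, thresholds and storage format are hypothetical and are not taken from Friedman and Nissenbaum.

```python
# Invented toy illustrations of the three bias categories; all data are synthetic.
import numpy as np

rng = np.random.default_rng(1)

# Preexisting bias: historical labels encode a double standard, so anything
# trained on them inherits different approval rates for equally qualified groups.
scores = {"A": rng.normal(0.7, 0.1, 1000), "B": rng.normal(0.7, 0.1, 1000)}
hired = {"A": scores["A"] > 0.6,   # lenient historical threshold for group A
         "B": scores["B"] > 0.8}   # stricter historical threshold for group B
print("approval rates in the training labels:",
      {g: round(float(hired[g].mean()), 2) for g in "AB"})

# Technical bias: a storage limitation (an 8-bit field here) saturates,
# erasing real differences at the top of the range.
raw = np.array([0.95, 1.05, 1.20])
stored = np.clip(raw * 255, 0, 255).astype(np.uint8) / 255
print("stored scores:", stored.round(2))  # 1.05 and 1.20 both collapse to 1.0

# Emergent bias: a cutoff calibrated on yesterday's population silently
# excludes almost everyone once the population shifts after deployment.
old_pool = rng.normal(0.7, 0.1, 1000)
new_pool = rng.normal(0.5, 0.1, 1000)
cutoff = float(np.quantile(old_pool, 0.5))  # set before deployment
print("acceptance rate after the shift:",
      round(float((new_pool > cutoff).mean()), 2))
```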
Many of the observed AI failures are similar to mishaps experienced by little children. This is
particularly true for artificial neural networks, which are at the cutting edge of machine
learning (ML). One can say that children are untrained neural networks deployed on real
data, and observing them can teach us a lot about predicting and preventing issues with
ML. A number of research groups (Amodei et al., 2016; Taylor et al., 2016) have
investigated ML-related topics that have corresponding equivalents in the behavior of
developing humans, and here we have summarized their work and mapped it onto similar
situations with children (Table I).
A majority of the research taking place to prevent such issues is currently happening
under the label of “AI Safety.”
3. AI Safety
In 2010, we coined the phrase “Artificial Intelligence Safety Engineering” and its shorthand
notation “AI Safety” to give a name to a new direction of research we were advocating. We
formally presented our ideas on AI Safety at a peer-reviewed conference in 2011
(Yampolskiy, 2011a, b), with subsequent publications on the topic in 2012 (Yampolskiy and
Fox, 2012), 2013 (Muehlhauser and Yampolskiy, 2013; Yampolskiy, 2013a), 2014 (Majot
and Yampolskiy, 2014), 2015 (Yampolskiy, 2015), 2016 (Pistono and Yampolskiy, 2016;
Yampolskiy, 2016b), 2017 (Yampolskiy, 2017) and 2018 (Brundage et al., 2018;
Ramamoorthy and Yampolskiy, 2017). It is possible that someone used the phrase
informally before, but to the best of our knowledge, we were the first to use it[51] in a peer-
reviewed publication and to bring it to popularity. Before that, the most common names for the
field of machine control were “Machine Ethics” (Moor, 2006) or “Friendly AI” (Yudkowsky, 2001).
4. Cybersecurity vs. AI Safety
Bruce Schneier has said, “If you think technology can solve your security problems then you
don’t understand the problems and you don’t understand the technology”[66]. Salman
Rushdie made a more general statement: “There is no such thing as perfect security, only
varying levels of insecurity”[67]. We propose what we call the Fundamental Thesis of
Security – Every security system will eventually fail; there is no such thing as a 100 per cent
secure system. If your security system has not failed, just wait longer.
In theoretical computer science, a common way of isolating the essence of a difficult
problem is via the method of reduction to another, sometimes better analyzed, problem
(Karp, 1972; Yampolskiy, 2013c; Yampolskiy, 2012a, 2012b). If such a reduction is possible
and computationally efficient (Yampolskiy, 2013b), it implies that a solution to the better
analyzed problem would also provide a working solution for the problem we are currently
dealing with. The more general problem of AGI
Safety must contain a solution to the more narrow problem of making sure a particular
human is safe for other humans. We call this the Safe Human Problem[68]. Formally such a
reduction can be done via a restricted Turing test in the domain of safety in a manner
identical to how AI-Completeness of a problem could be established (Yampolskiy, 2013c;
Yampolskiy, 2011a, b). Such formalism is beyond the scope of this paper, so we simply
point out that in both cases, we have at least a human-level intelligent agent capable of
influencing its environment, and we would like to make sure that the agent is safe and
controllable. Although in practice, changing the design of a human via DNA manipulation is
not as simple as changing the source code of an AI, theoretically it is just as possible.
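One way to spell out this reduction, offered only as our own informal formalization rather than the deferred formalism (the predicate Safe and the labels SHP and AGIS are our shorthand, not the paper's notation):

```latex
% Informal sketch of the reduction; notation is ours, not the paper's.
% Safe(a): agent a can be guaranteed not to act unsafely in its environment.
% SHP  (Safe Human Problem): given a human-level agent h, guarantee Safe(h).
% AGIS (AGI Safety Problem): given any agent g of at least human-level
%                            capability, guarantee Safe(g).
\[
  \mathrm{SHP} \;\le\; \mathrm{AGIS} \qquad \text{via the identity map } f(h) = h,
\]
since any human $h$ is itself an agent of at least human-level capability.
% Contrapositive used in the argument that follows: if no mechanism guarantees
% Safe(h) for an arbitrary human h, then no mechanism can guarantee Safe(g)
% for arbitrary agents at or above the human level.
```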
It is observed that humans are not completely safe to themselves and others. Despite
millennia of attempts to develop safe humans via culture, education, laws, ethics,
punishment, reward, religion, relationships, family, oaths, love and even eugenics, success
is not within reach. Humans kill and commit suicide, lie and betray, steal and cheat,
possibly in proportion to how much they can get away with. Truly powerful dictators will
enslave, commit genocide, break every law and violate every human right. It is famously
stated that a human without a sin cannot be found. The best we can hope for is to reduce
such unsafe tendencies to levels that our society can survive. Even with advanced genetic
engineering (Yampolskiy, 2016a), the best we can hope for is some additional reduction in
how unsafe humans are. As long as we permit a person to have choices (free will), they can
be bribed, they will deceive, they will prioritize their interests above those they are
instructed to serve and they will remain fundamentally unsafe. Despite being trivial
examples of a solution to the value learning problem (VLP) (Dewey, 2011; Soares and
Fallenstein, 2014; Sotala, 2016), human beings are anything but safe, bringing into question
our current hope that solving VLP will get us to Safe AI. This is important. To quote Bruce
Schneier, “Only amateurs attack machines; professionals target people.” Consequently, we
see AI Safety research as, at least partially, an adversarial field similar to cryptography or
security[69].
If a cybersecurity system fails, the damage is unpleasant but tolerable in most cases:
someone loses money, someone loses privacy or maybe somebody loses their life. For
narrow AIs, safety failures are at the same level of importance as in general cybersecurity,
but for AGI it is fundamentally different. A single failure of a superintelligent system may
cause an existential risk event. If an AGI Safety mechanism fails, everyone may lose
everything, and all biological life in the universe is potentially destroyed. With cybersecurity
systems, you will get another chance to get it right or at least do better. With an AGI Safety
system, you only have one chance to succeed, so learning from failure is not an option.
Worse, a typical security system is likely to fail to a certain degree, e.g. perhaps only a
small amount of data will be compromised. With an AGI Safety system, failure or success is
a binary option: either you have a safe and controlled superintelligence or you do not. The
goal of cybersecurity is to reduce the number of successful attacks on the system; the goal
of AI Safety is to make sure zero attacks by superintelligent AI succeed in bypassing the
safety mechanisms. For that reason, the ability to distinguish NAI projects from potential AGI
projects (Baum, 2017) is an open problem of fundamental importance in the AI safety field.
The problems are many. We have no way to monitor, visualize or analyze the performance
of superintelligent agents. More trivially, we do not even know what to expect after such
software starts running. Should we see immediate changes to our environment? Should we
see nothing? What is the timescale on which we should be able to detect something? Will it
be too quick to notice or are we too slow to realize something is happening (Yudkowsky and
Hanson, 2008)? Will the impact be locally observable, or will it affect distant parts of the world?
How does one perform standard testing? On what data sets? What constitutes an “Edge
Case” for general intelligence? The questions are many, but the answers currently do not
exist. Additional complications will come from the interaction between intelligent software
and safety mechanisms designed to keep AI Safe and secure. We will also have to
somehow test all the AI Safety mechanisms currently in development. While AI remains at or below
human levels, some testing can be done with a human agent playing the role of the artificial
agent (Yudkowsky, 2002). At levels beyond human capacity, adversarial testing does not
seem to be realizable with today’s technology. More significantly, only one test run would
ever be possible.
5. Conclusions
The history of robotics and artificial intelligence in many ways is also the history of
humanity’s attempts to control such technologies. From the Golem of Prague to the military
robots of modernity, the debate continues as to what degree of independence such entities
should have and how to make sure that they do not turn on us, their inventors. Numerous
recent advancements in all aspects of research, development and deployment of intelligent
systems are well publicized, but safety and security issues related to AI are rarely
addressed. It is our hope that this paper will allow us to better understand how AI systems
can fail and what we can expect from such systems in the future, allowing us to better
prepare an appropriate response.
Notes
1. https://intelligence.org/2014/01/28/how-big-is-ai/
2. https://en.wikipedia.org/wiki/2010_Flash_Crash
3. https://electrek.co/2016/05/26/tesla-model-s-crash-autopilot-video/
4. https://en.wikipedia.org/wiki/Tay_(bot)
5. https://en.wikipedia.org/wiki/Kenji_Urada
6. http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/
7. https://en.wikipedia.org/wiki/List_of_software_bugs
8. https://en.wikipedia.org/wiki/General_Problem_Solver
9. http://aliciapatterson.org/stories/eurisko-computer-mind-its-own
10. https://en.wikipedia.org/wiki/1983_Soviet_nuclear_false_alarm_incident
11. http://gawker.com/this-program-that-judges-use-to-predict-future-crimes-s-1778151070
12. www.technologyreview.com/s/601897/tougher-turing-test-exposes-chatbots-stupidity/
13. www.forbes.com/sites/aarontilley/2014/04/03/googles-nest-stops-selling-its-smart-smoke-alarm-
for-now
14. https://gmail.googleblog.com/2015/11/computer-respond-to-this-email.html
15. http://time.com/3944181/robot-kills-man-volkswagen-plant/
16. www.huffingtonpost.com/2015/07/02/google-black-people-goril_n_7717008.html
17. http://blogs.wsj.com/digits/2015/05/19/googles-youtube-kids-app-criticized-for-inappropriate-
content/
18. https://motherboard.vice.com/en_us/article/53dz8x/people-are-complaining-that-amazon-echo-is-
responding-to-ads-on-tv
19. www.seattletimes.com/business/microsoft/how-linkedins-search-engine-may-reflect-a-bias
20. www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
21. https://openai.com/blog/faulty-reward-functions
22. www.telegraph.co.uk/technology/2016/12/07/robot-passport-checker-rejects-asian-mans-photo-
having-eyes
23. www.kotaku.co.uk/2016/06/03/elites-ai-created-super-weapons-and-started-hunting-players-
skynet-is-here
24. www.theguardian.com/technology/2016/sep/08/artificial-intelligence-beauty-contest-doesnt-like-
black-people
25. https://en.wikipedia.org/wiki/The_DAO_(organization)
26. www.latimes.com/local/lanow/la-me-ln-crimefighting-robot-hurts-child-bay-area-20160713-snap-
story.html
27. www.engadget.com/2016/03/13/google-alphago-loses-to-human-in-one-match/
28. www.theguardian.com/technology/2016/jul/01/tesla-driver-killed-autopilot-self-driving-car-harry-potter
29. www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
30. https://splinternews.com/black-teenagers-vs-white-teenagers-why-googles-algori-1793857436
31. www.japantimes.co.jp/news/2016/11/15/national/ai-robot-fails-get-university-tokyo
32. www.themarshallproject.org/2016/02/03/policing-the-future
33. www.entrepreneur.com/video/287281
34. www.boredpanda.com/funny-amazon-ai-designed-phone-cases-fail
35. www.bbc.com/future/story/20170410-how-to-fool-artificial-intelligence
36. www.abc.net.au/news/2017-04-10/centrelink-debt-recovery-system-lacks-transparency-
ombudsman/8430184
37. https://techcrunch.com/2017/10/24/another-ai-chatbot-shown-spouting-offensive-views
38. www.gizmodo.co.uk/2017/04/faceapp-blames-ai-for-whitening-up-black-people
39. https://motherboard.vice.com/en_us/article/j5jmj8/google-artificial-intelligence-bias
40. https://medium.com/@gidishperber/what-ive-learned-from-kaggle-s-fisheries-competition-
92342f9ca779
41. http://mashable.com/2017/11/08/amazon-alexa-rave-party-germany
42. http://mashable.com/2017/12/22/ai-tried-to-write-christmas-carols
43. www.mirror.co.uk/tech/apple-accused-racism-after-face-11735152
44. https://qz.com/1188170/google-photos-tried-to-fix-this-ski-photo
45. www.iflscience.com/technology/store-hires-robot-to-help-out-customers-robot-gets-fired-for-
scaring-customers-away
46. www.theguardian.com/sustainable-business/2015/jun/23/the-ethics-of-ai-how-to-stop-your-robot-
cooking-your-cat
47. https://intelligence.org/2014/11/18/misconceptions-edge-orgs-conversation-myth-ai
48. https://80000hours.org/problem-profiles/positively-shaping-artificial-intelligence
49. http://slatestarcodex.com/2014/07/13/growing-children-for-bostroms-disneyland
50. https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html
51. The term “Safe AI” has been used as early as 1995 (Rodd, 1995).
52. www.cmu.edu/safartint/
53. https://selfawaresystems.com/2015/07/11/formal-methods-for-ai-safety/
54. https://intelligence.org/2014/08/04/groundwork-ai-safety-engineering/
55. http://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/new-ai-safety-projects-get-
funding-from-elon-musk
56. http://globalprioritiesproject.org/2015/08/quantifyingaisafety/
57. http://futureoflife.org/2015/10/12/ai-safety-conference-in-puerto-rico/
58. http://rationality.org/waiss/
59. http://gizmodo.com/satya-nadella-has-come-up-with-his-own-ai-safety-rules-1782802269
60. https://80000hours.org/career-reviews/artificial-intelligence-risk-research/
61. https://openai.com/blog/concrete-ai-safety-problems/
62. http://lesswrong.com/lw/n4l/safety_engineering_target_selection_and_alignment/
63. www.waise2018.com/
64. www.whitehouse.gov/blog/2016/05/03/preparing-future-artificial-intelligence
65. http://acritch.com/fhi-positions/
66. www.brainyquote.com/quotes/bruce_schneier_182286
67. www.brainyquote.com/quotes/salman_rushdie_580407
68. Similarly a Safe Animal Problem may be of interest (can a Pitbull be guaranteed safe?).
69. The last thing we want is to be in an adversarial situation with a superintelligence, but unfortunately
we may not have a choice in the matter. It seems that long-term AI Safety cannot succeed but
neither does it have the luxury of a partial fail.
References
Abecker, A., Alami, R., Baral, C., Bickmore, T., Durfee, E., Fong, T. and Lebiere, C. (2006), “AAAI 2006
spring symposium reports”, AI Magazine, Vol. 27 No. 3, p. 107.
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J. and Mané, D. (2016), “Concrete
problems in AI safety”, arXiv preprint arXiv:1606.06565.
Armstrong, S. and Yampolskiy, R.V. (2016), “Security solutions for intelligent and complex systems”,
Security Solutions for Hyperconnectivity and the Internet of Things, IGI Global, Hershey, PA, pp. 37-88.
Armstrong, S., Sandberg, A. and Bostrom, N. (2012), “Thinking inside the box: controlling and using an
oracle ai”, Minds and Machines, Vol. 22 No. 4, pp. 299-324.
Babcock, J., Kramar, J. and Yampolskiy, R. (2016a), “The AGI containment problem”, Paper presented at
the Ninth Conference on Artificial General Intelligence (AGI2015).
Babcock, J., Kramar, J. and Yampolskiy, R. (2016b), “The AGI containment problem”, arXiv preprint
arXiv:1604.00545.
Baum, S. (2017), “A survey of artificial general intelligence projects for ethics, risk, and policy”, Global
Catastrophic Risk Institute Working Paper 17-1.
Bostrom, N. (2003), “Ethical issues in advanced artificial intelligence”, Science Fiction and Philosophy:
From Time Travel to Superintelligence, pp. 277-284.
Bostrom, N. (2014), Superintelligence: Paths, Dangers, Strategies, Oxford University Press, New York,
NY.
Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B. and Filar, B. (2018), “The malicious
use of artificial intelligence: forecasting, prevention, and mitigation”, arXiv preprint arXiv:1802.07228.
Caliskan, A., Bryson, J.J. and Narayanan, A. (2017), “Semantics derived automatically from language
corpora contain human-like biases”, Science, Vol. 356 No. 6334, pp. 183-186.
Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M. and Elhadad, N. (2015), “Intelligible models for
healthcare: predicting pneumonia risk and hospital 30-day readmission”, Paper presented at the
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining.
Chessen, M. (2017), The MADCOM Future, Atlantic Council, available at: www.atlanticcouncil.org/
publications/reports/the-madcom-future
Denning, D.E. and Denning, P.J. (2004), “Artificial stupidity”, Communications of the ACM, Vol. 47 No. 5,
p. 112.
Dewey, D. (2011), “Learning what to value”, Artificial General Intelligence, pp. 309-314.
Diakopoulos, N. (2013), “Algorithmic defamation: the case of the shameless autocomplete”, Blog post,
available at: www.nickdiakopoulos.com/2013/08/06/algorithmic-defamation-the-case-of-the-shameless-
autocomplete/
Friedman, B. and Nissenbaum, H. (1996), “Bias in computer systems”, ACM Transactions on Information
Systems (TOIS), Vol. 14 No. 3, pp. 330-347.
Gloor, L. (2016), “Suffering-focused AI safety: why ‘fail-safe’ measures might be our top intervention”,
Foundational Research Institute.
Gunderson, J. and Gunderson, L. (2006), “And then the phone rang”, Paper presented at the AAAI
Spring Symposium: What Went Wrong and Why: Lessons from AI Research and Applications.
Hewitt, C. (1958), “Development of logic programming: what went wrong, what was done about it, and
what it might mean for the future”.
Karp, R.M. (1972), “Reducibility among combinatorial problems”, In Miller, R. E. and Thatcher, J. W.
(Eds), Complexity of Computer Computations, Plenum, New York, NY, pp. 85-103.
Krishnan, A. (2009), Killer Robots: Legality and Ethicality of Autonomous Weapons, Ashgate Publishing,
Farnham.
Liu, A., Martin, C.E., Hetherington, T. and Matzner, S. (2006), “AI lessons learned from experiments in
insider threat detection”, Paper presented at the AAAI Spring Symposium: What Went Wrong and Why:
Lessons from AI Research and Applications.
Lowry, S. and Macpherson, G. (1988), “A blot on the profession”, British Medical Journal (Clinical
Research ed.), Vol. 296 No. 6623, p. 657.
Majot, A.M. and Yampolskiy, R.V. (2014), “AI safety engineering through introduction of self-reference
into felicific calculus via artificial pain and pleasure”, Paper presented at the IEEE International
Symposium on Ethics in Science, Technology and Engineering, Chicago, IL.
Marcus, G. (2012), Moral Machines, The New Yorker, New York, NY, p. 24.
Marling, C. and Chelberg, D. (2008), “RoboCup for the mechanically, athletically and culturally
challenged”.
Meehan, J.R. (1977), “TALE-SPIN, an interactive program that writes stories”, Paper presented at the IJCAI.
Moor, J.H. (2006), “The nature, importance, and difficulty of machine ethics”, IEEE Intelligent Systems,
Vol. 21 No. 4, pp. 18-21.
Murphy, T. (2013), “The first level of Super Mario Bros. is easy with lexicographic orderings and time
travel”, The Association for Computational Heresy (SIGBOVIK), pp. 112-133.
Ng, A.Y., Harada, D. and Russell, S. (1999), “Policy invariance under reward transformations: theory and
application to reward shaping”, Paper presented at the ICML.
Pistono, F. and Yampolskiy, R.V. (2016), “Unethical research: how to create a malevolent artificial
intelligence”, arXiv preprint arXiv:1605.02817.
Pistono, F. and Yampolskiy, R.V. (2016), “Unethical research: how to create a malevolent artificial
intelligence”, Paper presented at the 25th International Joint Conference on Artificial Intelligence (IJCAI-16),
Ethics for Artificial Intelligence Workshop (AI-Ethics-2016), New York, NY.
Ramamoorthy, A. and Yampolskiy, R. (2017), “Beyond mad?: The race for artificial general intelligence”,
ITU Journal: ICT Discoveries, No. 1, available at: www.itu.int/en/journal/001/Pages/09.aspx
Randløv, J. and Alstrøm, P. (1998), “Learning to drive a bicycle using reinforcement learning and
shaping”, Paper presented at the ICML.
Ribeiro, M.T., Singh, S. and Guestrin, C. (2016), “Why should I trust you?: Explaining the predictions of
any classifier”, Paper presented at the Proceedings of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining.
Rodd, M. (1995), “Safe AI – is this possible?”, Engineering Applications of Artificial Intelligence, Vol. 8
No. 3, pp. 243-250.
Rychtyckyj, N. and Turski, A. (2008), “Reasons for success (and failure) in the development and
deployment of AI systems”, Paper presented at the AAAI 2008 workshop on What Went Wrong and Why.
Scharre, P. (2016), “Autonomous weapons and operational risk”, Paper presented at the Center for a New
American Society, Washington DC.
Shalev-Shwartz, S., Shamir, O. and Shammah, S. (2017), “Failures of gradient-based deep learning”,
Paper presented at the International Conference on Machine Learning.
Shapiro, D. and Goker, M.H. (2008), “Advancing AI research and applications by learning from what went
wrong and why”, AI Magazine, Vol. 29 No. 2, pp. 9-10.
Sims, K. (1994), “Evolving virtual creatures”, Paper presented at the Proceedings of the 21st annual
conference on Computer graphics and interactive techniques.
Soares, N. and Fallenstein, B. (2014), Aligning superintelligence with human interests: a technical
research agenda, Machine Intelligence Research Institute (MIRI) Technical Report, 8.
Sotala, K. (2016), “Defining human values for value learners”, Paper presented at the 2nd International
Workshop on AI, Ethics and Society, AAAI-2016.
Taylor, J., Yudkowsky, E., LaVictoire, P. and Critch, A. (2016), Alignment for Advanced Machine Learning
Systems, Machine Intelligence Research Institute, Berkeley, CA.
Yampolskiy, R.V. (2011a), “AI-Complete CAPTCHAs as zero knowledge proofs of access to an artificially
intelligent system”, ISRN Artificial Intelligence, 271878.
Yampolskiy, R.V. (2011b), “Artificial intelligence safety engineering: why machine ethics is a wrong
approach”, Paper presented at the Philosophy and Theory of Artificial Intelligence (PT-AI2011),
Thessaloniki, Greece.
Yampolskiy, R.V. (2012a), “Leakproofing the singularity artificial intelligence confinement problem”,
Journal of Consciousness Studies, Vol. 19 Nos 1/2, pp. 1-2.
Yampolskiy, R.V. (2012b), “AI-Complete, AI-Hard, or AI-Easy–classification of problems in AI”, The 23rd
Midwest Artificial Intelligence and Cognitive Science Conference, Cincinnati, OH, USA.
Yampolskiy, R.V. (2013a), “Artificial intelligence safety engineering: why machine ethics is a wrong
approach”, Philosophy and Theory of Artificial Intelligence, Springer, Berlin Heidelberg,
pp. 389-396.
Yampolskiy, R.V. (2013b), “Efficiency theory: a unifying theory for information, computation and
intelligence”, Journal of Discrete Mathematical Sciences & Cryptography, Vol. 16 Nos 4/5, pp. 259-277.
Yampolskiy, R.V. (2013c), “Turing test as a defining feature of AI-Completeness”, in Yang, X.-S. (Ed.),
Artificial Intelligence, Evolutionary Computing and Metaheuristics, Springer, Berlin Heidelberg, Vol. 427,
pp. 3-17.
Yampolskiy, R.V. (2015), Artificial Superintelligence: A Futuristic Approach, Chapman and Hall/CRC,
Boca Raton, FL.
Yampolskiy, R.V. (2016a), “On the origin of samples: attribution of output to a particular algorithm”, arXiv
preprint arXiv:1608.06172.
Yampolskiy, R.V. (2016b), “Taxonomy of pathways to dangerous artificial intelligence”, Paper presented
at the Workshops at the Thirtieth AAAI Conference on Artificial Intelligence.
Yampolskiy, R.V. (2017), “What are the ultimate limits to computational techniques: verifier theory and
unverifiability”, Physica Scripta, Vol. 92 No. 9, p. 093001.
Yampolskiy, R.V. and Fox, J. (2012), “Safety engineering for artificial general intelligence”, Topoi. Special
Issue on Machine Ethics & the Ethics of Building Intelligent Machines.
Yudkowsky, E. (2001), “Creating friendly AI 1.0: the analysis and design of benevolent goal
architectures”, Singularity Institute for Artificial Intelligence, San Francisco, CA, 15 June.
Yudkowsky, E. (2002), The AI-Box Experiment, available at: http://yudkowsky.net/singularity/aibox
Yudkowsky, E. (2008), “Artificial intelligence as a positive and negative factor in global risk”, Global
Catastrophic Risks, Vol. 1, p. 303.
Yudkowsky, E. (2011), “Complex value systems in friendly AI”, Artificial General Intelligence,
pp. 388-393.
Yudkowsky, E. and Hanson, R. (2008), “The Hanson-Yudkowsky AI-foom debate”, MIRI Technical Report,
available at: http://intelligence.org/files/AIFoomDebate.pdf
Corresponding author
Roman V. Yampolskiy can be contacted at: [email protected]