Purpose – Every security system will eventually fail; there is no such thing as a 100 per cent secure system.
Design/methodology/approach – AI Safety can be improved based on ideas developed by
cybersecurity experts. For narrow AI Safety, failures are at the same, moderate level of criticality as in
cybersecurity; however, for general AI, failures have a fundamentally different impact. A single failure of a
superintelligent system may cause a catastrophic event without a chance for recovery.
Findings – In this paper, the authors present and analyze reported failures of artificially intelligent
systems and extrapolate their analysis to future AIs. The authors suggest that both the frequency and the
seriousness of future AI failures will steadily increase.
Originality/value – This is a first attempt to assemble a public data set of AI failures and is extremely
valuable to AI Safety researchers.
Keywords Cybersecurity, Failures
Paper type Research paper
1. Introduction
About 10,000 scientists[1] around the world work on different aspects of creating intelligent
machines, with the main goal of making such machines as capable as possible. With
amazing progress made in the field of artificial intelligence (AI) over the past decade, it is
more important than ever to make sure that the technology we are developing has a
beneficial impact on humanity. With the appearance of robotic financial advisors, self-
driving cars and personal digital assistants come many unresolved problems. We have
already experienced market crashes caused by intelligent trading software[2], accidents
caused by self-driving cars[3] and embarrassment from chat-bots[4], which turned racist
and engaged in hate speech. We predict that both the frequency and seriousness of such
events will steadily increase as AIs become more capable. The failures of today’s narrow
domain AIs are just a warning: once we develop artificial general intelligence (AGI) capable
of cross-domain performance, hurt feelings will be the least of our concerns.
In a recent publication, Yampolskiy proposed a Taxonomy of Pathways to Dangerous AI
(Yampolskiy, 2016b), which was motivated as follows: “In order to properly handle a
potentially dangerous artificially intelligent system it is important to understand how the
system came to be in such a state. In popular culture (science fiction movies/books) AIs/
Robots became self-aware and as a result rebel against humanity and decide to destroy it.
While it is one possible scenario, it is probably the least likely path to appearance of
dangerous AI.” Yampolskiy suggested that much more likely reasons include deliberate
actions of not-so-ethical people (“on purpose”) (Brundage et al., 2018) and side effects of poor design
(Bostrom, 2014). The paper implied that, if an AI Safety mechanism is not designed to resist
attacks by malevolent human actors, it cannot be considered a functional safety mechanism
(Pistono and Yampolskiy, 2016)!
2. AI failures
Those who cannot learn from history are doomed to repeat it. Unfortunately, very few
papers have been published on failures and errors made in development of intelligent
systems (Rychtyckyj and Turski, 2008). The importance of learning from “What Went Wrong and
Why” has been recognized by the AI community (Abecker et al., 2006; Shapiro and Goker,
2008). Such research includes the study of how, why and when failures happen (Abecker et al.,
2006; Shapiro and Goker, 2008) and how to improve future AI systems based on such
information (Marling and Chelberg, 2008; Shalev-Shwartz et al., 2017).
The millennia-long history of humanity contains millions of examples of attempts to develop
technological and logistical solutions to increase safety and security, yet not a single
example exists that has not eventually failed. Signatures have been faked, locks have
been picked, supermax prisons have had escapes, guarded leaders have been assassinated,
bank vaults have been cleaned out, laws have been bypassed, fraud has been committed
against our voting process, police officers have been bribed, judges have been
blackmailed, forgeries have been falsely authenticated, money has been counterfeited,
passwords have been brute-forced, networks have been penetrated, computers have been
hacked, biometric systems have been spoofed, credit cards have been cloned,
cryptocurrencies have been double spent, airplanes have been hijacked, completely
automated public Turing tests to tell computers and humans apart (CAPTCHAs) have been cracked,
cryptographic protocols have been broken, and even academic peer-review has been
bypassed with tragic consequences.
Accidents, including deadly ones, caused by software or industrial robots can be traced to
the early days of such technology[5], but they are not a direct consequence of particulars of
intelligence available in such systems. AI failures, on the other hand, are directly related to
the mistakes produced by the intelligence such systems are designed to exhibit. We can
broadly classify such failures into mistakes made during the learning phase and mistakes made during
the performance phase. The system can fail to learn what its human designers want it to learn
and instead learn a different, but correlated function (Amodei et al., 2016). A frequently
cited example is a computer vision system which was supposed to classify pictures of tanks
but instead learned to distinguish backgrounds of such images (Yudkowsky, 2008). Other
examples[6] include problems caused by poorly designed utility functions rewarding only
partially desirable behaviors of agents, such as riding a bicycle in circles around the target
(Randløv and Alstrøm, 1998), pausing a game to avoid losing (Murphy, 2013), or repeatedly
touching a soccer ball to get credit for possession (Ng et al., 1999). During the performance
phase, the system may succumb to a number of causes (Pistono and Yampolskiy, 2016;
Scharre, 2016; Yampolskiy, 2016b), all leading to an AI Failure.
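To make the learning-phase failure mode concrete, the following is a minimal, hypothetical sketch (ours, not the paper's, and not the original tank experiment): a linear classifier is trained on synthetic two-feature “images” in which the label is confounded with background brightness, so it learns the background rather than the object and collapses as soon as that spurious correlation disappears at deployment. All feature names and numbers are invented for illustration.

```python
# Hypothetical illustration only: a classifier "learns tanks" by learning
# background brightness, because background and label are confounded in training.
import numpy as np

rng = np.random.default_rng(0)

def make_images(n, tank, dark_background):
    """Toy two-feature 'images': [background brightness, weak object signature]."""
    background = rng.normal(0.2 if dark_background else 0.8, 0.05, n)
    object_sig = rng.normal(0.6 if tank else 0.4, 0.2, n)  # weak, noisy object cue
    return np.column_stack([background, object_sig])

# Training set: every tank photo happens to have a dark background.
X_train = np.vstack([make_images(500, tank=True, dark_background=True),
                     make_images(500, tank=False, dark_background=False)])
y_train = np.array([1] * 500 + [0] * 500)

# Least-squares linear classifier: predict sign of w . [x, 1].
A = np.column_stack([X_train, np.ones(len(X_train))])
w, *_ = np.linalg.lstsq(A, 2.0 * y_train - 1.0, rcond=None)

def predict(X):
    return (np.column_stack([X, np.ones(len(X))]) @ w > 0).astype(int)

print("training accuracy:", (predict(X_train) == y_train).mean())  # close to 1.0

# Deployment: the confound flips -- tanks now appear on bright backgrounds.
X_test = np.vstack([make_images(500, tank=True, dark_background=False),
                    make_images(500, tank=False, dark_background=True)])
y_test = np.array([1] * 500 + [0] * 500)
print("deployed accuracy:", (predict(X_test) == y_test).mean())  # far below chance
```

The same pattern, a learned proxy that matches the training data but not the designers' intent, underlies several of the historical failures listed below.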
Media reports are full of examples of AI failure, but most of these examples can be
attributed to other causes on closer examination, such as bugs in code or mistakes in
design. The list below is curated to only mention failures of intended intelligence, not
general software faults. Additionally, the examples below include only the first occurrence of
a particular failure, but the same problems are frequently observed again in later years; for
example, self-driving cars have been reported to have multiple deadly accidents. Under the
label of failure, we include any occasion/instance where an AI does not reach an acceptable
level of performance. Finally, the list does not include AI failures because
of hacking or other intentional causes. Still, the timeline of AI failures shows an exponential
trend, while gaps in the record implicitly mark historical events such as the “AI Winter.”
In its most extreme interpretation, any software with as much as an “if statement” can be
considered a form of narrow artificial intelligence (NAI), and all of its bugs are thus
examples of AI failure[7]:
1958 Advice software deduced inconsistent sentences using logical programming
(Hewitt, 1958).
1959 AI designed to be a General Problem Solver failed to solve real-world
problems[8].
1977 Story writing software with limited common sense produced “wrong” stories
(Meehan, 1977).
1982 Software designed to make discoveries, discovered how to cheat instead[9].
1983 Nuclear attack early warning system falsely claimed that an attack was taking place[10].
1984 The National Resident Match program was biased in placement of married
couples (Friedman and Nissenbaum, 1996).
1988 Admissions software discriminated against women and minorities (Lowry and
Macpherson, 1988).
1994 Agents learned to “walk” quickly by becoming taller and falling over (Sims, 1994).
2005 Personal assistant AI rescheduled a meeting 50 times, each time by 5 min
(Tambe, 2008).
2006 Insider threat detection system classified normal activities as outliers (Liu et al.,
2006).
2006 Investment advising software lost money when deployed to real trading
(Gunderson and Gunderson, 2006).
2010 Complex AI stock trading software caused a trillion dollar flash crash[11].
2011 E-Assistant told to “call me an ambulance” began to refer to the user as
Ambulance[12].
2013 Object recognition neural networks saw phantom objects in particular noise
images (Szegedy et al., 2013).
2013 Google software engaged in name-based discrimination in online ad delivery
(Sweeney, 2013).
2014 Search engine autocomplete made bigoted associations about groups of users
(Diakopoulos, 2013).
2014 Smart fire alarm failed to sound alarm during fire[13].
2015 Automated e-mail reply generator created inappropriate responses[14].
2015 A robot for grabbing auto parts grabbed and killed a man[15].
2015 Image tagging software classified black people as gorillas[16].
2015 Medical expert AI classified patients with asthma as lower risk (Caruana et al.,
2015).
2015 Adult content filtering software failed to remove inappropriate content[17].
2015 Amazon’s Echo responded to commands from TV voices[18].
2016 LinkedIn’s name lookup suggested male names in place of female ones[19].
2016 AI designed to predict recidivism acted racist[20].
2016 AI agent exploited reward signal to win without completing the game course[21].
2016 Passport picture checking system flagged Asian user as having closed eyes[22].
2017 Alexa turned on loud music at night without being prompted to do so[41].
2017 AI for writing Christmas carols produced nonsense[42].
2017 Apple’s face recognition system failed to distinguish Asian users[43].
2017 Facebook’s translation software changed Yampolskiy to Polanski, see Figure 1.
2018 Google Assistant created bizarre merged photo[44].
2018 Robot store assistant was not helpful with responses like “cheese is in the fridges”[45].
Spam filters block important e-mails, GPS provides faulty directions, machine translation
corrupts meaning of phrases, autocorrect replaces a desired word with a wrong one,
biometric systems misrecognize people and transcription software fails to capture what is
being said; overall, it is harder to find examples of AIs that never fail. Depending on what we
consider for inclusion as a problem with intelligent software, the list of examples
could be extended almost indefinitely.
Analyzing the list of narrow AI failures, from the inception of the field to modern day
systems, we can arrive at a simple generalization: an AI designed to do X will eventually fail
to do X. Although it may seem trivial, it is a powerful generalization tool, which can be used
to predict future failures of NAIs. For example, looking at cutting-edge current and future
NAIs, we can expect each to eventually fail at the very task it was designed to perform.
Others have given the following examples of possible accidents with A(G)I/
superintelligence:
Housekeeping robot cooks family pet for dinner[46].
A mathematician AGI converts all matter into computing elements to solve problems[47].
An AGI running simulations of humanity creates conscious beings who suffer (Armstrong
et al., 2012).
Figure 1. While translating from Polish to English, Facebook’s software changed Roman
“Yampolskiy” to Roman “Polanski” because of the statistically higher frequency of the
latter name in sample texts
Paperclip manufacturing AGI fails to stop and converts universe into raw materials
(Bostrom, 2003).
A scientist AGI performs experiments with significant negative impact on biosphere
(Taylor et al., 2016).
Drug design AGI develops a time-delayed poison that kills everyone in order to defeat
cancer[48].
Future superintelligence optimizes away all consciousness[49].
AGI kills humanity and converts universe into materials for improved handwriting[50].
AGI designed to maximize human happiness tiles universe with tiny smiley faces
(Yudkowsky, 2011).
AGI instructed to maximize pleasure consigns humanity to a dopamine drip (Marcus,
2012).
Superintelligence may rewire human brains to increase their perceived satisfaction
(Yudkowsky, 2011).
Denning and Denning made some similar error extrapolations in their humorous paper on
“artificial stupidity,” which illustrates a similar trend (Denning and Denning, 2004). Common
causes of the AI failures surveyed above include the following (a short sketch after the list
illustrates the case of user-controlled learning):
non-representative training data;
discrepancy between training and testing data;
rule overgeneralization or application of population statistics to individuals;
inability to handle noise or statistical outliers;
not testing for rare or extreme conditions;
not realizing that an alternative solution method can produce the same results, but with side
effects;
letting users control data or learning process;
no security mechanism to prevent adversarial meddling;
no cultural competence/common sense;
limited access to information/sensors;
mistakes in design and inadequate testing;
limited ability for language disambiguation; and
inability to adapt to changes in the environment.
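As a minimal illustration of one cause from the list above, letting users control the data or learning process with no defense against adversarial meddling, the following invented sketch shows an online word-score classifier that keeps learning from user feedback and is flipped by a small coordinated group. The model, vocabulary and counts are hypothetical and only loosely echo the chatbot incidents cited earlier.

```python
# Invented sketch: an online classifier that trusts user-supplied labels can be
# flipped by a coordinated group of users (compare the chatbot incidents above).
from collections import defaultdict

word_score = defaultdict(float)  # >0 leans "benign", <0 leans "offensive"

def update(text, label):
    """Online update from user feedback: label is +1 (benign) or -1 (offensive)."""
    for word in text.lower().split():
        word_score[word] += label

def classify(text):
    score = sum(word_score[w] for w in text.lower().split())
    return "benign" if score >= 0 else "offensive"

# Honest users teach the system something reasonable.
for _ in range(20):
    update("have a great day", +1)
    update("you are an idiot", -1)
print(classify("great day"))         # "benign", as intended

# A small adversarial group floods the feedback channel with mislabeled text.
for _ in range(200):
    update("great", -1)
print(classify("have a great day"))  # now "offensive": the model has been flipped
```

Nothing in the sketch is specific to chatbots; any system that keeps learning from unvetted user input inherits the same failure mode.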
With bias being a common current cause of failure, it is helpful to analyze particular types of
algorithmic bias. Friedman and Nissenbaum (1996) proposed the following framework for
analyzing bias in computer systems, subdividing its causes into three categories – preexisting
bias, technical bias and emergent bias (toy illustrations of each category follow the list):
Preexisting bias reflects bias in society and social institutions, practices and attitudes.
The system simply preserves an existing state of the world and automates the application of
bias as it currently exists.
Technical bias appears because of hardware or software limitations of the system itself.
Emergent bias emerges after the system is deployed because of changing societal
standards.
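The toy snippets below, referenced above, are invented purely to make the three categories concrete; the groups, thresholds and storage format are hypothetical and are not taken from Friedman and Nissenbaum.

```python
# Invented toy illustrations of the three bias categories; all data are synthetic.
import numpy as np

rng = np.random.default_rng(1)

# Preexisting bias: historical labels encode a double standard, so anything
# trained on them inherits different approval rates for equally qualified groups.
scores = {"A": rng.normal(0.7, 0.1, 1000), "B": rng.normal(0.7, 0.1, 1000)}
hired = {"A": scores["A"] > 0.6,   # lenient historical threshold for group A
         "B": scores["B"] > 0.8}   # stricter historical threshold for group B
print("approval rates in the training labels:",
      {g: round(float(hired[g].mean()), 2) for g in "AB"})

# Technical bias: a storage limitation (an 8-bit field here) saturates,
# erasing real differences at the top of the range.
raw = np.array([0.95, 1.05, 1.20])
stored = np.clip(raw * 255, 0, 255).astype(np.uint8) / 255
print("stored scores:", stored.round(2))  # 1.05 and 1.20 both collapse to 1.0

# Emergent bias: a cutoff calibrated on yesterday's population silently
# excludes almost everyone once the population shifts after deployment.
old_pool = rng.normal(0.7, 0.1, 1000)
new_pool = rng.normal(0.5, 0.1, 1000)
cutoff = float(np.quantile(old_pool, 0.5))  # set before deployment
print("acceptance rate after the shift:",
      round(float((new_pool > cutoff).mean()), 2))
```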
Many of the observed AI failures are similar to mishaps experienced by little children. This is
particularly true for artificial neural networks, which are at the cutting edge of machine
learning (ML). One can say that children are untrained neural networks deployed on real
data, and observing them can teach us a lot about predicting and preventing issues with
ML. A number of research groups (Amodei et al., 2016; Taylor et al., 2016) have
investigated ML-related topics that have corresponding equivalents in the behavior of
developing humans, and here we have summarized their work and mapped it onto similar
situations with children (Table I).
A majority of the research taking place to prevent such issues is currently happening
under the label of “AI Safety.”
3. AI Safety
In 2010, we coined the phrase “Artificial Intelligence Safety Engineering” and its shorthand
notation “AI Safety” to give a name to a new direction of research we were advocating. We
formally presented our ideas on AI Safety at a peer-reviewed conference in 2011
(Yampolskiy, 2011a, b), with subsequent publications on the topic in 2012 (Yampolskiy and
Fox, 2012), 2013 (Muehlhauser and Yampolskiy, 2013; Yampolskiy, 2013a), 2014 (Majot
and Yampolskiy, 2014), 2015 (Yampolskiy, 2015), 2016 (Pistono and Yampolskiy, 2016;
Yampolskiy, 2016b), 2017 (Yampolskiy, 2017) and 2018 (Brundage et al., 2018;
Ramamoorthy and Yampolskiy, 2017). It is possible that someone used the phrase
informally before, but to the best of our knowledge, we were the first to use it[51] in a peer-
reviewed publication and to bring it to popularity. Before that, the most common names for the
field of machine control were “Machine Ethics” (Moor, 2006) or “Friendly AI” (Yudkowsky, 2001).
4. Cybersecurity vs. AI Safety
Bruce Schneier has said, “If you think technology can solve your security problems then you
don’t understand the problems and you don’t understand the technology”[66]. Salman
Rushdie made a more general statement: “There is no such thing as perfect security, only
varying levels of insecurity”[67]. We propose what we call the Fundamental Thesis of
Security – Every security system will eventually fail; there is no such thing as a 100 per cent
secure system. If your security system has not failed, just wait longer.
In theoretical computer science, a common way of isolating the essence of a difficult
problem is via the method of reduction to another, sometimes better analyzed, problem
(Karp, 1972; Yampolskiy, 2013c; Yampolskiy, 2012a, 2012b). If such a reduction is possible
and computationally efficient (Yampolskiy, 2013b), it implies that a solution to the better
analyzed problem would also provide a working solution for the problem we are currently
dealing with. The more general problem of AGI
Safety must contain a solution to the more narrow problem of making sure a particular
human is safe for other humans. We call this the Safe Human Problem[68]. Formally such a
reduction can be done via a restricted Turing test in the domain of safety in a manner
identical to how AI-Completeness of a problem could be established (Yampolskiy, 2013c;
Yampolskiy, 2011a, b). Such formalism is beyond the scope of this paper, so we simply
point out that in both cases, we have at least a human-level intelligent agent capable of
influencing its environment, and we would like to make sure that the agent is safe and
controllable. Although in practice, changing the design of a human via DNA manipulation is
not as simple as changing the source code of an AI, theoretically it is just as possible.
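One way to spell out this reduction, offered only as our own informal formalization rather than the deferred formalism (the predicate Safe and the labels SHP and AGIS are our shorthand, not the paper's notation):

```latex
% Informal sketch of the reduction; notation is ours, not the paper's.
% Safe(a): agent a can be guaranteed not to act unsafely in its environment.
% SHP  (Safe Human Problem): given a human-level agent h, guarantee Safe(h).
% AGIS (AGI Safety Problem): given any agent g of at least human-level
%                            capability, guarantee Safe(g).
\[
  \mathrm{SHP} \;\le\; \mathrm{AGIS} \qquad \text{via the identity map } f(h) = h,
\]
since any human $h$ is itself an agent of at least human-level capability.
% Contrapositive used in the argument that follows: if no mechanism guarantees
% Safe(h) for an arbitrary human h, then no mechanism can guarantee Safe(g)
% for arbitrary agents at or above the human level.
```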
It is observed that humans are not completely safe to themselves and others. Despite
millennia of attempts to develop safe humans via culture, education, laws, ethics,
punishment, reward, religion, relationships, family, oaths, love and even eugenics, success
is not within reach. Humans kill and commit suicide, lie and betray, steal and cheat,
possibly in proportion to how much they can get away with. Truly powerful dictators will
enslave, commit genocide, break every law and violate every human right. It is famously
stated that a human without a sin cannot be found. The best we can hope for is to reduce
such unsafe tendencies to levels that our society can survive. Even with advanced genetic
engineering (Yampolskiy, 2016a), the best we can hope for is some additional reduction in
how unsafe humans are. As long as we permit a person to have choices (free will), they can
be bribed, they will deceive, they will prioritize their interests above those they are
instructed to serve and they will remain fundamentally unsafe. Despite being trivial
examples of a solution to the value learning problem (VLP) (Dewey, 2011; Soares and
Fallenstein, 2014; Sotala, 2016), human beings are anything but safe, bringing into question
our current hope that solving VLP will get us to Safe AI. This is important. To quote Bruce
Schneier, “Only amateurs attack machines; professionals target people.” Consequently, we
see AI Safety research as, at least partially, an adversarial field similar to cryptography or
security[69].
If a cybersecurity system fails, the damage is unpleasant but tolerable in most cases:
someone loses money, someone loses privacy or maybe somebody loses their life. For
narrow AIs, safety failures are at the same level of importance as in general cybersecurity,
but for AGI it is fundamentally different. A single failure of a superintelligent system may
cause an existential risk event. If an AGI Safety mechanism fails, everyone may lose
everything, and all biological life in the universe is potentially destroyed. With cybersecurity
systems, you will get another chance to get it right or at least do better. With an AGI Safety
system, you only have one chance to succeed, so learning from failure is not an option.
Worse, a typical security system is likely to fail to a certain degree, e.g. perhaps only a
small amount of data will be compromised. With an AGI Safety system, failure or success is
a binary option: either you have a safe and controlled superintelligence or you do not. The
goal of cybersecurity is to reduce the number of successful attacks on the system; the goal
of AI Safety is to make sure zero attacks by superintelligent AI succeed in bypassing the
safety mechanisms. For that reason, the ability to distinguish NAI projects from potential AGI
projects (Baum, 2017) is an open problem of fundamental importance in the AI safety field.
The problems are many. We have no way to monitor, visualize or analyze the performance
of superintelligent agents. More trivially, we do not even know what to expect after such
software starts running. Should we see immediate changes to our environment? Should we
see nothing? What is the timescale on which we should be able to detect something? Will it
be too quick to notice or are we too slow to realize something is happening (Yudkowsky and
Hanson, 2008)? Will the impact be locally observable, or will it affect distant parts of the world?
How does one perform standard testing? On what data sets? What constitutes an “Edge
Case” for general intelligence? The questions are many, but the answers currently do not
exist. Additional complications will come from the interaction between intelligent software
and safety mechanisms designed to keep AI Safe and secure. We will also have to
somehow test all the AI Safety mechanisms currently in development. While AI remains at or below
human levels, some testing can be done with a human agent playing the role of the artificial
agent (Yudkowsky, 2002). At levels beyond human capacity, adversarial testing does not
seem to be realizable with today’s technology. More significantly, only one test run would
ever be possible.
5. Conclusions
The history of robotics and artificial intelligence in many ways is also the history of
humanity’s attempts to control such technologies. From the Golem of Prague to the military
robots of modernity, the debate continues as to what degree of independence such entities
should have and how to make sure that they do not turn on us, their inventors. Numerous
recent advancements in all aspects of research, development and deployment of intelligent
systems are well publicized, but safety and security issues related to AI are rarely
addressed. It is our hope that this paper will allow us to better understand how AI systems
can fail and what we can expect from such systems in the future, allowing us to better
prepare an appropriate response.
Notes
1. https://intelligence.org/2014/01/28/how-big-is-ai/
2. https://en.wikipedia.org/wiki/2010_Flash_Crash
3. https://electrek.co/2016/05/26/tesla-model-s-crash-autopilot-video/
4. https://en.wikipedia.org/wiki/Tay_(bot)
5. https://en.wikipedia.org/wiki/Kenji_Urada
6. http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/
7. https://en.wikipedia.org/wiki/List_of_software_bugs
8. https://en.wikipedia.org/wiki/General_Problem_Solver
9. http://aliciapatterson.org/stories/eurisko-computer-mind-its-own
10. https://en.wikipedia.org/wiki/1983_Soviet_nuclear_false_alarm_incident
11. http://gawker.com/this-program-that-judges-use-to-predict-future-crimes-s-1778151070
12. www.technologyreview.com/s/601897/tougher-turing-test-exposes-chatbots-stupidity/
13. www.forbes.com/sites/aarontilley/2014/04/03/googles-nest-stops-selling-its-smart-smoke-alarm-
for-now
14. https://gmail.googleblog.com/2015/11/computer-respond-to-this-email.html
15. http://time.com/3944181/robot-kills-man-volkswagen-plant/
16. www.huffingtonpost.com/2015/07/02/google-black-people-goril_n_7717008.html
17. http://blogs.wsj.com/digits/2015/05/19/googles-youtube-kids-app-criticized-for-inappropriate-
content/
18. https://motherboard.vice.com/en_us/article/53dz8x/people-are-complaining-that-amazon-echo-is-
responding-to-ads-on-tv
19. www.seattletimes.com/business/microsoft/how-linkedins-search-engine-may-reflect-a-bias
20. www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
21. https://openai.com/blog/faulty-reward-functions
22. www.telegraph.co.uk/technology/2016/12/07/robot-passport-checker-rejects-asian-mans-photo-
having-eyes
23. www.kotaku.co.uk/2016/06/03/elites-ai-created-super-weapons-and-started-hunting-players-
skynet-is-here
24. www.theguardian.com/technology/2016/sep/08/artificial-intelligence-beauty-contest-doesnt-like-
black-people
25. https://en.wikipedia.org/wiki/The_DAO_(organization)
26. www.latimes.com/local/lanow/la-me-ln-crimefighting-robot-hurts-child-bay-area-20160713-snap-
story.html
27. www.engadget.com/2016/03/13/google-alphago-loses-to-human-in-one-match/
28. www.theguardian.com/technology/2016/jul/01/tesla-driver-killed-autopilot-self-driving-car-harry-potter
29. www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
30. https://splinternews.com/black-teenagers-vs-white-teenagers-why-googles-algori-1793857436
31. www.japantimes.co.jp/news/2016/11/15/national/ai-robot-fails-get-university-tokyo
32. www.themarshallproject.org/2016/02/03/policing-the-future
33. www.entrepreneur.com/video/287281
34. www.boredpanda.com/funny-amazon-ai-designed-phone-cases-fail
35. www.bbc.com/future/story/20170410-how-to-fool-artificial-intelligence
36. www.abc.net.au/news/2017-04-10/centrelink-debt-recovery-system-lacks-transparency-
ombudsman/8430184
37. https://techcrunch.com/2017/10/24/another-ai-chatbot-shown-spouting-offensive-views
38. www.gizmodo.co.uk/2017/04/faceapp-blames-ai-for-whitening-up-black-people
39. https://motherboard.vice.com/en_us/article/j5jmj8/google-artificial-intelligence-bias
40. https://medium.com/@gidishperber/what-ive-learned-from-kaggle-s-fisheries-competition-
92342f9ca779
41. http://mashable.com/2017/11/08/amazon-alexa-rave-party-germany
42. http://mashable.com/2017/12/22/ai-tried-to-write-christmas-carols
43. www.mirror.co.uk/tech/apple-accused-racism-after-face-11735152
44. https://qz.com/1188170/google-photos-tried-to-fix-this-ski-photo
45. www.iflscience.com/technology/store-hires-robot-to-help-out-customers-robot-gets-fired-for-
scaring-customers-away
46. www.theguardian.com/sustainable-business/2015/jun/23/the-ethics-of-ai-how-to-stop-your-robot-
cooking-your-cat
47. https://intelligence.org/2014/11/18/misconceptions-edge-orgs-conversation-myth-ai
48. https://80000hours.org/problem-profiles/positively-shaping-artificial-intelligence
49. http://slatestarcodex.com/2014/07/13/growing-children-for-bostroms-disneyland
50. https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html
51. The term “Safe AI” has been used as early as 1995 (Rodd, 1995).
52. www.cmu.edu/safartint/
53. https://selfawaresystems.com/2015/07/11/formal-methods-for-ai-safety/
54. https://intelligence.org/2014/08/04/groundwork-ai-safety-engineering/
55. http://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/new-ai-safety-projects-get-
funding-from-elon-musk
56. http://globalprioritiesproject.org/2015/08/quantifyingaisafety/
57. http://futureoflife.org/2015/10/12/ai-safety-conference-in-puerto-rico/
58. http://rationality.org/waiss/
59. http://gizmodo.com/satya-nadella-has-come-up-with-his-own-ai-safety-rules-1782802269
60. https://80000hours.org/career-reviews/artificial-intelligence-risk-research/
61. https://openai.com/blog/concrete-ai-safety-problems/
62. http://lesswrong.com/lw/n4l/safety_engineering_target_selection_and_alignment/
63. www.waise2018.com/
64. www.whitehouse.gov/blog/2016/05/03/preparing-future-artificial-intelligence
65. http://acritch.com/fhi-positions/
66. www.brainyquote.com/quotes/bruce_schneier_182286
67. www.brainyquote.com/quotes/salman_rushdie_580407
68. Similarly a Safe Animal Problem may be of interest (can a Pitbull be guaranteed safe?).
69. The last thing we want is to be in an adversarial situation with a superintelligence, but unfortunately
we may not have a choice in the matter. It seems that long-term AI Safety cannot succeed but
neither does it have the luxury of a partial fail.
References
Abecker, A., Alami, R., Baral, C., Bickmore, T., Durfee, E., Fong, T. and Lebiere, C. (2006), “AAAI 2006
spring symposium reports”, AI Magazine, Vol. 27 No. 3, p. 107.
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J. and Mané, D. (2016), “Concrete
problems in AI safety”, arXiv preprint arXiv:1606.06565.
Armstrong, S. and Yampolskiy, R.V. (2016), “Security solutions for intelligent and complex systems”,
Security Solutions for Hyperconnectivity and the Internet of Things, IGI Global, Hershey, PA, pp. 37-88.
Armstrong, S., Sandberg, A. and Bostrom, N. (2012), “Thinking inside the box: controlling and using an
oracle ai”, Minds and Machines, Vol. 22 No. 4, pp. 299-324.
Babcock, J., Kramar, J. and Yampolskiy, R. (2016a), “The AGI containment problem”, Paper presented at
the Ninth Conference on Artificial General Intelligence (AGI2015).
Babcock, J., Kramar, J. and Yampolskiy, R. (2016b), “The AGI containment problem”, arXiv preprint
arXiv:1604.00545.
Baum, S. (2017), “A survey of artificial general intelligence projects for ethics, risk, and policy”, Global
Catastrophic Risk Institute Working Paper 17-1.
Bostrom, N. (2003), “Ethical issues in advanced artificial intelligence”, Science Fiction and Philosophy:
From Time Travel to Superintelligence, pp. 277-284.
Bostrom, N. (2014), Superintelligence: Paths, Dangers, Strategies, Oxford University Press, New York,
NY.
Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B. and Filar, B. (2018), “The malicious
use of artificial intelligence: forecasting, prevention, and mitigation”, arXiv preprint arXiv:1802.07228.
Caliskan, A., Bryson, J.J. and Narayanan, A. (2017), “Semantics derived automatically from language
corpora contain human-like biases”, Science, Vol. 356 No. 6334, pp. 183-186.
Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M. and Elhadad, N. (2015), “Intelligible models for
healthcare: predicting pneumonia risk and hospital 30-day readmission”, Paper presented at the
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining.
Chessen, M. (2017), The MADCOM Future, Atlantic Council, available at: www.atlanticcouncil.org/
publications/reports/the-madcom-future
Denning, D.E. and Denning, P.J. (2004), “Artificial stupidity”, Communications of the ACM, Vol. 47 No. 5,
p. 112.
Dewey, D. (2011), “Learning what to value”, Artificial General Intelligence, pp. 309-314.
Diakopoulos, N. (2013), “Algorithmic defamation: the case of the shameless autocomplete”, Blog post,
available at: www.nickdiakopoulos.com/2013/08/06/algorithmic-defamation-the-case-of-the-shameless-
autocomplete/
Friedman, B. and Nissenbaum, H. (1996), “Bias in computer systems”, ACM Transactions on Information
Systems (TOIS), Vol. 14 No. 3, pp. 330-347.
Gloor, L. (2016), “Suffering-focused AI safety: why ‘fail-safe’ measures might be our top intervention”,
Foundational Research Institute.
Gunderson, J. and Gunderson, L. (2006), “And then the phone rang”, Paper presented at the AAAI
Spring Symposium: What Went Wrong and Why: Lessons from AI Research and Applications.
Hewitt, C. (1958), “Development of logic programming: what went wrong, what was done about it, and
what it might mean for the future”.
Karp, R.M. (1972), “Reducibility among combinatorial problems”, In Miller, R. E. and Thatcher, J. W.
(Eds), Complexity of Computer Computations, Plenum, New York, NY, pp. 85-103.
Krishnan, A. (2009), Killer Robots: Legality and Ethicality of Autonomous Weapons, Ashgate Publishing,
Farnham.
Liu, A., Martin, C.E., Hetherington, T. and Matzner, S. (2006), “AI lessons learned from experiments in
insider threat detection”, Paper presented at the AAAI Spring Symposium: What Went Wrong and Why:
Lessons from AI Research and Applications.
Lowry, S. and Macpherson, G. (1988), “A blot on the profession”, British Medical Journal (Clinical
Research ed.), Vol. 296 No. 6623, p. 657.
Majot, A.M. and Yampolskiy, R.V. (2014), “AI safety engineering through introduction of self-reference
into felicific calculus via artificial pain and pleasure”, Paper presented at the IEEE International
Symposium on Ethics in Science, Technology and Engineering, Chicago, IL.
Marcus, G. (2012), Moral Machines, The New Yorker, New York, NY, p. 24.
Marling, C. and Chelberg, D. (2008), “RoboCup for the mechanically, athletically and culturally
challenged”.
Meehan, J.R. (1977), “TALE-SPIN, an interactive program that writes stories”, Paper presented at the IJCAI.
Moor, J.H. (2006), “The nature, importance, and difficulty of machine ethics”, IEEE Intelligent Systems,
Vol. 21 No. 4, pp. 18-21.
Murphy, T. (2013), “The first level of Super Mario Bros. is easy with lexicographic orderings and time
travel”, The Association for Computational Heresy (SIGBOVIK), pp. 112-133.
Ng, A.Y., Harada, D. and Russell, S. (1999), “Policy invariance under reward transformations: theory and
application to reward shaping”, Paper presented at the ICML.
Pistono, F. and Yampolskiy, R.V. (2016), “Unethical research: how to create a malevolent artificial
intelligence”, arXiv preprint arXiv:1605.02817.
Pistono, F. and Yampolskiy, R.V. (2016), “Unethical research: how to create a malevolent artificial
intelligence”, Paper presented at the 25th International Joint Conference on Artificial Intelligence (IJCAI-16),
Ethics for Artificial Intelligence Workshop (AI-Ethics-2016), New York, NY.
Ramamoorthy, A. and Yampolskiy, R. (2017), “Beyond mad?: The race for artificial general intelligence”,
ITU Journal: ICT Discoveries, No. 1, available at: www.itu.int/en/journal/001/Pages/09.aspx
Randløv, J. and Alstrøm, P. (1998), “Learning to drive a bicycle using reinforcement learning and
shaping”, Paper presented at the ICML.
Ribeiro, M.T., Singh, S. and Guestrin, C. (2016), “Why should I trust you?: Explaining the predictions of
any classifier”, Paper presented at the Proceedings of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining.
Rodd, M. (1995), “Safe AI – is this possible?”, Engineering Applications of Artificial Intelligence, Vol. 8
No. 3, pp. 243-250.
Rychtyckyj, N. and Turski, A. (2008), “Reasons for success (and failure) in the development and
deployment of AI systems”, Paper presented at the AAAI 2008 workshop on What Went Wrong and Why.
Scharre, P. (2016), “Autonomous weapons and operational risk”, Paper presented at the Center for a New
American Society, Washington DC.
Shalev-Shwartz, S., Shamir, O. and Shammah, S. (2017), “Failures of gradient-based deep learning”,
Paper presented at the International Conference on Machine Learning.
Shapiro, D. and Goker, M.H. (2008), “Advancing AI research and applications by learning from what went
wrong and why”, AI Magazine, Vol. 29 No. 2, pp. 9-10.
Sims, K. (1994), “Evolving virtual creatures”, Paper presented at the Proceedings of the 21st annual
conference on Computer graphics and interactive techniques.
Soares, N. and Fallenstein, B. (2014), Aligning superintelligence with human interests: a technical
research agenda, Machine Intelligence Research Institute (MIRI) Technical Report, 8.
Sotala, K. (2016), “Defining human values for value learners”, Paper presented at the 2nd International
Workshop on AI, Ethics and Society, AAAI-2016.
Taylor, J., Yudkowsky, E., LaVictoire, P. and Critch, A. (2016), Alignment for Advanced Machine Learning
Systems, Machine Intelligence Research Institute, Berkeley, CA.
Yampolskiy, R.V. (2011a), “AI-Complete CAPTCHAs as zero knowledge proofs of access to an artificially
intelligent system”, ISRN Artificial Intelligence, 271878.
Yampolskiy, R.V. (2011b), “Artificial intelligence safety engineering: why machine ethics is a wrong
approach”, Paper presented at the Philosophy and Theory of Artificial Intelligence (PT-AI2011),
Thessaloniki, Greece.
Yampolskiy, R.V. (2012a), “Leakproofing the singularity artificial intelligence confinement problem”,
Journal of Consciousness Studies, Vol. 19 Nos 1/2, pp. 1-2.
Yampolskiy, R.V. (2012b), “AI-Complete, AI-Hard, or AI-Easy–classification of problems in AI”, The 23rd
Midwest Artificial Intelligence and Cognitive Science Conference, Cincinnati, OH, USA.
Yampolskiy, R.V. (2013a), “Artificial intelligence safety engineering: why machine ethics is a wrong
approach”, Philosophy and Theory of Artificial Intelligence, Springer, Berlin Heidelberg,
pp. 389-396.
Yampolskiy, R.V. (2013b), “Efficiency theory: a unifying theory for information, computation and
intelligence”, Journal of Discrete Mathematical Sciences & Cryptography, Vol. 16 Nos 4/5, pp. 259-277.
Yampolskiy, R.V. (2013c), “Turing test as a defining feature of AI-Completeness”, in Yang, X.-S. (Ed.),
Artificial Intelligence, Evolutionary Computing and Metaheuristics, Springer, Berlin Heidelberg, Vol. 427,
pp. 3-17.
Yampolskiy, R.V. (2015), Artificial Superintelligence: A Futuristic Approach, Chapman and Hall/CRC,
Boca Raton, FL.
Yampolskiy, R.V. (2016a), “On the origin of samples: attribution of output to a particular algorithm”, arXiv
preprint arXiv:1608.06172.
Yampolskiy, R.V. (2016b), “Taxonomy of pathways to dangerous artificial intelligence”, Paper presented
at the Workshops at the Thirtieth AAAI Conference on Artificial Intelligence.
Yampolskiy, R.V. (2017), “What are the ultimate limits to computational techniques: verifier theory and
unverifiability”, Physica Scripta, Vol. 92 No. 9, p. 093001.
Yampolskiy, R.V. and Fox, J. (2012), “Safety engineering for artificial general intelligence”, Topoi. Special
Issue on Machine Ethics & the Ethics of Building Intelligent Machines.
Yudkowsky, E. (2001), “Creating friendly AI 1.0: the analysis and design of benevolent goal
architectures”, Singularity Institute for Artificial Intelligence, San Francisco, CA, 15 June.
Yudkowsky, E. (2002), The AI-Box Experiment, available at: http://yudkowsky.net/singularity/aibox
Yudkowsky, E. (2008), “Artificial intelligence as a positive and negative factor in global risk”, Global
Catastrophic Risks, Vol. 1, p. 303.
Yudkowsky, E. (2011), “Complex value systems in friendly AI”, Artificial General Intelligence,
pp. 388-393.
Yudkowsky, E. and Hanson, R. (2008), “The Hanson-Yudkowsky AI-foom debate”, MIRI Technical Report,
available at: http://intelligence.org/files/AIFoomDebate.pdf
Corresponding author
Roman V. Yampolskiy can be contacted at: [email protected]