Human-Level Control Through Deep Reinforcement Learning - Nature
Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra,
Shane Legg & Demis Hassabis
https://www.nature.com/articles/nature14236 2/11
1/26/24, 7:08 PM Human-level control through deep reinforcement learning | Nature
Acknowledgements
We thank G. Hinton, P. Dayan and M. Bowling for discussions, A. Cain and J. Keene for
work on the visuals, K. Keller and P. Rogers for help with the visuals, G. Wayne for
comments on an earlier version of the manuscript, and the rest of the DeepMind team
for their support, ideas and encouragement.
Author information
Volodymyr Mnih, Koray Kavukcuoglu and David Silver: These authors contributed
equally to this work.
Contributions
V.M., K.K., D.S., J.V., M.G.B., M.R., A.G., D.W., S.L. and D.H. conceptualized the problem
and the technical framework. V.M., K.K., A.A.R. and D.S. developed and tested the
algorithms. J.V., S.P., C.B., A.A.R., M.G.B., I.A., A.K.F., G.O. and A.S. created the testing
platform. K.K., H.K., S.L. and D.H. managed the project. K.K., D.K., D.H., V.M., D.S.,
A.G., A.A.R., J.V. and M.G.B. wrote the paper.
Note that the dashed line shows the past trajectory of the ball purely for illustrative
purposes (it is not shown during the game). With permission from Atari
Interactive, Inc.
Extended Data Table 3: The effects of replay and separating the target Q-network
Supplementary information
Supplementary Information
This file contains a Supplementary Discussion. (PDF 110 kb)
the top left of the screen (maximum for clearing one screen is 448 points), number of
lives remaining is shown in the middle (starting with 5 lives), and the “1” on the top
right indicates this is a 1-player game. (MOV 1500 kb)
Issue Date
26 February 2015
DOI
https://doi.org/10.1038/nature14236