Table 1: Hyperparameters

Hyperparameter        BM            CMAG
Learning rate         0.0007        0.0007
Batch size            32            32
Trajectory length     40            40
Learning algorithm    PPO           PPO
Reward thresholds     [7, 7, 7, 2]  [300, 5, 5, 5, 500]
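For concreteness, the settings in Table 1 can be collected into a single training configuration. The sketch below is a minimal rendering, assuming BM and CMAG abbreviate the BuildMarines minigame and a customized CollectMineralsAndGas-style minigame; the dictionary layout and key names are illustrative, not the authors' code.

```python
# Minimal sketch: Table 1 as a training config. Assumptions (not from the
# paper): "BM" = BuildMarines, "CMAG" = CollectMineralsAndGas; key names
# are illustrative.

CONFIG = {
    "BM": {
        "learning_rate": 7e-4,              # 0.0007
        "batch_size": 32,
        "trajectory_length": 40,
        "algorithm": "PPO",                 # on-policy policy-gradient learner
        "reward_thresholds": [7, 7, 7, 2],  # one gate per subgoal stage
    },
    "CMAG": {
        "learning_rate": 7e-4,
        "batch_size": 32,
        "trajectory_length": 40,
        "algorithm": "PPO",
        "reward_thresholds": [300, 5, 5, 5, 500],
    },
}
```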
Figure 5: Build Marines learning curve (best agent).

Figure 6: Build Marines learning curve (worst agent).

Conclusion & Future Work

In this work, we examined the SC2 minigames and proposed a way to introduce human expertise to an HRL framework. By designing customized minigames to facilitate learning and leveraging the effectiveness of hierarchical structures in decomposing complex and large problems, we empirically showed that our approach is sample-efficient and enhances …
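As a rough illustration of the approach summarized above, the sketch below pairs a human-specified subgoal sequence with a low-level PPO worker that trains on each subgoal until its reward threshold (cf. Table 1) is cleared. The `hierarchical_training` helper, the subgoal names, and the threshold semantics are hypothetical placeholders, not the paper's implementation.

```python
from typing import Callable, List

def hierarchical_training(
    subgoals: List[str],                        # human-specified subgoal order
    thresholds: List[float],                    # per-subgoal reward gates (Table 1)
    train_on_subgoal: Callable[[str], float],   # one training round; returns mean episode reward
    max_rounds: int = 1000,
) -> None:
    """Step through human-chosen subgoals in order; keep training the
    low-level worker on the current subgoal until its mean episode reward
    clears the corresponding threshold, then advance the curriculum."""
    for subgoal, threshold in zip(subgoals, thresholds):
        for _ in range(max_rounds):
            mean_reward = train_on_subgoal(subgoal)
            if mean_reward >= threshold:
                break  # subgoal mastered; move to the next one

# Hypothetical usage for BuildMarines with the Table 1 thresholds:
# hierarchical_training(
#     ["collect_minerals", "build_depot", "build_barracks", "train_marines"],
#     [7, 7, 7, 2],
#     train_on_subgoal=my_ppo_worker.train,  # placeholder
# )
```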
Acknowledgments

This work was partially supported by an Academic Research Grant T1 251RES1827 from the Ministry of Education in Singapore and a grant from the Advanced Robotics Center at the National University of Singapore.

References

Andrychowicz, M.; Wolski, F.; Ray, A.; Schneider, J.; Fong, R.; Welinder, P.; McGrew, B.; Tobin, J.; Pieter Abbeel, O.; and Zaremba, W. 2017. Hindsight experience replay. In Guyon, I.; Luxburg, U. V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; and Garnett, R., eds., Advances in Neural Information Processing Systems 30. Curran Associates, Inc. 5048–5058.

Andrychowicz, M.; Baker, B.; Chociej, M.; Józefowicz, R.; McGrew, B.; Pachocki, J.; Petron, A.; Plappert, M.; Powell, G.; Ray, A.; Schneider, J.; Sidor, S.; Tobin, J.; Welinder, P.; Weng, L.; and Zaremba, W. 2020. Learning dexterous in-hand manipulation. International Journal of Robotics Research 39(1):3–20.

Bakker, B., and Schmidhuber, J. 2004. Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In Proceedings of the 8th Conference on Intelligent Autonomous Systems, IAS-8, 438–445.

Barto, A. G., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13(1–2):41–77.

Bengio, Y.; Louradour, J.; Collobert, R.; and Weston, J. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 41–48. New York, NY, USA: Association for Computing Machinery.

Buch, V.; Ahmed, I.; and Maruthappu, M. 2018. Artificial intelligence in medicine: Current trends and future possibilities. British Journal of General Practice 68:143–144.

Cath, C. 2018. Governing artificial intelligence: Ethical, legal and technical opportunities and challenges. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 376:20180080.

Hauskrecht, M.; Meuleau, N.; Kaelbling, L. P.; Dean, T.; and Boutilier, C. 1998. Hierarchical solution of Markov decision processes using macro-actions. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, UAI '98, 220–229. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Hengst, B. 2010. Hierarchical reinforcement learning. In Sammut, C., and Webb, G. I., eds., Encyclopedia of Machine Learning. Boston, MA: Springer US. 495–502.

Justesen, N.; Torrado, R. R.; Bontrager, P.; Khalifa, A.; Togelius, J.; and Risi, S. 2018. Illuminating generalization in deep reinforcement learning through procedural level generation. In NeurIPS Workshop on Deep Reinforcement Learning.

Lee, D.; Tang, H.; Zhang, J. O.; Xu, H.; Darrell, T.; and Abbeel, P. 2018. Modular architecture for StarCraft II with deep reinforcement learning. In Rowe, J. P., and Smith, G., eds., Proceedings of the Fourteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2018, November 13-17, 2018, Edmonton, Canada, 187–193. AAAI Press.

Levy, A.; Konidaris, G. D.; Platt Jr., R.; and Saenko, K. 2019. Learning multi-level hierarchies with hindsight. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.

Li, Z.; Narayan, A.; and Leong, T. Y. 2017. An efficient approach to model-based hierarchical reinforcement learning. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 3583–3589.

Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; and Wierstra, D. 2016. Continuous control with deep reinforcement learning. In 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings.

Lin, L.-J. 1993. Reinforcement learning for robots using neural networks. Ph.D. Dissertation, Carnegie Mellon University.

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; and Riedmiller, M. 2013. Playing Atari with deep reinforcement learning. In NIPS Deep Learning Workshop.

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M. A.; Fidjeland, A.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.; Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; and Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.

Pang, Z.-J.; Liu, R.-Z.; Meng, Z.-Y.; Zhang, Y.; Yu, Y.; and Lu, T. 2019. On reinforcement learning for full-length game of StarCraft. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 4691–4698.

Ring, R. 2018. Reaver: Modular deep reinforcement learning framework. https://github.com/inoryy/reaver.

Schaul, T.; Horgan, D.; Gregor, K.; and Silver, D. 2015. Universal value function approximators. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML '15, 1312–1320. JMLR.org.

Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. CoRR abs/1707.06347.

Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T. P.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529:484–489.

Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; Chen, Y.; Lillicrap, T.; Hui, F.; Sifre, L.; van den Driessche, G.; Graepel, T.; and Hassabis, D. 2017. Mastering the game of Go without human knowledge. Nature 550(7676):354–359.

Singh, A.; Yang, L.; Finn, C.; and Levine, S. 2019. End-to-end robotic reinforcement learning without reward engineering. In Bicchi, A.; Kress-Gazit, H.; and Hutchinson, S., eds., Robotics: Science and Systems XV, University of Freiburg, Freiburg im Breisgau, Germany, June 22-26, 2019.

Sutton, R. S., and Barto, A. G. 2018. Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, second edition.

Sutton, R. S.; Precup, D.; and Singh, S. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1–2):181–211.

Vinyals, O.; Ewalds, T.; Bartunov, S.; Georgiev, P.; Vezhnevets, A. S.; Yeo, M.; Makhzani, A.; Küttler, H.; Agapiou, J. P.; Schrittwieser, J.; Quan, J.; Gaffney, S.; Petersen, S.; Simonyan, K.; Schaul, T.; van Hasselt, H.; Silver, D.; Lillicrap, T. P.; Calderone, K.; Keet, P.; Brunasso, A.; Lawrence, D.; Ekermo, A.; Repp, J.; and Tsing, R. 2017. StarCraft II: A new challenge for reinforcement learning. CoRR abs/1708.04782.

Vinyals, O.; Babuschkin, I.; Czarnecki, W. M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D. H.; Powell, R.; Ewalds, T.; Georgiev, P.; Oh, J.; Horgan, D.; Kroiss, M.; Danihelka, I.; Huang, A.; Sifre, L.; Cai, T.; Agapiou, J. P.; Jaderberg, M.; Vezhnevets, A. S.; Leblond, R.; Pohlen, T.; Dalibard, V.; Budden, D.; Sulsky, Y.; Molloy, J.; Paine, T. L.; Gulcehre, C.; Wang, Z.; Pfaff, T.; Wu, Y.; Ring, R.; Yogatama, D.; Wünsch, D.; McKinney, K.; Smith, O.; Schaul, T.; Lillicrap, T.; Kavukcuoglu, K.; Hassabis, D.; Apps, C.; and Silver, D. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354.

Weng, L. 2020. Curriculum for reinforcement learning. lilianweng.github.io/lil-log.

Zambaldi, V. F.; Raposo, D.; Santoro, A.; Bapst, V.; Li, Y.; Babuschkin, I.; Tuyls, K.; Reichert, D. P.; Lillicrap, T. P.; Lockhart, E.; Shanahan, M.; Langston, V.; Pascanu, R.; Botvinick, M.; Vinyals, O.; and Battaglia, P. W. 2019. Deep reinforcement learning with relational inductive biases. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.

Zaremba, W., and Sutskever, I. 2014. Learning to execute. CoRR abs/1410.4615.