Energy and Policy Considerations for Deep Learning in NLP
Table 3: Estimated cost of training a model in terms of CO2 emissions (lbs) and cloud compute cost (USD). Power and carbon footprint are omitted for TPUs due to lack of public information on power draw for this hardware.
James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554.

David R. So, Chen Liang, and Quoc V. Le. 2019. The evolved transformer. In Proceedings of the 36th International Conference on Machine Learning (ICML).
Bruno Burger. 2019. Net Public Electricity Generation in Germany in 2018. Technical report, Fraunhofer Institute for Solar Energy Systems ISE.

Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-Informed Self-Attention for Semantic Role Labeling. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.

Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. 2016. An analysis of deep neural network models for practical applications.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In 31st Conference on Neural Information Processing Systems (NIPS).

Gary Cook, Jude Lee, Tamina Tsai, Ada Kong, John Deans, Brian Johnson, and Elizabeth Jardim. 2017. Clicking Clean: Who is winning the race to build a green internet? Technical report, Greenpeace.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL.
Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In ICLR.
EPA. 2018. Emissions & Generation Resource Integrated Database (eGRID). Technical report, U.S. Environmental Protection Agency.
Christopher Forster, Thor Johnsen, Swetha Mandava, Sharath Turuvekere Sreenivas, Deyu Fu, Julie Bernauer, Allison Gray, Sharan Chetlur, and Raul Puri. 2019. BERT Meets GPUs. Technical report, NVIDIA AI.
Da Li, Xinbo Chen, Michela Becchi, and Ziliang Zong. 2016. Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs. In 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), pages 477–484.
Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421. Association for Computational Linguistics.