Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale (2402.18593v1)
Abstract: As research on and deployment of AI grow, the computational burden of supporting and sustaining its progress inevitably grows too. Training or fine-tuning state-of-the-art models in NLP, computer vision, and other domains virtually requires some form of AI hardware acceleration. Recent LLMs require considerable resources to train and deploy, resulting in significant energy usage, potential carbon emissions, and massive demand for GPUs and other hardware accelerators. This surge carries large implications for energy sustainability at the HPC/datacenter level. In this paper, we study the aggregate effect of power-capping GPUs on GPU temperature and power draw at a research supercomputing center. With the right amount of power-capping, we show significant decreases in both temperature and power draw, potentially improving hardware lifespan with minimal impact on job performance. While power-capping reduces power draw by design, its aggregate, system-wide effect on overall energy consumption is less clear; for instance, if users notice job performance degradation under GPU power caps, they may submit additional GPU jobs to compensate, negating any energy savings or even worsening energy consumption. To our knowledge, our work is the first to conduct and make available a detailed analysis of the effects of GPU power-capping at supercomputing scale. We hope our work will inspire HPC centers and datacenters to further explore, evaluate, and communicate the impact of power-capping AI hardware accelerators for more sustainable AI.
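The paper itself does not include code, but the core operation it studies, setting a GPU power cap and then observing power draw and temperature, can be illustrated with a short sketch. The example below is an assumption-laden illustration using NVIDIA's NVML Python bindings (pynvml); the 250 W cap value is hypothetical, and the paper's own tooling and chosen cap settings are not reproduced here.

```python
"""Illustrative sketch: apply a uniform GPU power cap and sample power/temperature.

Assumptions (not taken from the paper): the cap is applied via NVIDIA's NVML
through the pynvml bindings, the target limit is chosen by the operator, and
the process has the privileges needed to change power limits (typically root).
"""
import pynvml

CAP_WATTS = 250  # hypothetical cap; the appropriate value is system-specific

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)

        # NVML reports power limits in milliwatts; clamp the requested cap to
        # the device-supported range before applying it.
        min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
        target_mw = min(max(CAP_WATTS * 1000, min_mw), max_mw)
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)

        # Sample the two quantities the paper analyzes in aggregate:
        # instantaneous power draw (reported in mW) and GPU temperature (deg C).
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
        temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        print(f"GPU {i}: cap={target_mw / 1000.0:.0f} W, "
              f"draw={power_w:.1f} W, temp={temp_c} C")
finally:
    pynvml.nvmlShutdown()
```

In practice, operators can apply the same limit with `nvidia-smi --power-limit <watts>` on each node; the sketch above simply mirrors that operation programmatically and adds the per-GPU sampling one would need to study aggregate effects.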