Toward Cross-Layer Energy Optimizations in AI Systems (2404.06675v2)
Abstract: The "AI for Science, Energy, and Security" report from DOE outlines a significant focus on developing and optimizing artificial intelligence workflows for a foundational impact on a broad range of DOE missions. With the pervasive use of AI and ML tools and techniques, their energy efficiency is likely to become the gating factor for adoption. This is because generative AI (GenAI) models are massive energy hogs: for instance, training a 200-billion-parameter LLM at Amazon is estimated to have taken 11.9 GWh, enough to power more than a thousand average U.S. households for a year. Inference consumes even more energy, because a model trained once serves millions of users. Given this scale, high energy efficiency is key to addressing the power-delivery problem of constructing and operating new supercomputers and datacenters specialized for AI workloads. In that regard, we outline software- and architecture-level research challenges and opportunities, setting the stage for creating cross-layer energy optimizations in AI systems.
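As a back-of-envelope check on the household comparison in the abstract, the sketch below converts the 11.9 GWh training estimate into household-years. It assumes the EIA's figure of roughly 10,800 kWh of electricity per year for an average U.S. home; the constant names are illustrative and not taken from the paper.

```python
# Back-of-envelope check of the abstract's household comparison (illustrative only).
# Assumes ~10,800 kWh/year for an average U.S. household (EIA estimate);
# the exact value varies by year and region.

TRAINING_ENERGY_GWH = 11.9        # estimated energy to train the 200B-parameter LLM
HOUSEHOLD_KWH_PER_YEAR = 10_800   # assumed average annual U.S. household consumption

training_energy_kwh = TRAINING_ENERGY_GWH * 1e6  # 1 GWh = 1,000,000 kWh
households_powered = training_energy_kwh / HOUSEHOLD_KWH_PER_YEAR
print(f"~{households_powered:,.0f} households powered for one year")
```

Under these assumptions the script prints roughly 1,100 household-years, consistent with the abstract's "more than a thousand households" figure.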
Authors: Jae-Won Chung, Mosharaf Chowdhury, Nishil Talati