
Power Hungry Processing: Watts Driving the Cost of AI Deployment? (2311.16863v3)

Published 28 Nov 2023 in cs.LG

Abstract: Recent years have seen a surge in the popularity of commercial AI products based on generative, multi-purpose AI systems promising a unified approach to building ML models into technology. However, this ambition of 'generality' comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon that they emit. In this work, we propose the first systematic comparison of the ongoing inference cost of various categories of ML systems, covering both task-specific (i.e. finetuned models that carry out a single task) and 'general-purpose' models (i.e. those trained for multiple tasks). We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on representative benchmark datasets using these models. We find that multi-purpose, generative architectures are orders of magnitude more expensive than task-specific systems for a variety of tasks, even when controlling for the number of model parameters. We conclude with a discussion around the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against increased costs in terms of energy and emissions. All the data from our study can be accessed via an interactive demo to carry out further exploration and analysis.

Power Hungry Processing: Watts Driving the Cost of AI Deployment?

In recent years, the AI community has seen a significant shift towards deploying large-scale, generative models for a myriad of tasks ranging from NLP to computer vision. The work by Luccioni et al., "Power Hungry Processing: Watts Driving the Cost of AI Deployment?" offers a detailed analysis focused on the energy consumption and carbon emissions of AI model inference, an area that has been considerably less explored compared to the training phase of AI systems.

Overview of the Study

The paper opens with the pressing need to understand the environmental impact of AI, especially given the exponential increase in computational resources consumed by major tech companies. While previous research has extensively measured the energy consumption of training ML models, this work is distinctive in its focus on the inference phase, which, according to the paper, could have equal or greater environmental ramifications given how frequently deployed models are queried in production environments.

Methodology

The authors perform a systematic comparative study covering both task-specific and general-purpose models, evaluating 88 models across 10 tasks and 30 datasets from the NLP and computer vision domains. The assessment involves running 1,000 inferences per model per dataset on an NVIDIA A100-SXM4-80GB GPU and measuring both the energy consumed and the resultant carbon emissions using the CodeCarbon package.
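
For intuition, the measurement harness can be quite simple. Below is a minimal sketch of energy tracking for repeated inference using CodeCarbon and a Hugging Face pipeline; the model, inputs, and tracker settings are illustrative assumptions, not the authors' exact setup.

```python
from codecarbon import EmissionsTracker
from transformers import pipeline

# Stand-in inputs; the paper runs 1,000 inferences per model per dataset.
texts = ["An illustrative input sentence."] * 1000

# Any task-specific pipeline works here; device=0 targets the first GPU.
classifier = pipeline("text-classification", device=0)

tracker = EmissionsTracker()  # also logs energy (kWh) to emissions.csv
tracker.start()
for text in texts:
    classifier(text)
emissions_kg = tracker.stop()  # returns estimated emissions in kg CO2eq

print(f"Estimated emissions for 1,000 inferences: {emissions_kg:.6f} kg CO2eq")
```

CodeCarbon samples hardware power draw over the tracked interval and converts the measured energy into emissions using a regional carbon-intensity estimate, which is why the same workload can report different emissions in different locations.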

Key Findings

Task-Specific Models

The paper reveals significant variability in energy use across tasks. For example, text classification, a low-complexity task, consumes far less energy (a mean of 0.002 kWh per 1,000 inferences) than generative tasks like text generation and summarization (a mean of 0.05 kWh). More notably, image-based tasks, particularly image generation, are the most energy-intensive, with mean consumption reaching 2.9 kWh per 1,000 inferences. These findings emphasize that the complexity and type of a task greatly influence the energy footprint of AI models.
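
To put those means side by side (the figures are the paper's reported task means; the per-inference numbers follow directly, since kWh per 1,000 inferences is numerically equal to Wh per single inference):

```python
# Back-of-the-envelope comparison of the reported task means
# (kWh per 1,000 inferences), as cited in the summary above.
means_kwh_per_1000 = {
    "text classification": 0.002,
    "text generation / summarization": 0.05,
    "image generation": 2.9,
}

baseline = means_kwh_per_1000["text classification"]
for task, kwh in means_kwh_per_1000.items():
    # kWh per 1,000 inferences equals Wh per single inference.
    print(f"{task}: {kwh} Wh/inference (~{kwh / baseline:,.0f}x baseline)")
```

On these means, a single image generation comes out roughly 1,450 times more energy-intensive than a single text classification.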

Multi-Purpose Models

Further analysis distinguishes between task-specific and multi-purpose models, revealing that general-purpose architectures (e.g., from the BLOOMz and Flan-T5 families) incur higher energy costs than their task-specific counterparts. The differences are stark for tasks like text classification and question answering, where fine-tuned models are significantly more efficient. Notably, the paper finds that sequence-to-sequence models tend to be more efficient than decoder-only models for tasks with longer output sequences, such as summarization.

Implications

The implications of these findings are manifold. Practically, the paper serves as a critical resource for AI practitioners, particularly those in operational roles, who must weigh accuracy-efficiency trade-offs. Deploying a multi-purpose model for a specialized task, while convenient, may incur orders of magnitude higher energy consumption, necessitating a more deliberate choice of model architecture based on the specific use case and efficiency requirements.

Theoretically, these findings call for a reevaluation of the current trends towards larger, multi-purpose models. While these models offer versatility and significant advances in zero-shot and few-shot learning, their deployment should be critically assessed against their environmental costs. The paper sets the stage for further research into optimization techniques, such as model distillation, quantization, and hardware-specific efficiencies, which could mitigate these trade-offs.
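
As one concrete illustration of those levers, weight quantization is already a low-effort option in common tooling. The sketch below loads a BLOOMz checkpoint in 8-bit precision via Hugging Face transformers and bitsandbytes; the checkpoint is just an example from a family studied in the paper, and actual energy savings depend on hardware and workload.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit weight quantization: lower memory traffic and typically lower
# energy per generated token than full precision, with a possible
# quality trade-off that should be evaluated per task.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model_name = "bigscience/bloomz-560m"  # example checkpoint, not the paper's pick
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # requires the accelerate package
)

prompt = "Summarize: The meeting covered quarterly results and hiring plans."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```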

Future Directions

Looking forward, the paper paves the way for a deeper exploration into:

  • Optimization Techniques: Developing methodologies to reduce the energy footprint of inference without significantly compromising on performance.
  • Detailed Lifecycle Analysis: Extending the analysis to encompass the entire lifecycle of an AI model, including aspects like water usage and the extraction of rare earth minerals.
  • Policy Recommendations: Informing policy decisions on AI deployment and sustainability, advocating for transparency in reporting the environmental costs of AI models.

Conclusion

Luccioni et al.'s examination of the hidden costs of AI model deployment presents a compelling case for more balanced decision-making when putting models into production. Through rigorous empirical analysis, the paper provides a clearer understanding of how different models and tasks scale in energy consumption and environmental impact, emphasizing the need for sustainable AI practices. This work is a crucial step towards acknowledging and addressing the environmental considerations in the rapidly evolving AI landscape.

References (58)
  1. Evaluating the carbon footprint of NLP methods: a survey and analysis of existing tools. In EMNLP, Workshop SustaiNLP.
  2. Jeff Barr. 2019. Amazon EC2 update: Inf1 instances with AWS Inferentia chips for high performance, cost-effective inferencing. https://aws.amazon.com/blogs/aws/amazon-ec2-update-inf1-instances-with-aws-inferentia-chips-for-high-performance-cost-effective-inferencing/.
  3. Bing. 2019. Bing delivers its largest improvement in search experience using Azure GPUs. https://azure.microsoft.com/en-us/blog/bing-delivers-its-largest-improvement-in-search-experience-using-azure-gpus/.
  4. Bing. 2023. Confirmed: the new Bing runs on OpenAI’s GPT-4. https://blogs.bing.com/search/march_2023/Confirmed-the-new-Bing-runs-on-OpenAI%E2%80%99s-GPT-4.
  5. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
  6. Reducing the Carbon Impact of Generative AI Inference (today and in 2035). In Proceedings of the 2nd Workshop on Sustainable Computer Systems. 1–7.
  7. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 (2022).
  8. Scaling Instruction-Finetuned Language Models. https://doi.org/10.48550/ARXIV.2210.11416
  9. Rishit Dagli and Ali Mustufa Shaikh. 2021. CPPE-5: Medical Personal Protective Equipment Dataset. arXiv:2112.09569 [cs.CV]
  10. RedCaps: web-curated image-text data created by the people, for the people. arXiv:2111.11431 [cs.CV]
  11. Compute and energy consumption trends in deep learning inference. arXiv preprint arXiv:2109.05472 (2021).
  12. Measuring the carbon intensity of AI in cloud instances. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1877–1894.
  13. LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models. arXiv preprint arXiv:2309.14393 (2023).
  14. A framework for few-shot language model evaluation. https://doi.org/10.5281/zenodo.5371628
  15. SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization. Association for Computational Linguistics, Hong Kong, China, 70–79. https://doi.org/10.18653/v1/D19-5409
  16. Google. 2019. Understanding searches better than ever before. https://blog.google/products/search/search-language-understanding-bert/.
  17. Google. 2023a. Bard can now connect to your Google apps and services. https://blog.google/products/bard/google-bard-new-features-update-sept-2023/.
  18. Google. 2023b. An important next step on our AI journey. https://blog.google/technology/ai/bard-google-ai-search-updates/.
  19. CarbonScaler: Leveraging Cloud Workload Elasticity for Optimizing Carbon-Efficiency. arXiv preprint arXiv:2302.08681 (2023).
  20. Teaching Machines to Read and Comprehend. In NeurIPS. 1693–1701. http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend
  21. Ralph Hintemann and Simon Hinterholzer. 2022. Cloud computing drives the growth of the data center industry and its energy consumption. Data centers 2022. ResearchGate (2022).
  22. International Energy Agency. 2023. Data Centres and Data Transmission Networks. https://www.iea.org/energy-system/buildings/data-centres-and-data-transmission-networks.
  23. Matt Gardner Johannes Welbl, Nelson F. Liu. 2017. Crowdsourcing Multiple Choice Science Questions. arXiv:1707.06209v1.
  24. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. International Journal of Computer Vision 123 (2017), 32–73. https://doi.org/10.1007/s11263-016-0981-7
  25. Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. Technical Report.
  26. Quantifying the carbon emissions of machine learning. arXiv preprint arXiv:1910.09700 (2019).
  27. A Holistic Assessment of the Carbon Footprint of Noor, a Very Large Arabic Language Model. In Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. Association for Computational Linguistics, virtual+Dublin, 84–94. https://doi.org/10.18653/v1/2022.bigscience-1.8
  28. George Leopold. 2019. AWS to offer Nvidia's T4 GPUs for AI inferencing. https://web.archive.org/web/20220309000921/https://www.hpcwire.com/2019/03/19/aws-upgrades-its-gpu-backed-ai-inference-platform/ (visited on 2022-04-19).
  29. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 740–755.
  30. Alexandra Sasha Luccioni and Alex Hernandez-Garcia. 2023. Counting carbon: A survey of factors influencing the emissions of machine learning. arXiv preprint arXiv:2302.08476 (2023).
  31. Estimating the carbon footprint of BLOOM, a 176B parameter language model. arXiv preprint arXiv:2211.02001 (2022).
  32. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Portland, Oregon, USA, 142–150. http://www.aclweb.org/anthology/P11-1015
  33. Pointer Sentinel Mixture Models. arXiv:1609.07843 [cs.CL]
  34. Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786 (2022).
  35. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. ArXiv abs/1808.08745 (2018).
  36. Will Oremus. 2023. AI chatbots lose money every time you use them. That is a problem. https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/. Washington Post (2023).
  37. Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures. In Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7), Cardiff, 22 July 2019, Piotr Bański, Adrien Barbaresi, Hanno Biber, Evelyn Breiteneder, Simon Clematide, Marc Kupietz, Harald Lüngen, and Caroline Iliadi (Eds.). Leibniz-Institut für Deutsche Sprache, Mannheim, 9–16. https://doi.org/10.14618/ids-pub-9021
  38. Cross-lingual Name Tagging and Linking for 282 Languages. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 1946–1958. https://doi.org/10.18653/v1/P17-1178
  39. Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL.
  40. The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink. https://doi.org/10.48550/ARXIV.2204.05149
  41. Carbon emissions and large neural network training. arXiv preprint arXiv:2104.10350 (2021).
  42. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv e-prints (2019). arXiv:1910.10683
  43. Know What You Don’t Know: Unanswerable Questions for SQuAD. arXiv:1806.03822 [cs.CL]
  44. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv:1606.05250 (2016).
  45. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 3 (2015), 211–252. https://doi.org/10.1007/s11263-015-0816-y
  46. Gustavo Santana. 2023. Stable Diffusion Prompts. https://huggingface.co/datasets/Gustavosta/Stable-Diffusion-Prompts
  47. CodeCarbon: Estimate and Track Carbon Emissions from Machine Learning Computing.
  48. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, 1631–1642. https://www.aclweb.org/anthology/D13-1170
  49. Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243 (2019).
  50. Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. 142–147. https://www.aclweb.org/anthology/W03-0419
  51. Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurement. arXiv preprint arXiv:2210.01970 (2022).
  52. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. arXiv preprint arXiv:1905.00537 (2019).
  53. DiffusionDB: A Large-Scale Prompt Gallery Dataset for Text-to-Image Generative Models. arXiv:2210.14896 [cs] (2022). https://arxiv.org/abs/2210.14896
  54. Huggingface’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019).
  55. BLOOM: A 176B-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100 (2022).
  56. Sustainable AI: Environmental Implications, Challenges and Opportunities. arXiv preprint arXiv:2111.00364 (2021).
  57. ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. arXiv:2304.05977 [cs.CV]
  58. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. In The IEEE International Conference on Computer Vision (ICCV).
Authors (3)
  1. Alexandra Sasha Luccioni (25 papers)
  2. Yacine Jernite (46 papers)
  3. Emma Strubell (60 papers)
Citations (99)