Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures

Published 21 Feb 2024 in cs.SE and cs.LG | arXiv:2402.13640v2

Abstract: Deep Learning (DL) frameworks such as PyTorch and TensorFlow include runtime infrastructures responsible for executing trained models on target hardware, managing memory, data transfers, and, where applicable, multi-accelerator execution. Additionally, it is common practice to deploy pre-trained models in environments distinct from their native development settings. This has led to interchange formats such as ONNX, together with its runtime infrastructure, ONNX Runtime, which serve as standards usable across diverse DL frameworks and languages. Although these runtime infrastructures have a great impact on inference performance, no previous work has investigated their energy efficiency. In this study, we monitor the energy consumption and inference time of the runtime infrastructures of three well-known DL frameworks as well as ONNX, using three different DL models. To add nuance to our investigation, we also examine the impact of using different execution providers. We find that the performance and energy efficiency of DL are difficult to predict. One framework, MXNet, outperforms both PyTorch and TensorFlow for the computer vision models at batch size 1, owing to efficient GPU usage and consequently low CPU usage. However, at batch size 64, PyTorch and MXNet become practically indistinguishable, while TensorFlow is consistently outperformed. For BERT, PyTorch exhibits the best performance. Converting the models to ONNX yields significant performance improvements in the majority of cases. Finally, in our preliminary investigation of execution providers, we observe that TensorRT always outperforms CUDA.

Summary

  • The paper presents empirical findings comparing energy consumption across DL frameworks during model inference.
  • It details systematic evaluations of ResNet, MobileNet, and BERT across varying batch sizes and execution providers.
  • The study highlights the role of ONNX conversion in improving energy efficiency, with TensorRT consistently outperforming CUDA.

Green AI: Energy Consumption in DL Models Across Runtime Infrastructures

The paper "Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures" investigates the energy efficiency of Deep Learning (DL) models during inference across various runtime infrastructures. This study offers valuable insights into the energy consumption patterns of prominent DL frameworks and aims to contribute to the broader discourse on Green AI, emphasizing the need for energy-efficient DL approaches.

Introduction

The paper opens by examining the energy demands of DL frameworks and their substantial financial and environmental impacts. DL models are typically deployed in runtime environments distinct from the settings in which they were developed. Interchange formats like ONNX, with associated runtime infrastructures such as ONNX Runtime, have therefore emerged as standardized solutions for cross-framework deployment. Although frameworks are generally assessed on accuracy and speed, this paper evaluates their energy consumption during inference, a critical aspect of real-world deployment.

Methodology

The authors conducted an empirical study using three DL models—ResNet, MobileNet, and BERT—with different batch sizes, across the runtime infrastructures of three well-known DL frameworks: PyTorch, TensorFlow, and MXNet. They assessed energy efficiency and performance by recording GPU utilization, power usage, inference time, and total energy consumed during model inference. Additionally, they examined ONNX's role in enhancing energy efficiency by converting models from these frameworks and running them with two execution providers: CUDA and TensorRT.
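
As a concrete illustration of this kind of instrumentation, the sketch below shows one way CPU energy and GPU power could be sampled around an inference loop, assuming pyRAPL for RAPL-based CPU readings and NVML (via pynvml) for GPU power; the `run_inference` callable is hypothetical, and the paper's exact measurement setup may differ.

```python
# Sketch: sampling CPU energy (RAPL) and GPU power around an inference loop.
# Assumes pyRAPL and pynvml are installed and a run_inference() callable exists.
import time

import pyRAPL   # CPU package energy via Intel RAPL counters
import pynvml   # GPU power via NVIDIA NVML (same backend as nvidia-smi)

pyRAPL.setup()
pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

def measure(run_inference, n_batches=100):
    meter = pyRAPL.Measurement('inference')
    gpu_power_w = []

    meter.begin()
    start = time.perf_counter()
    for _ in range(n_batches):
        run_inference()  # one batch of inference
        gpu_power_w.append(pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0)  # mW -> W
    elapsed = time.perf_counter() - start
    meter.end()

    cpu_energy_j = sum(meter.result.pkg) / 1e6                 # microjoules -> joules
    gpu_energy_j = (sum(gpu_power_w) / len(gpu_power_w)) * elapsed  # avg power x time
    return {"time_s": elapsed, "cpu_j": cpu_energy_j, "gpu_j": gpu_energy_j}
```

Averaging sampled GPU power over the elapsed time is only an approximation of GPU energy, but it is sufficient to contrast frameworks under identical workloads.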

Results

Energy Efficiency Across Frameworks

For MobileNet and ResNet at batch size 1, MXNet consistently used the GPU more efficiently, as reflected in lower CPU energy usage and lower overall energy consumption compared to TensorFlow and PyTorch. For BERT, PyTorch outperformed both MXNet and TensorFlow in energy and time efficiency, although MXNet achieved the highest accuracy. This variability underscores the need for ML engineers to experiment with different frameworks to optimize energy efficiency for their specific DL tasks.

Impact of ONNX Conversion

The conversion to ONNX typically improved performance and reduced energy consumption across different batch sizes for most models. Notably, TensorFlow's inefficiency at batch size 1 was mitigated following conversion. Despite this, for batch size 64, converted models derived from MXNet and PyTorch exhibited increased energy usage and inference time, signifying that optimization through ONNX does not uniformly result in improvements.
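
The sketch below illustrates the kind of conversion involved, assuming a PyTorch ResNet exported with `torch.onnx.export` and executed with ONNX Runtime; the model choice, file name, and input shape are illustrative rather than taken from the paper.

```python
# Sketch: exporting a PyTorch vision model to ONNX and running it with ONNX Runtime.
import numpy as np
import torch
import torchvision
import onnxruntime as ort

model = torchvision.models.resnet50(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # batch size 1, as in the study

torch.onnx.export(
    model, dummy, "resnet50.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allows other batch sizes, e.g. 64
)

session = ort.InferenceSession("resnet50.onnx", providers=["CUDAExecutionProvider"])
outputs = session.run(None, {"input": dummy.numpy()})
print(np.argmax(outputs[0], axis=1))  # predicted class index
```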

Execution Providers: CUDA and TensorRT

TensorRT consistently outperformed CUDA as an execution provider for ONNX, achieving better GPU utilization and lower energy consumption across all tested models. These findings highlight TensorRT's potential as a more energy-efficient option for deploying DL models on GPUs.
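
In ONNX Runtime, the execution provider is selected via an ordered preference list passed to the inference session. The sketch below shows how the same exported model could be run under the TensorRT and CUDA providers for a rough latency comparison, assuming an onnxruntime-gpu build with both providers available; it is not the paper's benchmarking harness, and energy measurement would wrap this loop as shown earlier.

```python
# Sketch: comparing the TensorRT and CUDA execution providers on one ONNX model.
import time
import numpy as np
import onnxruntime as ort

x = np.random.randn(1, 3, 224, 224).astype(np.float32)

for provider in ["TensorrtExecutionProvider", "CUDAExecutionProvider"]:
    if provider not in ort.get_available_providers():
        print(f"{provider} not available in this build")
        continue
    session = ort.InferenceSession("resnet50.onnx", providers=[provider])
    session.run(None, {"input": x})  # warm-up; TensorRT builds its engine here
    start = time.perf_counter()
    for _ in range(100):
        session.run(None, {"input": x})
    print(provider, (time.perf_counter() - start) / 100, "s per inference")
```

Excluding the warm-up run matters particularly for TensorRT, whose one-time engine build would otherwise dominate the measured latency.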

Conclusion

The paper concludes that no single DL framework is consistently optimal across varying models, batch sizes, and runtime configurations. Different frameworks excel under different conditions, underscoring the importance of targeted experimentation by ML developers. While ONNX conversion typically enhances performance and energy efficiency, results can vary significantly depending on the model and configuration.

Future research should extend the examination of execution providers across more DL models and explore language-specific runtime overheads. Additionally, further studies could investigate the effects of runtime infrastructure optimizations on energy consumption in DL processes.

Overall, this paper contributes to a nuanced understanding of the complexities surrounding energy efficiency in DL model deployment, offering actionable insights for the development of Green AI initiatives focused on reducing environmental impact.
