
How to use model architecture and training environment to estimate the energy consumption of DL training (2307.05520v4)

Published 7 Jul 2023 in cs.LG, cs.CY, and cs.SE

Abstract: To raise awareness of the significant environmental impact of Deep Learning (DL), several works have estimated the energy consumption and carbon footprint of DL-based systems across their life cycle. However, energy-consumption estimates for the training stage usually rely on assumptions that have not been thoroughly tested. This study moves past these assumptions by leveraging the relationship between energy consumption and two relevant design decisions in DL training: model architecture and training environment. To investigate these relationships, we collect multiple metrics related to energy efficiency and model correctness during training. We then outline the trade-offs between the measured energy consumption and the models' correctness with respect to model architecture, and their relationship with the training environment. Finally, we study the power consumption behavior of training and propose four new energy estimation methods. Our results show that selecting the proper model architecture and training environment can reduce energy consumption dramatically (up to 80.72%) at a negligible cost in correctness. We also find evidence that GPUs should scale with the models' computational complexity for better energy efficiency. Furthermore, we show that current energy estimation methods are unreliable and propose alternatives that are 2x more precise.
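The measurement idea at the heart of the abstract, sampling power draw during training and integrating it over time rather than assuming a fixed figure, can be illustrated in a few lines. Below is a minimal sketch, not the authors' actual instrumentation: it assumes an NVIDIA GPU and the pynvml NVML bindings, and the `GpuEnergyMeter` class and `train_model` function are hypothetical names introduced here for illustration.

```python
import time
import threading
import pynvml  # pip install nvidia-ml-py

class GpuEnergyMeter:
    """Samples GPU power via NVML in a background thread and
    integrates it over time into joules (rectangle rule)."""

    def __init__(self, device_index=0, interval_s=0.5):
        self.device_index = device_index
        self.interval_s = interval_s
        self.energy_j = 0.0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(self.device_index)
        last = time.monotonic()
        while not self._stop.is_set():
            time.sleep(self.interval_s)
            now = time.monotonic()
            # NVML reports instantaneous power in milliwatts.
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
            self.energy_j += power_w * (now - last)
            last = now
        pynvml.nvmlShutdown()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()

# Usage: wrap a training run, then compare against the naive
# "TDP x wall-clock time" estimate that prior work often assumes.
# with GpuEnergyMeter() as meter:
#     train_model()  # hypothetical training function
# print(f"Measured energy: {meter.energy_j / 3.6e6:.4f} kWh")
```

Sampling at a sub-second interval matters because, as the abstract argues, power draw during training is not constant; a single nominal figure such as the GPU's TDP can substantially over- or underestimate the integral, which is why measurement-based alternatives can be markedly more precise.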

Authors (4)
  1. Santiago del Rey (3 papers)
  2. Silverio Martínez-Fernández (32 papers)
  3. Luís Cruz (54 papers)
  4. Xavier Franch (48 papers)
Citations (7)