CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers (2311.03615v2)
Abstract: Training large-scale AI models demands significant computational power and energy, leading to an increased carbon footprint with potential environmental repercussions. This paper examines the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution: FL exchanges model parameters rather than raw data, preserving data privacy and complying with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to handle the unpredictability of future carbon intensity, and devises an efficient algorithm to tame the combinatorial complexity of data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, showing that it outperforms existing methods in learning performance while minimizing environmental impact.
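To make the drift-plus-penalty mechanism the abstract invokes more concrete, the sketch below shows how a Lyapunov virtual queue can enforce a long-term carbon budget while each round trades learning utility against carbon cost. This is an illustrative toy, not the authors' CAFE algorithm: the additive utility model, the synthetic carbon costs, and all names and parameters (`V`, `Q`, `carbon_budget`) are assumptions, whereas CAFE scores candidate subsets via coreset selection, which makes its per-round selection problem combinatorial and requires the dedicated algorithm the paper develops.

```python
# Minimal sketch of drift-plus-penalty data center selection under a
# long-term carbon budget. All modeling choices below are illustrative
# assumptions, not the CAFE algorithm from the paper.
import random

def select_data_centers(utilities, carbon, Q, V):
    """Per-round drift-plus-penalty rule: include data center i iff
    V * u_i - Q * c_i > 0. This per-item decomposition is valid only
    because utility is assumed additive here; CAFE's coreset-based
    utility is not additive, hence its combinatorial subproblem."""
    return [i for i, (u, c) in enumerate(zip(utilities, carbon))
            if V * u - Q * c > 0]

def simulate(T=200, n_dc=8, carbon_budget=400.0, V=5.0, seed=0):
    rng = random.Random(seed)
    b = carbon_budget / T   # per-round share of the total carbon budget
    Q = 0.0                 # virtual queue: cumulative budget overage
    total_utility = total_carbon = 0.0
    for _ in range(T):
        # Hypothetical per-round learning utilities and carbon costs per
        # data center; in practice the carbon costs would come from
        # regional carbon-intensity data.
        utilities = [rng.uniform(0.5, 1.0) for _ in range(n_dc)]
        carbon = [rng.uniform(0.1, 1.0) for _ in range(n_dc)]
        S = select_data_centers(utilities, carbon, Q, V)
        round_carbon = sum(carbon[i] for i in S)
        total_utility += sum(utilities[i] for i in S)
        total_carbon += round_carbon
        # Lyapunov update: Q grows when a round overshoots its budget
        # share, penalizing carbon more heavily in subsequent rounds.
        Q = max(Q + round_carbon - b, 0.0)
    print(f"utility={total_utility:.1f}, carbon={total_carbon:.1f} "
          f"(budget {carbon_budget:.1f}), final Q={Q:.2f}")

if __name__ == "__main__":
    simulate()
```

Larger values of the penalty weight `V` favor learning utility at the expense of tighter budget adherence; this utility-versus-constraint trade-off is the standard knob in drift-plus-penalty methods.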
Authors: Jieming Bian, Lei Wang, Shaolei Ren, Jie Xu