Carbon-Aware Computing for Data Centers with Probabilistic Performance Guarantees (2410.21510v2)

Published 28 Oct 2024 in eess.SY and cs.SY

Abstract: Data centers are significant contributors to carbon emissions and can strain power systems due to their high electricity consumption. To mitigate this impact and to participate in demand response programs, cloud computing companies strive to balance and optimize operations across their global fleets by making strategic decisions about when and where to place compute jobs for execution. In this paper, we introduce a load shaping scheme that reacts to time-varying grid signals by leveraging both temporal and spatial flexibility of compute jobs to provide risk-aware management guidelines and job placement with provable performance guarantees based on distributionally robust optimization. Our approach divides the problem into two key components: (i) day-ahead planning, which generates an optimal scheduling strategy based on historical load data, and (ii) real-time job placement and (time) scheduling, which dynamically tracks the optimal strategy generated in (i). We validate our method in simulation using normalized load profiles from randomly selected Google clusters, incorporating time-varying grid signals. We demonstrate significant reductions in carbon cost and peak power with our approach compared to myopic greedy policies, while maintaining computational efficiency and abiding by system constraints.

Authors (5)
  1. Sophie Hall
  2. Francesco Micheli
  3. Giuseppe Belgioioso
  4. Florian Dörfler
  5. Ana Radovanović

Summary

The paper presents a framework for operating data centers (DCs) in a carbon-aware manner, targeting reductions in carbon emissions and peak power demand through strategic job scheduling. It uses distributionally robust optimization (DRO) to exploit the temporal and spatial flexibility of compute jobs, providing probabilistic performance guarantees that account for uncertainty in compute loads.

Key Contributions

The authors develop a load-shaping scheme that reacts to time-varying grid signals, enabling cloud computing companies to decide when and where to place compute jobs across their global data center fleets. The scheme has two principal components:

  1. Day-ahead Planning: This phase computes an optimal scheduling strategy from historical load data by solving a distributionally robust optimization problem. The problem encodes constraints on temporal and spatial flexibility, so the resulting plan remains reliable despite the stochastic nature of job demand.
  2. Real-time Job Placement and Scheduling: This phase dynamically tracks the strategy computed in the day-ahead planning. It handles the continuous flow of incoming jobs and makes rapid placement and timing decisions that keep actual operation aligned with the day-ahead plan while maintaining operational efficiency.
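
To make the two stages concrete, here is a minimal Python sketch under heavily simplified, hypothetical assumptions: a single cluster, hourly time steps, jobs that run for exactly one hour, and a day-ahead step that replaces the paper's DRO problem with a worst-case-over-history estimate of inflexible load. The names `Job`, `day_ahead_caps`, and `place_in_real_time` are illustrative and not taken from the paper. The per-hour caps play the role of virtual capacity curves that the real-time stage tracks.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    load: float      # normalized power drawn while running for one hour
    flexible: bool   # True if the job may be delayed to a later hour


def day_ahead_caps(inflexible_history, carbon, capacity, flexible_energy):
    """Hypothetical day-ahead planner returning a VCC-like cap per hour.

    inflexible_history: past days, each a list of hourly inflexible loads.
    carbon: forecast carbon intensity per hour.
    capacity: physical cluster capacity (same units as the loads).
    flexible_energy: total flexible load expected over the day.
    """
    hours = len(carbon)
    # Conservative stand-in for the DRO step (an assumption): plan against
    # the largest inflexible load observed in each hour of the history.
    base = [max(day[t] for day in inflexible_history) for t in range(hours)]
    caps = list(base)
    remaining = flexible_energy
    # Fill flexible load into the cleanest hours first; this greedy rule is
    # optimal for a linear carbon cost with per-hour capacity limits.
    for t in sorted(range(hours), key=lambda h: carbon[h]):
        if remaining <= 0:
            break
        added = min(max(capacity - base[t], 0.0), remaining)
        caps[t] += added
        remaining -= added
    return caps


def place_in_real_time(arrivals, caps):
    """Track the caps hour by hour.

    Inflexible jobs always run when they arrive. Flexible jobs join a queue
    and are started first-fit, in arrival order, whenever they fit under the
    current hour's cap. Returns hourly usage and the jobs still waiting.
    """
    queue, usage = [], []
    for t, new_jobs in enumerate(arrivals):
        used = sum(j.load for j in new_jobs if not j.flexible)
        queue.extend(j for j in new_jobs if j.flexible)
        still_waiting = []
        for job in queue:
            if used + job.load <= caps[t]:
                used += job.load
            else:
                still_waiting.append(job)
        queue = still_waiting
        usage.append(used)
    return usage, queue


# Toy example with made-up numbers (not data from the paper).
caps = day_ahead_caps(
    inflexible_history=[[2.0, 2.0, 1.0], [1.0, 2.0, 2.0]],
    carbon=[300.0, 100.0, 200.0],
    capacity=4.0,
    flexible_energy=3.0,
)
arrivals = [
    [Job("A", 2.0, False), Job("B", 2.0, True), Job("C", 1.0, True)],
    [Job("D", 2.0, False)],
    [Job("E", 2.0, False)],
]
usage, leftover = place_in_real_time(arrivals, caps)
print(caps)    # [2.0, 4.0, 3.0]
print(usage)   # [2.0, 4.0, 3.0]
```

In this toy run, the flexible jobs B and C arrive in the dirtiest hour and wait until the caps open up in the cleaner hours 1 and 2, so realized usage matches the planned caps. The paper's scheme also shifts load spatially across clusters, which this single-cluster sketch omits.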

Implementation and Validation

The authors validate the method in simulation using normalized load profiles from randomly selected Google clusters together with time-varying grid signals. Compared with myopic greedy policies, the approach substantially reduces carbon cost and peak power, while remaining computationally efficient and respecting system constraints and operational requirements.
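
The two evaluation metrics can be stated precisely. A minimal sketch, assuming hourly resolution, carbon cost defined as the sum of power times grid carbon intensity, and peak power as the maximum hourly draw; the function names and profiles are invented for illustration and are not the paper's Google cluster data or results.

```python
def carbon_cost(power, intensity):
    """Sum over hours of power drawn times grid carbon intensity."""
    return sum(p * c for p, c in zip(power, intensity))


def peak_power(power):
    """Largest hourly power draw."""
    return max(power)


# Hypothetical hourly profiles with the same total energy (9 units).
intensity = [300.0, 100.0, 200.0]  # made-up carbon intensity per hour
greedy = [5.0, 2.0, 2.0]  # myopic: every job runs as soon as it arrives
shaped = [2.0, 4.0, 3.0]  # flexible load shifted toward cleaner hours

for name, profile in [("greedy", greedy), ("shaped", shaped)]:
    print(name, carbon_cost(profile, intensity), peak_power(profile))
# greedy 2100.0 5.0
# shaped 1600.0 4.0
```

Because both profiles deliver the same energy, the difference in both metrics comes purely from when the load runs, which is the lever a load-shaping scheme controls.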

Technical Insights

Several technical insights emerge from this paper:

  • Distributionally Robust Optimization (DRO): Applying DRO in this setting handles uncertainty in load predictions, so scheduling decisions remain effective across different load scenarios. The ambiguity set is defined with the Wasserstein distance around the empirical distribution of historical data, and its radius tunes how much robustness the plan provides against distribution shifts.
  • Virtual Capacity Curves (VCCs): VCCs act as adjustable limits on cluster capacity that regulate temporal shifting indirectly: rather than scheduling each job explicitly, the planner shapes the capacity available over time. This contributes to reduced waiting times and better resource utilization.
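  • Probabilistic Guarantees: The framework imposes Conditional Value-at-Risk (CVaR) constraints, a convex and conservative approximation of chance constraints: requiring the CVaR of a constraint violation at tail level α to be nonpositive ensures the constraint is violated with probability at most α. Because the constraints are enforced over the Wasserstein ambiguity set rather than only the historical samples, the guarantee accounts for rare events and distribution shifts.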
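
The interplay between the Wasserstein ambiguity set and CVaR can be illustrated numerically. This sketch is not the paper's reformulation; it assumes a scalar uncertain quantity, uniform-weight empirical distributions, a type-1 Wasserstein ball, and a loss that is Lipschitz in the uncertainty. Under those assumptions, the worst-case CVaR over the ball is at most the empirical CVaR plus the radius times the Lipschitz constant divided by α, and if that bound is at most a threshold b, then P(loss > b) ≤ α for every distribution in the ball. The function names are hypothetical.

```python
import math


def empirical_cvar(losses, alpha):
    """CVaR at tail probability alpha (0 < alpha <= 1) of an empirical
    distribution, via the Rockafellar-Uryasev formula
        CVaR = min over tau of  tau + E[(loss - tau)_+] / alpha.
    The objective is piecewise linear with kinks at the samples, so the
    minimum is attained at one of them."""
    n = len(losses)
    best = math.inf
    for tau in losses:
        excess = sum(max(x - tau, 0.0) for x in losses) / n
        best = min(best, tau + excess / alpha)
    return best


def wasserstein_1d(a, b):
    """Type-1 Wasserstein distance between two equal-size, uniformly
    weighted empirical distributions on the real line: match sorted samples."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def worst_case_cvar_bound(losses, alpha, radius, lipschitz):
    """Upper bound on CVaR over every distribution within 1-Wasserstein
    distance `radius` of the empirical one, assuming the loss is a
    `lipschitz`-Lipschitz function of the uncertain parameter. Each term
    E[(loss - tau)_+] can grow by at most radius * lipschitz
    (Kantorovich-Rubinstein duality), hence the additive term."""
    return empirical_cvar(losses, alpha) + radius * lipschitz / alpha


# Hypothetical samples; the loss is taken to be the uncertain quantity
# itself, so its Lipschitz constant is 1.
history = [1.0, 2.0, 3.0, 4.0]
alpha, radius = 0.5, 0.25
bound = worst_case_cvar_bound(history, alpha, radius, lipschitz=1.0)

# A shifted scenario that stays inside the Wasserstein ball.
shifted = [x + 0.25 for x in history]
print(empirical_cvar(history, alpha))    # 3.5
print(bound)                             # 4.0
print(wasserstein_1d(history, shifted))  # 0.25
print(empirical_cvar(shifted, alpha))    # 3.75
```

The shifted scenario lies at Wasserstein distance 0.25, on the boundary of the ball, and its CVaR of 3.75 stays below the certified bound of 4.0. Enlarging the radius raises the bound, which is the robustness-versus-conservatism trade-off that the radius controls.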

Implications and Future Developments

The research has significant implications for cloud service providers seeking to participate in demand response programs, which supports power grid stability and reduces operational costs. The framework not only contributes to sustainability goals such as net-zero emissions, but also opens new avenues for applying advanced optimization techniques like DRO to the management of large-scale computing resources.

Future work could extend the framework to additional operational factors, such as more detailed job characteristics and improved predictive models for load profiles. Receding-horizon approaches could further refine the methodology by responding more dynamically to real-time grid signals and system states.
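
Receding-horizon control is listed only as a possible extension, so the following is a generic illustration rather than anything from the paper. A minimal sketch, assuming hourly steps, a single cluster with known spare capacity per hour, a linear carbon cost, and a carbon-intensity forecast refreshed every hour; `plan_flexible` and `receding_horizon` are hypothetical names.

```python
def plan_flexible(intensity, room, energy):
    """Allocate `energy` units of flexible load to the cleanest hours,
    never exceeding the spare capacity `room[t]` in any hour. Greedy is
    optimal here because the carbon cost is linear in the allocation."""
    alloc = [0.0] * len(intensity)
    for t in sorted(range(len(intensity)), key=lambda h: intensity[h]):
        take = min(room[t], energy)
        alloc[t] = take
        energy -= take
    return alloc


def receding_horizon(forecasts, room, energy):
    """At each hour k, re-plan the remaining flexible energy over hours
    k..end with the newest forecast, execute only hour k, then repeat.
    forecasts[k] is the full-day intensity forecast available at hour k."""
    executed = []
    for k, forecast in enumerate(forecasts):
        plan = plan_flexible(forecast[k:], room[k:], energy)
        executed.append(plan[0])  # commit only the current hour
        energy -= plan[0]
    return executed


# Made-up forecasts: at hour 1 the outlook for hour 2 improves.
forecasts = [
    [300.0, 100.0, 200.0],
    [300.0, 250.0, 150.0],
    [300.0, 250.0, 150.0],
]
room = [2.0, 2.0, 2.0]
print(plan_flexible(forecasts[0], room, 3.0))  # [0.0, 2.0, 1.0]
print(receding_horizon(forecasts, room, 3.0))  # [0.0, 1.0, 2.0]
```

Evaluated against the updated intensities, the re-planned schedule costs 1.0 × 250 + 2.0 × 150 = 550, versus 2.0 × 250 + 1.0 × 150 = 650 for the fixed day-ahead plan, which is the kind of benefit a receding-horizon extension would aim for.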

In summary, this paper provides a well-structured and theoretically sound approach to optimizing data center operations in the face of growing computational demands and environmental considerations, setting a foundation for further research and development in carbon-aware computing strategies.