This paper introduces Google's Carbon-Intelligent Compute Management System (CICS), designed to reduce the carbon footprint and operational costs of its global datacenter fleet by shifting temporally flexible workloads to times when grid electricity is less carbon-intensive (Radovanovic et al., 2021). The system addresses the growing energy consumption of datacenters and the variability of grid carbon intensity across time and location.
The core mechanism employed by CICS is the Virtual Capacity Curve (VCC). A VCC is an hourly limit imposed on the total compute resources (specifically CPU, measured in Google Compute Units or GCUs) available to flexible workloads within a datacenter cluster for the next day. These flexible workloads typically include batch processing jobs like data compaction, machine learning training, simulations, and video processing, which can tolerate delays as long as they complete within a 24-hour window. User-facing services and customer VMs are classified as inflexible and are not affected.
System Architecture and Implementation:
CICS operates through a suite of analytical pipelines executed daily:
- Carbon Fetching Pipeline: Retrieves hourly, day-ahead forecasts of average carbon intensity (kgCO₂e/kWh) for the grid zones where Google's datacenters are located. The primary source for this data is Tomorrow (electricityMap.org).
- Power Modeling Pipeline: Trains and updates models that map cluster-level CPU usage to power consumption. The paper highlights that a piecewise linear model accurately estimates Power Distribution Unit (PDU) power based on CPU usage alone, with a daily Mean Absolute Percent Error (MAPE) below 5% for most PDUs (Radovanovic et al., 2021; see also Daltro et al., 2021). The relationship between cluster CPU usage u and power P is approximated locally as:

  P(u) ≈ P(ū) + m · (u − ū)

  where m is the cluster power sensitivity, derived from the individual PDU models and their average CPU usage fractions, and ū is the average usage level around which the model is linearized. This accurate mapping is crucial for the optimization process.
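As an illustration, the per-PDU fit and its local linearization can be sketched as follows. The hinge-basis least-squares formulation and the knot placement are our assumptions for the sketch, not the paper's exact fitting procedure:

```python
import numpy as np

def fit_power_model(cpu_usage, pdu_power, knots):
    """Fit a piecewise-linear map from CPU usage to PDU power by least
    squares over hinge basis functions max(0, u - k). The basis choice
    and the knot locations are illustrative assumptions."""
    X = np.column_stack(
        [np.ones_like(cpu_usage), cpu_usage]
        + [np.maximum(0.0, cpu_usage - k) for k in knots]
    )
    coef, *_ = np.linalg.lstsq(X, pdu_power, rcond=None)
    return coef

def predict_power(coef, u, knots):
    """Evaluate the fitted model at a single usage level u."""
    x = np.concatenate(([1.0, u], [max(0.0, u - k) for k in knots]))
    return float(x @ coef)

def local_sensitivity(coef, u, knots, eps=1e-3):
    """Local slope m of the power curve: P(u + d) ≈ P(u) + m * d."""
    return (predict_power(coef, u + eps, knots)
            - predict_power(coef, u, knots)) / eps
```

The local slope returned by `local_sensitivity` plays the role of the cluster power sensitivity used by the optimization.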
- Load Forecasting Pipeline: Predicts cluster-level compute demand for the next day. Key forecasts include:
  - Hourly inflexible CPU usage, u_c(t).
  - Total daily flexible CPU usage, F_c.
  - Total daily CPU reservations, R_c (reservations are typically higher than actual usage, to guarantee resources).
  - Hourly CPU reservation-to-usage ratio, ρ_c(t).

  Forecasting uses methods like Exponentially Weighted Moving Averages (EWMA) and linear models to capture weekly patterns and daily deviations. Forecast accuracy is generally high (median APE below 10% for most clusters), which is vital for the system's effectiveness.
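A minimal sketch of the EWMA-plus-weekly-pattern idea described above (the smoothing factor and the share-based hourly split are our assumptions, not the paper's exact models):

```python
def ewma_forecast(daily_totals, alpha=0.3):
    """Exponentially weighted moving average of past daily totals;
    alpha is an assumed smoothing factor, not the paper's value."""
    level = daily_totals[0]
    for x in daily_totals[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def forecast_next_day(daily_totals, hourly_shares):
    """Split the smoothed daily total across 24 hours using average
    hourly usage shares (e.g. taken from the same weekday), which is
    one way to capture the weekly pattern the paper mentions."""
    total = ewma_forecast(daily_totals)
    return [total * share for share in hourly_shares]
```

The hourly shares would be estimated from historical same-weekday usage profiles; here they are simply passed in.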
- Optimization Pipeline: This daily process computes the optimal VCCs for all clusters. It chooses, for each cluster c and hour t, the deviation Δ_c(t) of flexible usage from its daily average so as to minimize a weighted sum of the expected fleetwide carbon footprint and peak power consumption costs:

  minimize  Σ_c Σ_t w_CO2 · CI_c(t) · P_c( u_c(t) + F_c/24 + Δ_c(t) ) + w_peak · Σ_c P̂_c

  where:
  - CI_c(t): forecasted carbon intensity;
  - P_c(·): power model output, with sensitivity m_c;
  - Δ_c(t): optimized hourly deviation of flexible usage from its daily average;
  - F_c: risk-aware forecast of daily flexible usage;
  - P̂_c: cluster peak power upper bound;
  - w_CO2, w_peak: weights for the carbon and peak power costs.

  The optimization is subject to constraints:
  - Daily flexible usage conservation: Σ_t Δ_c(t) = 0.
  - Risk-aware SLOs: total daily capacity must cover the 97th percentile of predicted reservation demand (R_c), preventing frequent violations of flexible workload completion.
  - Power capping: cluster power limits must not be exceeded, based on inflexible load quantiles.
  - Campus power contracts: total peak power for clusters within a datacenter campus is limited.
  - Machine capacity: the VCC cannot exceed physical machine capacity.

  The final VCC for cluster c at hour t converts the planned usage into a reservation limit via the reservation-to-usage ratio:

  VCC_c(t) = ρ_c(t) · ( u_c(t) + F_c/24 + Δ_c(t) )
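To build intuition for the carbon term alone: for a single cluster with constant power sensitivity, symmetric per-hour shift bounds, and none of the other constraints, minimizing Σ_t CI(t)·Δ(t) subject to Σ_t Δ(t) = 0 simply moves flexible usage from the most to the least carbon-intense hours. A stdlib-only sketch of that special case (the real system solves the full constrained program jointly across all clusters):

```python
def plan_flexible_shift(carbon_intensity, max_shift):
    """Minimize sum_t CI(t) * delta(t) with sum_t delta(t) = 0 and
    |delta(t)| <= max_shift: push +max_shift into the cheapest half
    of the hours and -max_shift into the most expensive half (the
    middle hour stays at 0 when the number of hours is odd)."""
    hours = len(carbon_intensity)
    order = sorted(range(hours), key=lambda t: carbon_intensity[t])
    delta = [0.0] * hours
    k = hours // 2
    for t in order[:k]:
        delta[t] = +max_shift   # low-carbon hours absorb extra flexible work
    for t in order[hours - k:]:
        delta[t] = -max_shift   # high-carbon hours shed flexible work
    return delta
```

With distinct carbon intensities this greedy assignment is the exact optimum of the simplified linear program; adding the SLO, power-capping, and contract constraints is what makes the production problem require a proper solver.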
- SLO Violation Detection: Monitors if clusters consistently fail to meet their daily flexible compute targets. If violations persist (e.g., due to unpredicted demand growth), shaping for that cluster is temporarily paused to allow forecasts to adapt.
Operation and Impact:
The computed VCCs are pushed daily to Google's cluster management system (Borg). Borg's scheduler uses the VCC as the upper limit for total CPU reservations at any given hour. When the VCC is low (typically during high carbon intensity periods), the admission controller queues new flexible tasks or potentially preempts running ones, delaying their execution until the VCC increases (during lower carbon intensity periods).
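A toy sketch of the admission-control behavior described above. This is not Borg's actual interface; for brevity it tracks only flexible reservations against the VCC and omits inflexible load and preemption:

```python
from collections import deque

class VCCAdmissionController:
    """Illustrative sketch of VCC enforcement: flexible tasks are
    admitted only while flexible reservations stay under the current
    hour's VCC value; otherwise they are queued for a later hour."""

    def __init__(self, vcc):
        self.vcc = vcc          # hourly reservation limits (GCUs)
        self.flexible = 0.0     # currently admitted flexible reservations
        self.queue = deque()    # deferred flexible tasks (GCU sizes)

    def submit(self, hour, task_gcus):
        """Admit one flexible task if it fits under this hour's cap."""
        if self.flexible + task_gcus <= self.vcc[hour]:
            self.flexible += task_gcus
            return True
        self.queue.append(task_gcus)
        return False

    def on_hour(self, hour):
        """At an hour boundary, drain queued tasks that now fit."""
        admitted = []
        while self.queue and self.flexible + self.queue[0] <= self.vcc[hour]:
            admitted.append(self.queue.popleft())
            self.flexible += admitted[-1]
        return admitted
```

In this sketch a task queued during a low-VCC (high carbon intensity) hour is automatically admitted once the curve rises, which is the delay-and-shift behavior the paper describes.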
The paper demonstrates the system's effectiveness using operational data:
- Cluster Examples: Shows how VCCs successfully reduce CPU reservations and power consumption during peak carbon hours in clusters with sufficient flexible load and predictable demand. Effectiveness diminishes if flexible load is small or demand forecasts have high uncertainty (requiring higher VCC headroom).
- Campus-Level Impact: A controlled experiment showed that activating CICS resulted in an average power drop of 1-2% during the highest carbon intensity hours compared to control days.
- Trade-offs: Optimizing solely for carbon might increase peak resource needs, whereas the dual objective balances environmental goals with infrastructure efficiency. More aggressive shaping might lead to a slight decrease in total daily flexible work completed, potentially due to jobs migrating or task intolerance to longer delays.
Practical Considerations:
- Scheduler-Agnostic: CICS provides capacity constraints (VCCs) but doesn't modify the underlying scheduling algorithms.
- Reliability: Designed with gradual rollouts, monitoring, and feedback loops to ensure stability and adherence to SLOs.
- Scalability: Centralized optimization based on aggregate cluster-level forecasts is more scalable than job-level approaches.
- Day-Ahead Planning: Leverages the predictability of aggregate demand and day-ahead carbon forecasts, decoupling complex optimization from real-time scheduling.
The paper concludes that CICS effectively shifts load temporally to reduce carbon emissions and improve efficiency, demonstrating a scalable, first-of-its-kind implementation. Future work includes incorporating spatial load shifting (moving jobs between datacenters).