Multi-DC Optical Networks
- Multi-Datacenter Optical Networks are high-capacity fiber systems connecting geographically dispersed datacenters to support distributed machine learning with pipeline-parallel training.
- The CBA framework dynamically adjusts frequency slot allocation and employs MILP-based scheduling to achieve a 31% reduction in iteration time and improved network performance.
- Experimental evaluations on NSFNET topologies confirm that real-time resource adaptation and contiguity-aware path selection significantly reduce bubble ratios and blocking probabilities.
Multi-Datacenter Optical Networks constitute the physical and algorithmic foundation for distributed machine learning training that spans geographically separated datacenters interconnected via high-capacity optical fiber networks. These systems are increasingly critical for scaling LLM and deep neural network (DNN) training where hardware resources in a single facility are insufficient. Multi-DC optical networks introduce novel challenges in resource assignment, communication scheduling, and system optimization, necessitating frameworks that co-design pipeline-parallel training algorithms with real-time network state awareness, latency estimation, and traffic engineering. Below, key principles, frameworks, and results from recent advances such as CBA ("Communication-Bound-Aware Cross-Domain Resource Assignment for Pipeline-Parallel Distributed LLM Training in Dynamic Multi-DC Optical Networks" (Fu et al., 23 Dec 2025)) are summarized in rigorous detail alongside representative approaches.
1. Distributed Training over Multi-DC Optical Network Topologies
Multi-DC optical networks are typically abstracted as a graph $G=(V,E)$, where $V$ represents individual datacenters (DCs) and $E$ the fiber links, each supporting $F$ frequency slots per fiber (e.g., 80 slots of $12.5$ GHz each in the NSFNET topology (Fu et al., 23 Dec 2025)). Each link maintains a binary frequency-slot occupancy vector that evolves over time. In pipeline-parallel (PP) distributed LLM training, model layers are partitioned into stages, each stage is mapped to a GPU, and the stages are often spread across multiple DCs. Every micro-batch in an iteration triggers inter-DC transmission requests that arrive as dynamic optical network traffic, so link occupancy may overlap across requests due to temporally concurrent demand.
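To make the abstraction concrete, here is a minimal Python sketch of a multi-DC optical network as a set of links carrying binary frequency-slot occupancy vectors; the class and helper names (`OpticalLink`, `find_contiguous_block`) are illustrative, not from the paper.

```python
# Minimal sketch (not the paper's implementation): a multi-DC optical network
# abstracted as links with per-link binary frequency-slot occupancy vectors.
from dataclasses import dataclass, field

FS_PER_LINK = 80  # frequency slots per fiber (12.5 GHz each, as in the NSFNET setup)

@dataclass
class OpticalLink:
    src: str
    dst: str
    # occupancy[i] == True means frequency slot i is currently in use
    occupancy: list = field(default_factory=lambda: [False] * FS_PER_LINK)

    def find_contiguous_block(self, width: int):
        """Return the start index of the first free contiguous block of `width` slots, or None."""
        run = 0
        for i, busy in enumerate(self.occupancy):
            run = 0 if busy else run + 1
            if run == width:
                return i - width + 1
        return None

# Example: a toy 3-DC topology as an adjacency dict of OpticalLink objects
links = {
    ("DC1", "DC2"): OpticalLink("DC1", "DC2"),
    ("DC2", "DC3"): OpticalLink("DC2", "DC3"),
}
links[("DC1", "DC2")].occupancy[0:4] = [True] * 4      # slots 0-3 already assigned
print(links[("DC1", "DC2")].find_contiguous_block(3))  # -> 4
```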
Key metrics are (a minimal computation sketch follows this list):
- Per-iteration runtime: the wall-clock time from the start of the first forward micro-batch to the end of the last backward micro-batch.
- Bubble ratio: the proportion of iteration time spent idling owing to communication delays.
- Blocking probability: the fraction of transmission requests that cannot be assigned a feasible path and frequency-slot block, inducing delay or cancellation.
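The sketch below shows how these metrics could be computed from a recorded iteration schedule; the event-log fields are assumptions for illustration, not the paper's instrumentation.

```python
# Sketch: computing per-iteration runtime, bubble ratio, and blocking probability
# from a hypothetical event log; field names are illustrative assumptions.

def iteration_metrics(compute_intervals, iteration_span, requests):
    """
    compute_intervals: list of (gpu_id, start, end) busy intervals within one iteration
    iteration_span:    (t_start, t_end) wall-clock bounds of the iteration
    requests:          list of dicts with a boolean 'blocked' flag per transmission request
    """
    t_start, t_end = iteration_span
    runtime = t_end - t_start                      # per-iteration runtime

    gpus = {g for g, _, _ in compute_intervals}
    busy = sum(e - s for _, s, e in compute_intervals)
    total = runtime * len(gpus)                    # total GPU-time available
    bubble_ratio = 1.0 - busy / total              # fraction of GPU-time spent idle

    blocked = sum(1 for r in requests if r["blocked"])
    blocking_prob = blocked / len(requests)        # fraction of blocked transmission requests
    return runtime, bubble_ratio, blocking_prob

# Toy usage with two GPUs and four transmission requests
m = iteration_metrics(
    compute_intervals=[("gpu0", 0.0, 6.0), ("gpu1", 2.0, 9.0)],
    iteration_span=(0.0, 10.0),
    requests=[{"blocked": False}] * 3 + [{"blocked": True}],
)
print(m)  # ≈ (10.0, 0.35, 0.25)
```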
2. Communication-Aware Resource Assignment and Scheduling
Recent frameworks such as CBA (Fu et al., 23 Dec 2025) model PP training as a mixed-integer linear program (MILP) that minimizes per-iteration runtime under multi-DC optical network constraints. Binary decision variables indicate the assignment of each micro-batch transmission (corresponding to a stage-to-stage data movement) to an optical path and a contiguous frequency-slot block on every link of that path.
The communication latency for a request with payload $D$ traversing path $p$ and slot block $b$ is captured by an $\alpha$–$\beta$ model, $t_{\mathrm{comm}} = \alpha_{p} + \beta_{p} D + t_{q}$, where $\alpha_{p}$ and $\beta_{p}$ are path-specific offset and bandwidth parameters updated each iteration, and $t_{q}$ accounts for queuing delays.
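A hedged sketch of such an offset/bandwidth ($\alpha$–$\beta$) latency estimator with a per-iteration parameter refresh; the least-squares fit shown here is an illustrative assumption, not the paper's exact update rule.

```python
# Sketch of an alpha-beta latency estimator per candidate path; the least-squares
# refresh below is an illustrative assumption, not CBA's exact update rule.

class PathLatencyModel:
    def __init__(self, alpha=1e-3, beta=1e-9):
        self.alpha = alpha   # per-path offset (s): propagation + setup
        self.beta = beta     # per-byte transfer cost (s/B): inverse effective bandwidth

    def estimate(self, payload_bytes, queueing_delay=0.0):
        """t_comm = alpha + beta * payload + queuing term."""
        return self.alpha + self.beta * payload_bytes + queueing_delay

    def refresh(self, observations):
        """Refit (alpha, beta) from (payload_bytes, measured_latency) pairs of the last iteration."""
        n = len(observations)
        if n < 2:
            return
        xs = [p for p, _ in observations]
        ys = [t for _, t in observations]
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        if var == 0:
            return
        self.beta = sum((x - mx) * (y - my) for x, y in observations) / var
        self.alpha = my - self.beta * mx

model = PathLatencyModel()
model.refresh([(1e9, 1.05), (2e9, 2.02), (4e9, 4.05)])
print(model.estimate(3e9))  # latency prediction for a 3 GB activation transfer
```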
Scheduling constraints rigorously ensure that no frequency slot on any link is double-booked and that frequency-slot block assignment remains contiguous on all links of a path.
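These constraints can be checked procedurally. The sketch below (with assumed per-link boolean occupancy lists, as in the earlier data-structure sketch) verifies spectrum contiguity and non-overlap along a candidate path; it is a minimal illustration, not the MILP formulation itself.

```python
# Sketch: feasibility of assigning a contiguous slot block [start, start+width)
# to a request along a path, with no slot double-booked on any traversed link.
# Checking the same (start, width) on every link also enforces spectrum continuity.
# `path_links` is a list of per-link occupancy lists (True = slot in use).

def block_is_feasible(path_links, start, width):
    for occupancy in path_links:
        if start + width > len(occupancy):
            return False                       # block exceeds the spectrum on this link
        if any(occupancy[start:start + width]):
            return False                       # some slot already booked on this link
    return True                                # same contiguous block free on every link

def reserve_block(path_links, start, width):
    """Commit the assignment; caller must have checked feasibility first."""
    for occupancy in path_links:
        for s in range(start, start + width):
            occupancy[s] = True

# Toy usage: two-hop path, 8 slots per link, slots 2-3 busy on the second link
link_a = [False] * 8
link_b = [False] * 8
link_b[2] = link_b[3] = True
print(block_is_feasible([link_a, link_b], start=2, width=3))  # False (overlap on link_b)
print(block_is_feasible([link_a, link_b], start=4, width=3))  # True
```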
3. Communication-Bound-Aware (CBA) Dynamic Resource Adaptation
The crux of CBA (Fu et al., 23 Dec 2025) is adaptive, cross-domain orchestration:
- Detection of communication-bound tasks: the orchestrator inspects the previous iteration's schedule and labels a micro-batch computation as communication-bound if its inter-DC transfer finishes after its other dependencies have completed, i.e., network delay rather than computation determines when it can start.
- Dynamic frequency-slot demand adjustment: if a transmission was blocked in the previous iteration, its slot demand is decreased by one; if it is labeled communication-bound, its demand is incremented by one (within system-wide bounds) to secure a wider spectrum block and reduce latency.
- K-shortest-path search with contiguity-aware path selection: for each transmission, the framework evaluates $K$ candidate paths and slot blocks and computes a fitness score that combines the contiguity index of the candidate slot block, the path hop count, and the current frequency-slot usage fraction of the traversed links.
This heuristic approach enables real-time resource adaptation as network state and model demands evolve during training. No formal worst-case guarantee on solution approximation is provided.
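Putting the three mechanisms together, the following minimal sketch shows a CBA-style per-iteration heuristic loop; the fitness weights, demand bounds, and helper signatures (`adjust_slot_demand`, `schedule_request`, etc.) are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch of a CBA-style per-iteration heuristic; the fitness
# weighting and helper signatures are assumptions, not the paper's exact code.

def adjust_slot_demand(req, min_slots=1, max_slots=8):
    """Shrink demand after a blocked attempt, grow it for communication-bound requests."""
    if req["blocked_last_iter"]:
        req["slots"] = max(min_slots, req["slots"] - 1)
    elif req["comm_bound"]:
        req["slots"] = min(max_slots, req["slots"] + 1)

def fitness(contiguity, hop_count, usage_fraction,
            w_contig=1.0, w_hops=0.5, w_usage=0.5):
    """Higher is better: favor contiguous spectrum, short paths, lightly loaded links.
    The specific weights are illustrative assumptions."""
    return w_contig * contiguity - w_hops * hop_count - w_usage * usage_fraction

def schedule_request(req, candidate_paths, find_block, contiguity_index, usage):
    """Pick the best (path, block) among the K candidates; return None if all are blocked."""
    best = None
    for path in candidate_paths:
        start = find_block(path, req["slots"])   # first contiguous free block, or None
        if start is None:
            continue
        score = fitness(contiguity_index(path), len(path), usage(path))
        if best is None or score > best[0]:
            best = (score, path, start)
    return best  # (score, path, start_slot) or None -> request is blocked this iteration

# Toy usage with stubbed-out network callbacks
req = {"slots": 2, "blocked_last_iter": False, "comm_bound": True}
adjust_slot_demand(req)                          # demand grows to 3 slots
paths = [["DC1-DC2", "DC2-DC3"], ["DC1-DC3"]]
best = schedule_request(
    req, paths,
    find_block=lambda p, w: 0,                   # pretend slot 0 is always free
    contiguity_index=lambda p: 1.0,
    usage=lambda p: 0.1 * len(p),
)
print(req["slots"], best)
```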
4. Performance Characterization and Benchmarks
Experimental evaluation (Fu et al., 23 Dec 2025) utilizes NSFNET (14 nodes, 21 links; 80 FS/link, 12.5 GHz; 64-QAM modulation), placing GPUs randomly in six DCs for Llama 3 models (8B, 70B, 8 PP stages).
- Baseline algorithms: KSP-FF (K-shortest paths, first-fit assignment) and SD-FF (shortest-distance path, first-fit).
- Key results (Llama 3 70B, GPipe pipeline schedule):
| Metric | KSP-FF | SD-FF | CBA (Ours) |
|---|---|---|---|
| Iteration time (s) | 102.4 | 98.7 | 68.0 |
| Bubble ratio (%) | 48.1 | 45.5 | 37.9 |
| Blocking prob. (%) | 17.3 | 15.9 | 13.8 |
- Improvements over the best baseline (SD-FF), derived from the table above (see the arithmetic check after this list):
  - ≈31% reduction in iteration time (68.0 s vs. 98.7 s)
  - 7.6-percentage-point decrease in bubble ratio (37.9% vs. 45.5%)
  - 2.1-percentage-point reduction in blocking probability (13.8% vs. 15.9%)
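The figures above follow directly from the benchmark table; a quick arithmetic check:

```python
# Quick check of the improvement figures against the benchmark table above
best_baseline = {"iter_s": 98.7, "bubble_pct": 45.5, "block_pct": 15.9}   # SD-FF
cba           = {"iter_s": 68.0, "bubble_pct": 37.9, "block_pct": 13.8}

iter_reduction = 100 * (best_baseline["iter_s"] - cba["iter_s"]) / best_baseline["iter_s"]
bubble_drop    = best_baseline["bubble_pct"] - cba["bubble_pct"]   # percentage points
block_drop     = best_baseline["block_pct"] - cba["block_pct"]     # percentage points
print(round(iter_reduction, 1), round(bubble_drop, 1), round(block_drop, 1))  # ≈ 31.1, 7.6, 2.1
```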
CBA ablation studies show that disabling communication-bound task labeling or the dynamic $\alpha$–$\beta$ latency updates degrades both bubble ratio and blocking probability.
5. Theoretical and Algorithmic Complexity
The per-iteration cost of CBA (Fu et al., 23 Dec 2025) is dominated by two components:
- the $K$-shortest-path search performed per transmission request, and
- the contiguity and fitness computation performed per candidate path.
Given practical parameter values (e.g., NSFNET-scale topologies with 14 nodes, 21 links, and 80 frequency slots per link, and a small number $K$ of candidate paths), CBA remains computationally tractable even in large network topologies.
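To give a sense of scale, here is a back-of-the-envelope count of candidate evaluations per iteration, combining the NSFNET parameters above with assumed (hypothetical) values for $K$, the number of requests, and path length.

```python
# Back-of-the-envelope work estimate per training iteration; K, the number of
# transmission requests, and max_hops are hypothetical placeholders, not paper values.
V, E, F = 14, 21, 80          # NSFNET nodes, links, frequency slots per link
K = 4                         # candidate paths per request (assumed)
requests = 8 * 7              # e.g. 8 micro-batches x 7 stage boundaries (micro-batch count assumed)

# Each request evaluates K candidate paths; each evaluation scans the occupancy
# vectors of the path's links (bounded here by an assumed max hop count).
max_hops = 5                  # rough upper bound on path length in NSFNET (assumed)
evaluations = requests * K * max_hops * F
print(evaluations)            # ~ 9.0e4 slot checks per iteration -> easily real-time
```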
6. Broader Context, Synergies, and Related Multi-DC Frameworks
Multi-DC optical networking for distributed machine learning entails tight co-design between application-level pipeline-parallel training and optical network resource management. CrossPipe (Chen et al., 30 Jun 2025) generalizes multi-DC pipeline scheduling as a constraint optimization model, providing both a CP-solver schedule and a near-optimal greedy schedule, explicitly accounting for bandwidth and latency (via an $\alpha$–$\beta$ model) and reducing training time relative to static schedules.
Alternate frameworks such as SPP (Luo et al., 2022), HelixPipe (Zhang et al., 1 Jul 2025), TawPipe (Wu et al., 12 Nov 2025), and BaPipe (Zhao et al., 2020) focus on device-level communication patterns, weight-passing schemes, and load-balanced stage partitioning, providing the necessary abstractions for scaling within or across DC boundaries.
CBA represents the state of the art in integrating pipeline-parallel task scheduling with real-time optical network state, adapting spectrum assignment dynamically, and maximizing utilization under stringent multi-DC constraints. Such communication-bound-aware resource assignment mechanisms are fundamental to the sustainable scaling of distributed LLM and DNN training workloads across geographically distributed datacenters.