TopoOpt scalability and cost-efficiency for >1K-GPU clusters

Determine whether TopoOpt, which relies on a multi-tier optical patch panel fabric with slow reconfiguration at this scale, can interconnect clusters of more than 1,000 GPUs while remaining cost-efficient. The assessment must explicitly account for the number of patch panel ports required, the long-reach optical transceivers needed to overcome insertion loss across multiple switching layers, and the resulting growth in hardware cost.

Background

The mFabric paper (Liao et al., 2025) compares its proposed fabric with alternative interconnects, including TopoOpt, in a detailed networking cost analysis. It argues that scaling TopoOpt beyond approximately 1,000 GPUs requires a multi-tier optical patch panel network. Such a network needs substantial patch panel port counts and, because insertion loss accumulates across the switching layers, long-reach transceivers. The authors note that these requirements may raise costs significantly and, in contrast to their own approach of regionally reconfigurable optical circuits combined with an electrical fabric, could undermine TopoOpt's cost-efficiency at scale.
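
The cost argument above can be made concrete with a small parametric sketch. It is not the paper's cost model: every name and number below is a hypothetical placeholder, and the assumption that a worst-case path through a folded fabric with `tiers` layers crosses `2 * tiers - 1` patch panels is an illustrative simplification. The sketch only shows the mechanism, namely that adding tiers increases both the panel ports consumed per link and the accumulated insertion loss, which in turn can force a more expensive transceiver class.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Transceiver:
    """Hypothetical transceiver class; values are placeholders, not vendor data."""
    name: str
    loss_budget_db: float  # largest end-to-end optical loss it tolerates
    unit_cost: float       # arbitrary cost units


def panels_on_path(tiers: int) -> int:
    # Assumption: a worst-case path climbs to the top tier and back down,
    # so it crosses 2 * tiers - 1 patch panels.
    return 2 * tiers - 1


def path_loss_db(tiers: int, loss_per_panel_db: float) -> float:
    # Insertion loss is assumed to add up linearly per panel crossed.
    return panels_on_path(tiers) * loss_per_panel_db


def cheapest_feasible(options: Sequence[Transceiver],
                      loss_db: float) -> Optional[Transceiver]:
    # Pick the lowest-cost transceiver whose budget covers the path loss.
    feasible = [t for t in options if t.loss_budget_db >= loss_db]
    return min(feasible, key=lambda t: t.unit_cost) if feasible else None


def fabric_cost(num_links: int, tiers: int, loss_per_panel_db: float,
                port_cost: float,
                options: Sequence[Transceiver]) -> Optional[float]:
    """Total cost of transceivers plus panel ports, or None if no option fits."""
    choice = cheapest_feasible(options, path_loss_db(tiers, loss_per_panel_db))
    if choice is None:
        return None
    ports_per_link = 2 * panels_on_path(tiers)  # one input and one output port per panel
    per_link = 2 * choice.unit_cost + ports_per_link * port_cost  # one transceiver per end
    return num_links * per_link


# Hypothetical inputs, chosen only to exercise the functions.
OPTIONS = [
    Transceiver("short-reach", loss_budget_db=2.0, unit_cost=1.0),
    Transceiver("long-reach", loss_budget_db=6.0, unit_cost=4.0),
]

for tiers in (1, 2):
    cost = fabric_cost(num_links=1000, tiers=tiers, loss_per_panel_db=1.0,
                       port_cost=0.5, options=OPTIONS)
    print(f"tiers={tiers}: total cost = {cost}")
```

With these placeholder inputs, moving from one tier to two changes the selected transceiver class as well as the port count, so cost grows faster than the tier count alone would suggest. Whether this happens in practice depends on real per-panel loss, transceiver budgets, and prices, which is exactly what the open question leaves unresolved.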

Within this discussion, the authors explicitly state that it is unclear whether TopoOpt can both achieve the necessary scale and preserve its cost-efficiency, thereby identifying a concrete unresolved question about TopoOpt’s practicality for very large clusters.

References

However, TopoOpt requires a multi-tier patch panel fabric to form a network capable of interconnecting more than 1K GPUs. Achieving this necessitates extensive patch panel ports and expensive long-reach transceivers to compensate for the insertion loss of optical signals across multiple switching layers. As a result, it remains unclear whether TopoOpt is able to interconnect such large clusters and maintain its cost-efficiency.

mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training (2501.03905 - Liao et al., 7 Jan 2025) in Section 6.2 (Networking Cost Analysis)