Deployment choice: densely populated accelerator servers vs. homogeneous per-server devices

Determine whether cloud providers offering accelerator-as-a-service should deploy densely populated accelerator servers or attach homogeneous accelerator devices to each server, so as to optimize total cost of ownership and sustainability while maintaining performance isolation and manageability at cloud scale.

Background

The paper argues that sharing accelerators reduces costs and highlights the importance of cloud-scale management and isolation. However, the optimal deployment model remains unresolved: providers could either concentrate many accelerators in a few servers (dense accelerator servers) or distribute similar accelerators uniformly across servers (homogeneous per-server devices).

The authors explicitly state this as an open question in the context of business and sustainability trade-offs, tying it to capacity planning, admission control, and serviceability in large public cloud environments.

References

For example, using densely populated accelerator servers or attaching homogeneous accelerator devices per server is an open question.

Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild (2407.10098 - Zhao et al., 2024) in Section 6, Accelerator cost and cloud-scale management