Traffic shaping for GPUs under spatial multiplexing

Determine how to perform PCIe-based, accelerator-side traffic shaping for GPUs that are shared under spatial multiplexing, so that multi-tenant GPU usage in public cloud environments achieves predictable performance isolation and meets service-level agreements.

Background

The paper proposes proactively shaping accelerator-related I/O traffic at the accelerator interface to achieve performance isolation in public clouds, motivated by observed contention on PCIe paths and device interfaces. While the approach is demonstrated for I/O accelerators, extending it to GPUs introduces new challenges when GPUs are shared concurrently among tenants (spatial multiplexing) rather than time-multiplexed.

In the "Managing I/O contention for GPUs" subsection, the authors identify the need to adapt their accelerator traffic-shaping approach to the GPU setting. They specifically call out spatial multiplexing as a major unresolved issue, requiring techniques that preserve isolation across concurrent tenants sharing the same GPU.
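To make the shaping idea concrete, the sketch below rate-limits each tenant's DMA submissions over a shared PCIe link using a per-tenant token bucket. This is a minimal illustration under stated assumptions, not the paper's implementation: the `TokenBucket` and `Shaper` names, the byte-granularity admission model, and the static per-tenant bandwidth shares are all inventions for the example; a real system would also have to account for GPU-internal contention, which the paper flags as open.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TokenBucket:
    """Per-tenant token bucket; rate_bps is this tenant's shaped PCIe share.

    Tokens are bytes: a DMA transfer of n bytes is admitted only when n
    tokens are available, otherwise the caller should defer and retry.
    """
    rate_bps: float       # refill rate in bytes per second
    burst_bytes: float    # maximum burst (bucket capacity) in bytes
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.burst_bytes
        self.last = time.monotonic()

    def try_admit(self, nbytes: int, now: Optional[float] = None) -> bool:
        """Refill tokens for elapsed time, then admit the transfer or refuse it."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bps)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class Shaper:
    """One bucket per tenant spatially sharing a GPU behind one PCIe link.

    shares maps tenant name -> fraction of link_bps; burst_s bounds how far
    a tenant may momentarily exceed its average rate (10 ms worth here).
    """

    def __init__(self, shares: Dict[str, float], link_bps: float,
                 burst_s: float = 0.01) -> None:
        self.buckets = {
            tenant: TokenBucket(rate_bps=frac * link_bps,
                                burst_bytes=frac * link_bps * burst_s)
            for tenant, frac in shares.items()
        }

    def try_admit(self, tenant: str, nbytes: int,
                  now: Optional[float] = None) -> bool:
        return self.buckets[tenant].try_admit(nbytes, now)
```

A host-side interposer (or SmartNIC/DPU agent, in the spirit of the paper's intra-host view) would consult `try_admit` before issuing each tenant's DMA, queueing refused transfers; how to place such a choke point for spatially multiplexed GPU kernels is exactly the open problem.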

References

"Some of the big open problems when applying our design to this setting will be (1) how to perform traffic shaping when GPUs are used under spatial multiplexing, (2) how to incorporate the understanding of GPU internal contention into the traffic patterns to re-shape, and (3) how to incorporate our findings when managing PCIe congestion for multi-GPU servers."

Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild (2407.10098 - Zhao et al., 2024), Section 6, "Managing I/O contention for GPUs"