Integrating GPU-internal contention into traffic re-shaping

Determine how quantified GPU-internal contention characteristics can be incorporated into the traffic re-shaping of DMA flows at the accelerator interface, so that multi-tenant GPU traffic can be managed predictably and fairly.

Background

The paper documents several sources of contention along accelerator I/O paths (e.g., PCIe interconnects, device interfaces, and accelerator heterogeneity) and uses traffic shaping to mitigate them. GPUs introduce additional internal contention mechanisms (e.g., scheduling, memory bandwidth arbitration) that influence end-to-end throughput.

The authors explicitly raise the unresolved question of how to incorporate an understanding of GPU-internal contention into the traffic patterns that are re-shaped, highlighting the need to couple device-internal behavior with host–device traffic management for accurate isolation.
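One way to make this coupling concrete is to feed a measured device-internal contention signal into the host-side shaper that paces DMA traffic. The sketch below is purely illustrative and not the paper's design: the class name, the linear derating model, and the `[0, 1]` contention signal are all assumptions. In practice the signal might come from GPU telemetry counters (e.g., memory-bandwidth utilization), and the derating curve would have to be calibrated per device.

```python
class ContentionAwareShaper:
    """Token-bucket shaper for host-to-GPU DMA flows whose refill rate
    is derated by a measured GPU-internal contention factor.

    Hypothetical sketch: the linear derating model and the meaning of
    the contention signal are assumptions, not the paper's mechanism.
    """

    def __init__(self, base_rate_gbps: float, burst_bytes: int):
        self.base_rate = base_rate_gbps * 1e9 / 8.0  # bytes per second
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)             # start with a full bucket
        self.contention = 0.0                        # 0 = idle GPU, 1 = fully contended

    def update_contention(self, contention: float) -> None:
        # In a real system this would be derived from device telemetry;
        # here it is just a number clamped to [0, 1].
        self.contention = min(max(contention, 0.0), 1.0)

    def effective_rate(self) -> float:
        # Linear derating: pace DMA traffic down as internal contention
        # rises, so the device-side bottleneck is not over-driven.
        return self.base_rate * (1.0 - 0.5 * self.contention)

    def refill(self, elapsed_s: float) -> None:
        # Accumulate tokens at the contention-adjusted rate, up to the burst cap.
        self.tokens = min(float(self.burst),
                          self.tokens + self.effective_rate() * elapsed_s)

    def try_send(self, nbytes: int) -> bool:
        # Admit a DMA descriptor only if the bucket holds enough tokens.
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```

Under this (assumed) model, full contention halves the admitted DMA rate; the open question in the paper is precisely what the right signal and derating function should be, since a mismatched model either wastes interface bandwidth or fails to isolate tenants.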

References

Some of the big open problems when applying our design to this setting will be (1) how to perform traffic shaping when GPUs are used under spatial multiplexing, (2) how to incorporate the understanding of GPU internal contention into the traffic patterns to re-shape, and (3) how to incorporate our findings when managing PCIe congestion for multi-GPU servers.

Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild (Zhao et al., 2024, arXiv:2407.10098), Section 6, "Managing I/O contention for GPUs"