Modeling queuing delays and resource contention in GPU API remoting performance
Develop a cost model for GPU API remoting that explicitly incorporates queuing delays and resource contention, so that it can account for and accurately predict the deviations between theoretical/emulation results and real-hardware measurements observed for AI applications using RDMA or SHM backends.
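A minimal sketch of one possible formulation is given below. It assumes the base cost has the form $Start + \sum Time(api)$ (the constants named in the quoted excerpt), adds a payload-transfer term for the RDMA/SHM backend, approximates per-call queuing delay with a standard M/M/1 waiting-time formula, and folds resource contention into a simple multiplicative slowdown factor. The names (`ApiCall`, `remoting_cost_s`, `contention_factor`) and the numbers in the usage example are illustrative assumptions, not the paper's model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ApiCall:
    """Hypothetical per-call profile for one remoted GPU API call."""
    time_s: float        # Time(api): service time of the call over the backend
    payload_bytes: int   # argument/result bytes moved over RDMA or SHM


def mm1_wait_s(arrival_rate_hz: float, service_rate_hz: float) -> float:
    """Mean M/M/1 queuing delay W_q = lambda / (mu * (mu - lambda)).

    Returns +inf when the remoting channel is saturated (lambda >= mu),
    flagging a regime a contention-free model cannot predict.
    """
    if arrival_rate_hz >= service_rate_hz:
        return math.inf
    return arrival_rate_hz / (service_rate_hz * (service_rate_hz - arrival_rate_hz))


def remoting_cost_s(
    start_s: float,                  # Start: fixed startup/connection cost
    calls: List[ApiCall],            # trace of forwarded API calls
    bandwidth_bps: float,            # backend bandwidth (RDMA or SHM)
    arrival_rate_hz: float,          # measured API-call arrival rate
    contention_factor: float = 1.0,  # assumed slowdown when the link/CPU is shared
) -> float:
    """Base cost (Start + sum of Time(api) + transfer) plus queuing and contention terms."""
    mean_service_s = sum(c.time_s for c in calls) / len(calls)
    # Contention stretches each call's service time, lowering the channel's service rate.
    service_rate_hz = 1.0 / (mean_service_s * contention_factor)
    wait_s = mm1_wait_s(arrival_rate_hz, service_rate_hz)

    total = start_s
    for c in calls:
        transfer_s = c.payload_bytes * 8 / bandwidth_bps
        total += c.time_s * contention_factor + transfer_s + wait_s
    return total


# Usage example (illustrative numbers): 1,000 identical calls over a
# 100 Gbps RDMA link arriving at 20k calls/s with 20% contention overhead.
calls = [ApiCall(time_s=30e-6, payload_bytes=4096)] * 1000
print(remoting_cost_s(start_s=2e-3, calls=calls,
                      bandwidth_bps=100e9, arrival_rate_hz=20_000,
                      contention_factor=1.2))
```

The queuing and contention terms are the knobs to be fitted against real-hardware traces; driving `contention_factor` to 1.0 and `arrival_rate_hz` to 0 recovers the contention-free base model.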
References
Note that the results on real hardware may deviate from our theoretical (and emulation) model. This is due to the fact that we are unable to model the queuing delays and resource contentions, as well as the fact that the profile of several constants (e.g., $Start$ and $Time(api)$ in the cost equation) may have fluctuations.
— Characterizing Network Requirements for GPU API Remoting in AI Applications
(arXiv:2401.13354, Wang et al., 24 Jan 2024), Section 5.2, "When requirements meet real hardware"