Combine Vortex’s IO redistribution with intra-GPU slicing for higher system efficiency

Develop and evaluate techniques that integrate Vortex’s IO redistribution across GPUs with intra-GPU slicing (e.g., partitioning a single GPU into slices) to collocate complementary workloads and further improve overall system efficiency in multi-tenant environments.

Background

Prior work shows that GPU slicing can improve utilization by collocating workloads that stress different microarchitectural resources on the same GPU. Vortex’s approach redistributes IO across multiple GPUs to accelerate IO-bound analytics on a single target GPU.

The authors suggest combining these two ideas—cross-GPU IO pooling with sub-GPU slicing—to achieve higher system-wide efficiency, but leave the design and evaluation of such an integrated approach as future work.
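To make the combined idea concrete, below is a minimal scheduling sketch under assumed abstractions: each GPU exposes a fixed number of sub-GPU slices and a fixed host-to-GPU link bandwidth, and an IO-bound job placed on one GPU's slices may borrow spare link bandwidth from other GPUs (the Vortex-style redistribution step). All names, capacities, and the greedy policy are hypothetical illustrations, not part of Vortex's actual implementation or evaluation.

```python
"""Toy sketch: collocate jobs on sub-GPU slices while pooling IO across GPUs.
All classes, numbers, and the placement policy are hypothetical."""
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    compute_slices: int   # sub-GPU slices the job needs (MIG-like granularity)
    io_gbps: float        # host-to-GPU ingest bandwidth the job needs


@dataclass
class GPU:
    name: str
    free_slices: int = 7       # assumed slice count per GPU
    free_io_gbps: float = 25.0 # assumed per-GPU link bandwidth


def place(workloads, gpus):
    """Greedy sketch: give each job slices on one GPU, then satisfy its IO
    demand first from that GPU's link and then from other GPUs' idle links
    (the cross-GPU IO redistribution step)."""
    plan = []
    for w in sorted(workloads, key=lambda w: w.io_gbps, reverse=True):
        target = max(gpus, key=lambda g: g.free_slices)
        if target.free_slices < w.compute_slices:
            raise RuntimeError(f"no slice capacity for {w.name}")
        target.free_slices -= w.compute_slices
        need, donors = w.io_gbps, []
        for g in [target] + [g for g in gpus if g is not target]:
            take = min(need, g.free_io_gbps)
            if take > 0:
                g.free_io_gbps -= take
                donors.append((g.name, take))
                need -= take
        if need > 1e-9:
            raise RuntimeError(f"not enough pooled IO for {w.name}")
        plan.append((w.name, target.name, w.compute_slices, donors))
    return plan


if __name__ == "__main__":
    gpus = [GPU("gpu0"), GPU("gpu1"), GPU("gpu2")]
    jobs = [
        Workload("scan-heavy-analytics", compute_slices=3, io_gbps=60.0),  # IO-bound
        Workload("gemm-training-shard", compute_slices=4, io_gbps=5.0),    # compute-bound
    ]
    for name, gpu, slices, donors in place(jobs, gpus):
        print(f"{name}: {slices} slices on {gpu}, IO from {donors}")
```

The sketch only illustrates the resource-accounting side of the open question; the harder parts left to future work include interference between collocated slices, routing redistributed IO without disturbing donor GPUs' own workloads, and fairness across tenants.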

References

"Vortex's IO redistribution idea at the GPU level can be combined with GPU slicing at sub-GPU granularity for even higher overall system efficiency, which we leave as future work."

Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics (arXiv:2502.09541, Yuan et al., 13 Feb 2025), Section 10: Related Work, "GPU slicing for workload collocation".