Towards Efficient and Practical GPU Multitasking in the Era of LLM

Published 11 Aug 2025 in cs.OS and cs.DC (arXiv:2508.08448v1)

Abstract: GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet the demands of modern AI workloads. In this work, we highlight the key requirements for GPU multitasking, examine prior efforts, and discuss why they fall short. To advance toward efficient and practical GPU multitasking, we envision a resource management layer, analogous to a CPU operating system, to handle various aspects of GPU resource management and sharing. We outline the challenges and potential solutions, and hope this paper inspires broader community efforts to build the next-generation GPU compute paradigm grounded in multitasking.
