Papers
Topics
Authors
Recent
Search
2000 character limit reached

DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs

Published 8 Apr 2025 in cs.DC | (2504.08795v1)

Abstract: The widespread use of Deep Neural Networks (DNNs) is limited by high computational demands, especially in constrained environments. GPUs, though effective accelerators, often face underutilization and rely on coarse-grained scheduling. This paper introduces DARIS, a priority-based real-time DNN scheduler for GPUs, utilizing NVIDIA's MPS and CUDA streaming for spatial sharing, and a synchronization-based staging method for temporal partitioning. In particular, DARIS improves GPU utilization and uniquely analyzes GPU concurrency by oversubscribing computing resources. It also supports zero-delay DNN migration between GPU partitions. Experiments show DARIS improves throughput by 15% and 11.5% over batching and state-of-the-art schedulers, respectively, even without batching. All high-priority tasks meet deadlines, with low-priority tasks having under 2% deadline miss rate. High-priority response times are 33% better than those of low-priority tasks.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.