Unleashing the Power of Preemptive Priority-based Scheduling for Real-Time GPU Tasks (2401.16529v1)
Abstract: Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. Despite extensive work in the real-time research community, several limitations persist, including absent or limited preemption, extended blocking times, and the need for extensive modifications to program code. In this paper, we propose two novel techniques, namely the kernel thread-based and IOCTL-based approaches, to enable preemptive priority-based scheduling for real-time GPU tasks. Our approaches exert control over GPU context scheduling at the device driver level and enable preemptive GPU scheduling based on task priorities. The kernel thread-based approach achieves this without requiring modifications to user-level programs, while the IOCTL-based approach needs only a single macro at the boundaries of GPU access segments. In addition, we provide a comprehensive response time analysis that takes into account overlaps between different task segments, mitigating pessimism in worst-case estimates. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and timeliness of real-time tasks. The results highlight significant improvements over prior work, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on NVIDIA Jetson embedded platforms.
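As background for the response time analysis mentioned in the abstract, the following is a minimal sketch of the classic fixed-point response-time recurrence for preemptive fixed-priority tasks on a single processor. It is not the analysis proposed in the paper, which additionally accounts for GPU segments and overlaps between task segments. The function name `response_time`, the `(wcet, period)` task tuple, and the implicit-deadline assumption are all illustrative choices, not taken from the paper.

```python
def response_time(tasks, i, max_iters=10_000):
    """Classic response-time recurrence for task i (illustrative only).

    tasks: list of (wcet, period) integer pairs, ordered from highest to
           lowest priority. Deadlines are assumed equal to periods.
    Returns the worst-case response time of tasks[i], or None if the
    recurrence exceeds the deadline (task deemed unschedulable).
    """
    wcet_i, period_i = tasks[i]
    r = wcet_i  # start from the task's own execution time
    for _ in range(max_iters):
        # Interference: each higher-priority task j can preempt
        # ceil(r / period_j) times within a window of length r.
        interference = sum(
            -(-r // period_j) * wcet_j  # integer ceiling division
            for wcet_j, period_j in tasks[:i]
        )
        nxt = wcet_i + interference
        if nxt == r:
            return r  # fixed point reached
        if nxt > period_i:
            return None  # deadline (= period) would be missed
        r = nxt
    return None


if __name__ == "__main__":
    # Hypothetical taskset: (wcet, period), highest priority first.
    taskset = [(1, 4), (2, 6), (3, 12)]
    for idx in range(len(taskset)):
        print(idx, response_time(taskset, idx))
```

A taskset passes this test when every task has a response time that does not exceed its period. Analyses for GPU-using tasks, such as the one the abstract describes, extend this kind of recurrence with extra terms for GPU execution and blocking.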
- Yidi Wang
- Cong Liu
- Daniel Wong
- Hyoseung Kim