INSPIRIT: Optimizing Heterogeneous Task Scheduling through Adaptive Priority in Task-based Runtime Systems (2404.03226v1)
Abstract: As modern HPC platforms become increasingly heterogeneous, it is challenging for programmers to fully exploit the massive parallelism that such heterogeneity offers. Consequently, task-based runtime systems have been proposed as an intermediate layer that hides the complex heterogeneity from application programmers. The core functionality of these systems is efficient task-to-resource mapping, formulated as Directed Acyclic Graph (DAG) scheduling. However, existing scheduling schemes struggle to determine task priorities, because they rely heavily on domain knowledge or fail to efficiently exploit the interaction between application and hardware characteristics. In this paper, we propose INSPIRIT, an efficient and lightweight scheduling framework with adaptive priority designed for task-based runtime systems. INSPIRIT introduces two novel task attributes, inspiring ability and inspiring efficiency, to direct scheduling, eliminating the need for application domain knowledge. In addition, INSPIRIT jointly considers runtime information, such as the ready tasks in worker queues, to guide task scheduling. This approach exposes more performance opportunities on heterogeneous hardware at runtime while effectively reducing the overhead of adjusting task priorities. Our evaluation results demonstrate that INSPIRIT achieves superior performance compared to cutting-edge scheduling schemes on both synthesized and real-world task DAGs.