
INSPIRIT: Optimizing Heterogeneous Task Scheduling through Adaptive Priority in Task-based Runtime Systems (2404.03226v1)

Published 4 Apr 2024 in cs.DC

Abstract: As modern HPC platforms become increasingly heterogeneous, it is challenging for programmers to fully exploit the massive parallelism that such heterogeneity offers. Task-based runtime systems have therefore been proposed as an intermediate layer that hides this complexity from application programmers. Their core functionality is efficient task-to-resource mapping, cast as Directed Acyclic Graph (DAG) scheduling. However, existing scheduling schemes struggle to determine task priorities: they either rely heavily on application domain knowledge or fail to exploit the interaction between application and hardware characteristics. In this paper, we propose INSPIRIT, an efficient and lightweight scheduling framework with adaptive priority designed for task-based runtime systems. INSPIRIT introduces two novel task attributes, inspiring ability and inspiring efficiency, that dictate scheduling and eliminate the need for application domain knowledge. In addition, INSPIRIT jointly considers runtime information, such as the ready tasks in worker queues, to guide task scheduling. This approach exposes more performance opportunities on heterogeneous hardware at runtime while reducing the overhead of adjusting task priorities. Our evaluation demonstrates that INSPIRIT outperforms cutting-edge scheduling schemes on both synthesized and real-world task DAGs.
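To make the scheduling setting concrete, the sketch below shows a minimal priority-driven list scheduler for a task DAG on heterogeneous workers. Note the assumptions: the paper does not publish its formulas here, so the priority metric used (ranking a ready task by how many successors it unblocks) is only an illustrative stand-in for INSPIRIT's inspiring ability/efficiency attributes, and the greedy earliest-finish worker choice is a generic heuristic, not the paper's method.

```python
from collections import defaultdict

def schedule(tasks, deps, cost):
    """Greedy list scheduling over a task DAG on heterogeneous workers.

    tasks: list of task ids
    deps:  {task: set of predecessor tasks}
    cost:  {(task, worker): execution time}  (per-worker costs model heterogeneity)
    Returns (makespan, {task: worker}).
    """
    workers = sorted({w for (_, w) in cost})
    succs = defaultdict(set)
    indeg = {t: 0 for t in tasks}
    for t, preds in deps.items():
        for p in preds:
            succs[p].add(t)
            indeg[t] += 1

    ready = [t for t in tasks if indeg[t] == 0]
    finish = {t: 0.0 for t in tasks}   # finish time of each scheduled task
    free_at = {w: 0.0 for w in workers}  # time each worker becomes idle
    assign = {}

    while ready:
        # Illustrative priority: prefer tasks that unblock more successors.
        ready.sort(key=lambda t: -len(succs[t]))
        t = ready.pop(0)
        # A task cannot start before all its predecessors have finished.
        start_lb = max((finish[p] for p in deps.get(t, ())), default=0.0)
        # Pick the worker on which this task finishes earliest.
        w = min(workers, key=lambda w: max(free_at[w], start_lb) + cost[(t, w)])
        start = max(free_at[w], start_lb)
        finish[t] = start + cost[(t, w)]
        free_at[w] = finish[t]
        assign[t] = w
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    return max(finish.values()), assign
```

A runtime system such as StarPU performs this kind of mapping online as tasks become ready, rather than over a fully known DAG; the sketch only illustrates the priority-plus-worker-selection structure that adaptive schemes like INSPIRIT refine.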

