Comparing Task Graph Scheduling Algorithms: An Adversarial Approach (2403.07120v2)
Abstract: Scheduling a task graph representing an application over a heterogeneous network of computers is a fundamental problem in distributed computing. It is known to be not only NP-hard but also not polynomial-time approximable within a constant factor. As a result, many heuristic algorithms have been proposed over the past few decades. Yet it remains largely unclear how these algorithms compare to each other in terms of the quality of schedules they produce. We identify gaps in the traditional benchmarking approach to comparing task scheduling algorithms and propose a simulated annealing-based adversarial analysis approach called PISA to help address them. We also introduce SAGA, a new open-source library for comparing task scheduling algorithms. We use SAGA to benchmark 15 algorithms on 16 datasets and PISA to compare the algorithms in a pairwise manner. Algorithms that appear to perform similarly on benchmarking datasets are shown to perform very differently on adversarially chosen problem instances. Interestingly, the results indicate that this is true even when the adversarial search is constrained to selecting among well-structured, application-specific problem instances. This work represents an important step towards a more general understanding of the performance boundaries between task scheduling algorithms on different families of problem instances.
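The adversarial idea in the abstract can be illustrated with a toy sketch. It is not PISA's implementation and does not use the SAGA API. It assumes a much simpler setting than the paper's: independent tasks on identical machines, with no task graph and no heterogeneous network. Two stand-in schedulers are compared, greedy list scheduling in the given order and longest-processing-time-first. All function names and parameters (`makespan_in_order`, `makespan_lpt`, `perturb`, `adversarial_search`, the cost range, and the cooling schedule) are hypothetical choices made for illustration. Simulated annealing perturbs task costs to look for an instance that maximizes the makespan ratio of the first scheduler to the second.

```python
import math
import random


def makespan_in_order(costs, n_machines):
    """Greedy list scheduling: place each task, in the given order,
    on the currently least-loaded machine; return the makespan."""
    loads = [0.0] * n_machines
    for cost in costs:
        i = loads.index(min(loads))
        loads[i] += cost
    return max(loads)


def makespan_lpt(costs, n_machines):
    """Longest-processing-time-first: the same greedy rule applied
    after sorting tasks by decreasing cost."""
    return makespan_in_order(sorted(costs, reverse=True), n_machines)


def perturb(costs, rng, lo=0.1, hi=1.0, step=0.1):
    """Return a copy of the instance with one task cost nudged,
    clamped to the assumed range [lo, hi]."""
    new = list(costs)
    i = rng.randrange(len(new))
    new[i] = min(hi, max(lo, new[i] + rng.uniform(-step, step)))
    return new


def adversarial_search(alg_a, alg_b, n_tasks=6, n_machines=2,
                       steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over task costs, maximizing the ratio
    makespan(alg_a) / makespan(alg_b)."""
    rng = random.Random(seed)

    def ratio(costs):
        return alg_a(costs, n_machines) / alg_b(costs, n_machines)

    current = [rng.uniform(0.1, 1.0) for _ in range(n_tasks)]
    cur_r = ratio(current)
    best, best_r = current, cur_r
    for k in range(steps):
        # Geometric cooling from t_start towards t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (k / steps)
        cand = perturb(current, rng)
        cand_r = ratio(cand)
        # Always accept improvements; accept worse instances with a
        # probability that shrinks as the temperature drops.
        if cand_r >= cur_r or rng.random() < math.exp((cand_r - cur_r) / temp):
            current, cur_r = cand, cand_r
            if cur_r > best_r:
                best, best_r = current, cur_r
    return best, best_r
```

Calling `adversarial_search(makespan_in_order, makespan_lpt)` returns the worst instance found for the first scheduler relative to the second, together with its makespan ratio. Swapping the two arguments searches in the opposite direction, which mirrors the pairwise comparison the abstract describes.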