
ASA -- The Adaptive Scheduling Algorithm (2401.09733v1)

Published 18 Jan 2024 in cs.DC

Abstract: In High Performance Computing (HPC) infrastructures, the control of resources by batch systems can lead to prolonged queue waiting times and adverse effects on the overall execution times of applications, particularly in data-intensive and low-latency workflows where efficient processing hinges on resource planning and timely allocation. Allocating the maximum capacity upfront ensures the fastest execution but results in spare and idle resources, extended queue waits, and costly usage. Conversely, dynamic allocation based on workflow stage requirements optimizes resource usage but may negatively impact the total workflow makespan. To address these issues, we introduce ASA, the Adaptive Scheduling Algorithm. ASA is a novel, convergence-proven scheduling technique that minimizes jobs' inter-stage waiting times by estimating queue waiting times and proactively submitting resource change requests ahead of time. It strikes a balance between exploration and exploitation, that is, between learning queue waiting times and applying the learnt insights. Real-world experiments at two supercomputer centers with scientific workflows demonstrate ASA's effectiveness, achieving near-optimal resource utilization and accuracy, with up to 10% and 2% reductions in average workflow queue waiting times and makespan, respectively.
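
The idea of estimating queue waiting times and submitting the next stage's resource request early can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's algorithm: the exponential-weights update, the fixed grid of candidate wait times, and the names `WaitTimeEstimator`, `choose`, `update`, and `submission_time` are all assumptions introduced here. Sampling a candidate in proportion to its weight explores while the weights are still near uniform and exploits once they concentrate on the candidates that matched observed waits.

```python
import math
import random


class WaitTimeEstimator:
    """Hypothetical exponential-weights learner over candidate queue wait times (seconds)."""

    def __init__(self, candidates, learning_rate=0.5):
        self.candidates = list(candidates)
        self.learning_rate = learning_rate
        # Uniform weights: no prior knowledge about the queue.
        self.weights = [1.0] * len(self.candidates)

    def probabilities(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def choose(self, rng):
        # Sample a candidate in proportion to its weight.
        return rng.choices(self.candidates, weights=self.weights, k=1)[0]

    def update(self, observed_wait):
        # Loss is the absolute error, scaled into [0, 1] by the candidate range.
        span = max(self.candidates) - min(self.candidates) or 1.0
        for i, candidate in enumerate(self.candidates):
            loss = min(abs(candidate - observed_wait) / span, 1.0)
            self.weights[i] *= math.exp(-self.learning_rate * loss)


def submission_time(stage_end_time, estimated_wait):
    """Time to submit the next request so the allocation arrives near the end of the current stage."""
    return max(0.0, stage_end_time - estimated_wait)


if __name__ == "__main__":
    rng = random.Random(0)
    estimator = WaitTimeEstimator([60, 300, 900], learning_rate=1.0)
    for observed in [310, 280, 330, 295]:  # made-up observed waits, in seconds
        estimator.update(observed)
    guess = estimator.choose(rng)
    print("estimated wait:", guess)
    print("submit next request at t =", submission_time(1000.0, guess))
```

If the estimate is too small, the next stage waits in the queue after the current one finishes; if it is too large, resources are allocated before they are needed and sit idle. That is the trade-off the abstract describes.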

Authors (5)
  1. Abel Souza (10 papers)
  2. Kristiaan Pelckmans (18 papers)
  3. Devarshi Ghoshal (1 paper)
  4. Lavanya Ramakrishnan (7 papers)
  5. Johan Tordsson (4 papers)
