
Predicting Dynamic Memory Requirements for Scientific Workflow Tasks (2311.08185v2)

Published 14 Nov 2023 in cs.DC

Abstract: With the increasing amount of data available to scientists in disciplines as diverse as bioinformatics, physics, and remote sensing, scientific workflow systems are becoming increasingly important for composing and executing scalable data analysis pipelines. When writing such workflows, users need to specify the resources to be reserved for tasks so that sufficient resources are allocated on the target cluster infrastructure. Crucially, underestimating a task's memory requirements can result in task failures. Therefore, users often resort to overprovisioning, resulting in significant resource wastage and decreased throughput. In this paper, we propose a novel online method that uses monitoring time series data to predict task memory usage in order to reduce the memory wastage of scientific workflow tasks. Our method predicts a task's runtime, divides it into k equally-sized segments, and learns the peak memory value for each segment depending on the total file input size. We evaluate the prototype implementation of our method using workflows from the publicly available nf-core repository, showing an average memory wastage reduction of 29.48% compared to the best state-of-the-art approach.
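The method described in the abstract — predict a task's runtime, split it into k equal segments, and learn each segment's peak memory as a function of total input size — can be sketched as follows. This is a minimal illustration, assuming plain per-segment least-squares models and a simple multiplicative safety margin; the function names, the regression choice, and the margin are assumptions, not the paper's exact implementation.

```python
# Hedged sketch of segment-wise peak-memory prediction: fit one linear
# model (peak_mem ~ input_size) per runtime segment from historical runs,
# then predict per-segment allocations for a new task.
# All names and the plain least-squares choice are illustrative assumptions.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return a, b

def fit_segment_models(history, k):
    """history: list of (input_size, [peak_mem for each of k segments]).
    Returns one (slope, intercept) model per segment."""
    models = []
    for seg in range(k):
        xs = [size for size, _ in history]
        ys = [peaks[seg] for _, peaks in history]
        models.append(fit_linear(xs, ys))
    return models

def predict_segments(models, input_size, safety=1.1):
    """Predicted peak memory per segment; the safety margin hedges
    against under-allocation, which would cause task failures."""
    return [(a * input_size + b) * safety for a, b in models]
```

Allocating memory per segment rather than one peak value for the whole task is what reduces wastage: memory reserved for a short high-peak segment is released for the task's remaining runtime.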


