
Ponder: Online Prediction of Task Memory Requirements for Scientific Workflows (2408.00047v2)

Published 31 Jul 2024 in cs.DC

Abstract: Scientific workflows are used to analyze large amounts of data. These workflows comprise numerous tasks, many of which are executed repeatedly, running the same custom program on different inputs. Users specify resource allocations for each task, which must be sufficient for all inputs to prevent task failures. As a result, task memory allocations tend to be overly conservative, wasting precious cluster resources, limiting overall parallelism, and increasing workflow makespan. In this paper, we first benchmark a state-of-the-art method on four real-life workflows from the nf-core workflow repository. This analysis reveals that certain assumptions underlying current prediction methods, which typically were evaluated only on simulated workflows, cannot generally be confirmed for real workflows and executions. We then present Ponder, a new online task-sizing strategy that considers and chooses between different methods to cater to different memory demand patterns. We implemented Ponder for Nextflow and made the code publicly available. In an experimental evaluation that also considers the impact of memory predictions on scheduling, Ponder improves Memory Allocation Quality on average by 71.0% and makespan by 21.8% in comparison to a state-of-the-art method. Moreover, Ponder produces 93.8% fewer task failures.
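The abstract describes an online strategy that chooses between different sizing methods depending on a task's memory demand pattern, but it does not spell out the methods or the selection rule. The following is a minimal sketch of that general idea only, not Ponder's actual algorithm. Everything in it is an assumption made for illustration: the two candidate methods (`max_seen`, `linear_fit`), the `OnlineSizer` class, the 10% safety margin, and the cost model that charges over-allocation as wasted memory and under-allocation as a fully wasted attempt.

```python
from dataclasses import dataclass, field
from statistics import mean


def max_seen(history, input_mb):
    """Hypothetical method 1: largest peak observed so far, ignoring input size."""
    return max(peak for _, peak in history)


def linear_fit(history, input_mb):
    """Hypothetical method 2: least-squares line of peak memory over input size."""
    xs = [x for x, _ in history]
    ys = [y for _, y in history]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:  # all inputs identical: fall back to the mean peak
        return my
    slope = sum((x - mx) * (y - my) for x, y in history) / var
    return my + slope * (input_mb - mx)


# Candidate methods the sizer chooses between (illustrative, not from the paper).
STRATEGIES = {"max_seen": max_seen, "linear_fit": linear_fit}


@dataclass
class OnlineSizer:
    """Illustrative online memory sizer for one repeatedly executed task."""
    user_limit_mb: float   # user-specified allocation, used until data exists
    margin: float = 1.1    # assumed safety factor applied to every prediction
    min_samples: int = 2   # observations required before predicting
    history: list = field(default_factory=list)  # (input_mb, peak_mb) pairs
    cost: dict = field(default_factory=lambda: dict.fromkeys(STRATEGIES, 0.0))

    def _allocations(self, input_mb):
        # Each method's allocation, padded by the margin and capped at the user limit.
        return {
            name: min(self.user_limit_mb, self.margin * fn(self.history, input_mb))
            for name, fn in STRATEGIES.items()
        }

    def observe(self, input_mb, peak_mb):
        """Score every method on the finished task, then record the measurement."""
        if len(self.history) >= self.min_samples:
            for name, alloc in self._allocations(input_mb).items():
                if alloc >= peak_mb:
                    self.cost[name] += alloc - peak_mb  # over-allocated memory
                else:
                    self.cost[name] += alloc  # failed attempt wastes it all
        self.history.append((input_mb, peak_mb))

    def best_strategy(self):
        """Name of the method with the lowest accumulated cost so far."""
        return min(self.cost, key=self.cost.get)

    def predict(self, input_mb):
        """Memory to request for the next task instance with this input size."""
        if len(self.history) < self.min_samples:
            return self.user_limit_mb
        return self._allocations(input_mb)[self.best_strategy()]
```

In this sketch, a task whose peak memory grows with input size would accumulate failures under `max_seen` and drift toward `linear_fit`, while a task with input-independent memory use could favor `max_seen`. A real system would also need a retry policy for failed tasks, which is omitted here.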

