Software Resource Disaggregation for HPC with Serverless Computing (2401.10852v5)

Published 19 Jan 2024 in cs.DC

Abstract: Aggregated HPC resources have rigid allocation systems and programming models that struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to use their large pools of unused memory efficiently or to increase the utilization of idle computing resources. Prior work attempted to increase the throughput and efficiency of supercomputing systems through workload co-location and resource disaggregation. However, these methods fall short of providing a solution that can be applied to existing systems without major hardware modifications or performance losses. In this paper, we improve the utilization of supercomputers by employing the new cloud paradigm of serverless computing. We show how serverless functions provide fine-grained access to the resources of batch-managed cluster nodes. We present an HPC-oriented Function-as-a-Service (FaaS) platform that satisfies the requirements of high-performance applications. We demonstrate a software resource disaggregation approach in which placing functions on unallocated and underutilized nodes lets idle cores and accelerators be used while retaining near-native performance.
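
The core idea, dispatching lightweight functions onto cores and accelerators that the batch scheduler has left idle, can be illustrated with a short conceptual sketch. The snippet below is not the paper's implementation; it assumes a Slurm-managed cluster (where `sinfo` can report idle nodes) and uses a hypothetical `invoke_function` helper as a stand-in for an HPC-oriented FaaS runtime that executes a function directly on a chosen node.

```python
# Conceptual sketch only: not the system described in the paper.
# Assumes a Slurm-managed cluster; invoke_function is a hypothetical
# placeholder for an HPC-oriented FaaS invocation over a fast network.
import subprocess


def idle_nodes():
    """Ask Slurm which nodes are currently idle (unallocated)."""
    out = subprocess.run(
        ["sinfo", "-h", "-t", "idle", "-o", "%n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def invoke_function(node, payload):
    """Hypothetical placeholder: ship `payload` to a function executor on
    `node` and return its result. A real system would use a low-latency
    transport and enforce resource limits so that co-located batch jobs
    are not disturbed."""
    raise NotImplementedError


def run_on_idle_resources(payloads):
    """Scatter independent function invocations across idle nodes."""
    nodes = idle_nodes()
    if not nodes:
        return []  # no spare capacity right now; callers must fall back
    return [invoke_function(nodes[i % len(nodes)], p)
            for i, p in enumerate(payloads)]
```

The sketch only conveys the placement idea: spare nodes are discovered through the batch system and used opportunistically for fine-grained function execution instead of remaining idle.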

Authors (5)
  1. Marcin Copik (22 papers)
  2. Marcin Chrapek (8 papers)
  3. Larissa Schmid (5 papers)
  4. Alexandru Calotoiu (19 papers)
  5. Torsten Hoefler (203 papers)