Software Resource Disaggregation for HPC with Serverless Computing (2401.10852v5)
Abstract: Aggregated HPC resources have rigid allocation systems and programming models that struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to efficiently use the large pools of unused memory and to increase the utilization of idle computing resources. Prior work attempted to increase the throughput and efficiency of supercomputing systems through workload co-location and resource disaggregation. However, these methods fall short of providing a solution that can be applied to existing systems without major hardware modifications and performance losses. In this paper, we improve the utilization of supercomputers by employing the new cloud paradigm of serverless computing. We show how serverless functions provide fine-grained access to the resources of batch-managed cluster nodes. We present an HPC-oriented Function-as-a-Service (FaaS) platform that satisfies the requirements of high-performance applications. We demonstrate a software resource disaggregation approach in which placing functions on unallocated and underutilized nodes allows idle cores and accelerators to be used while retaining near-native performance.
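The mechanism the abstract describes can be pictured as a placement policy: a FaaS resource manager tracks how many cores on each batch-managed node are claimed by jobs, and routes function invocations to nodes with spare capacity. The C++ sketch below is a minimal toy model of such a policy, not the paper's implementation; all names, the single-core invocation granularity, and the "most idle cores first" heuristic are illustrative assumptions.

```cpp
// Toy sketch (hypothetical, not the paper's system): route single-core
// serverless function invocations to the idle cores of batch-managed nodes.
#include <cstdio>
#include <string>
#include <vector>

struct Node {
    std::string name;
    int cores_total;
    int cores_used_by_batch_job;  // cores claimed by the batch allocation
    int cores_used_by_functions;  // cores currently leased to functions

    int idle_cores() const {
        return cores_total - cores_used_by_batch_job - cores_used_by_functions;
    }
};

// Place one single-core invocation on the node with the most idle cores;
// returns nullptr when the whole cluster is saturated.
Node* place_function(std::vector<Node>& cluster) {
    Node* best = nullptr;
    for (auto& node : cluster) {
        if (node.idle_cores() > 0 &&
            (best == nullptr || node.idle_cores() > best->idle_cores())) {
            best = &node;
        }
    }
    if (best != nullptr) {
        best->cores_used_by_functions += 1;  // lease one idle core
    }
    return best;
}

int main() {
    // Two partially utilized nodes and one node fully claimed by a batch job.
    std::vector<Node> cluster = {
        {"nid001", 64, 48, 0},  // 16 idle cores
        {"nid002", 64, 64, 0},  // no idle cores
        {"nid003", 64, 32, 0},  // 32 idle cores
    };
    for (int i = 0; i < 3; ++i) {
        if (Node* n = place_function(cluster)) {
            std::printf("invocation %d -> %s (%d cores still idle)\n",
                        i, n->name.c_str(), n->idle_cores());
        }
    }
    return 0;
}
```

Running the sketch places the first invocations on nid003 until its idle-core count drops below nid001's, illustrating how function placement soaks up fragmented capacity without touching the batch jobs' own allocations.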
Authors: Marcin Copik, Marcin Chrapek, Larissa Schmid, Alexandru Calotoiu, Torsten Hoefler