Software Resource Disaggregation for HPC with Serverless Computing (2401.10852v5)
Abstract: Aggregated HPC resources have rigid allocation systems and programming models that struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to efficiently use the large pools of unused memory and to increase the utilization of idle computing resources. Prior work attempted to increase the throughput and efficiency of supercomputing systems through workload co-location and resource disaggregation. However, these methods fall short of providing a solution that can be applied to existing systems without major hardware modifications and performance losses. In this paper, we improve the utilization of supercomputers by employing the new cloud paradigm of serverless computing. We show how serverless functions provide fine-grained access to the resources of batch-managed cluster nodes. We present an HPC-oriented Function-as-a-Service (FaaS) platform that satisfies the requirements of high-performance applications. We demonstrate a software resource disaggregation approach in which placing functions on unallocated and underutilized nodes allows idle cores and accelerators to be used while retaining near-native performance.
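The mechanism the abstract describes can be pictured as a placement policy: a FaaS resource manager tracks how many cores on each batch-managed node are claimed by jobs, and routes function invocations to nodes with spare capacity. The C++ sketch below is a minimal toy model of such a policy, not the paper's implementation; all names, the single-core invocation granularity, and the "most idle cores first" heuristic are illustrative assumptions.

```cpp
// Toy sketch (hypothetical, not the paper's system): route single-core
// serverless function invocations to the idle cores of batch-managed nodes.
#include <cstdio>
#include <string>
#include <vector>

struct Node {
    std::string name;
    int cores_total;
    int cores_used_by_batch_job;  // cores claimed by the batch allocation
    int cores_used_by_functions;  // cores currently leased to functions

    int idle_cores() const {
        return cores_total - cores_used_by_batch_job - cores_used_by_functions;
    }
};

// Place one single-core invocation on the node with the most idle cores;
// returns nullptr when the whole cluster is saturated.
Node* place_function(std::vector<Node>& cluster) {
    Node* best = nullptr;
    for (auto& node : cluster) {
        if (node.idle_cores() > 0 &&
            (best == nullptr || node.idle_cores() > best->idle_cores())) {
            best = &node;
        }
    }
    if (best != nullptr) {
        best->cores_used_by_functions += 1;  // lease one idle core
    }
    return best;
}

int main() {
    // Two partially utilized nodes and one node fully claimed by a batch job.
    std::vector<Node> cluster = {
        {"nid001", 64, 48, 0},  // 16 idle cores
        {"nid002", 64, 64, 0},  // no idle cores
        {"nid003", 64, 32, 0},  // 32 idle cores
    };
    for (int i = 0; i < 3; ++i) {
        if (Node* n = place_function(cluster)) {
            std::printf("invocation %d -> %s (%d cores still idle)\n",
                        i, n->name.c_str(), n->idle_cores());
        }
    }
    return 0;
}
```

Running the sketch places the first invocations on nid003 until its idle-core count drops below nid001's, illustrating how function placement soaks up fragmented capacity without touching the batch jobs' own allocations.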
Authors: Marcin Copik, Marcin Chrapek, Larissa Schmid, Alexandru Calotoiu, Torsten Hoefler