Responsibility for HPC data storage provisioning

Determine whether HPC centres, universities, or individual research groups should be responsible for providing short-term and archival storage for the data that high-performance computing (HPC) workloads produce and consume. The answer is needed to guide planning for next-generation HPC systems.

Background

The report highlights that biomolecular applications running on HPC systems generate substantial data volumes. These range from tens of gigabytes per day for molecular dynamics to terabytes for cryo-EM datasets. Reliable short-term and archival storage, together with efficient data transfer mechanisms, are therefore essential for making effective use of HPC resources.

Although storage infrastructure is critical, the authors note that it is unclear which parties (HPC centres, universities, or individual research groups) should be responsible for provisioning and maintaining it. They emphasize that this question must be resolved before the next generation of HPC platforms is deployed.

References

Storage needs to be provided for both short-term and archival use; however, currently, it isn't clear whether this is the responsibility of HPC centres or universities or individual research groups. The answer to this question is beyond the remit of this report, but it does need to be answered before the next generation of HPC systems are built.

Engineering Supercomputing Platforms for Biomolecular Applications (2506.15585 - Welch et al., 18 Jun 2025) in Section 5.2 (Software Configuration)