Accelerating Time-to-Science by Streaming Detector Data Directly into Perlmutter Compute Nodes (2403.14352v3)
Abstract: Recent advancements in detector technology have significantly increased the size and complexity of experimental data, and high-performance computing (HPC) provides a path towards more efficient and timely data processing. However, movement of large data sets from acquisition systems to HPC centers introduces bottlenecks owing to storage I/O at both ends. This manuscript introduces a streaming workflow designed for a high-data-rate electron detector that streams data directly to compute node memory at the National Energy Research Scientific Computing Center (NERSC), thereby avoiding storage I/O. The new workflow deploys ZeroMQ-based services for data production, aggregation, and distribution for on-the-fly processing, all coordinated through a distributed key-value store. The system is integrated with the detector's science gateway and utilizes the NERSC Superfacility API to initiate streaming jobs through a web-based frontend. Our approach achieves up to a 14-fold increase in data throughput and enhances predictability and reliability compared to an I/O-heavy file-based transfer workflow. Our work highlights the transformative potential of streaming workflows to expedite data analysis for time-sensitive experiments.
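The production, aggregation, and distribution stages named in the abstract can be pictured with a small in-memory sketch. This is not the authors' implementation: standard-library queues and threads stand in for the ZeroMQ sockets, a plain dict stands in for the distributed key-value store, and every name and constant (`N_PRODUCERS`, `N_FRAMES`, `N_CONSUMERS`, `run_pipeline`, the payload layout) is hypothetical. It only illustrates the general idea of assembling frames from several producers and handing them to workers without touching storage.

```python
import queue
import threading
from collections import defaultdict

# All constants below are illustrative assumptions, not values from the paper.
N_PRODUCERS = 2   # detector-side producers, each sending one sector per frame
N_FRAMES = 6      # frames in one toy "scan"
N_CONSUMERS = 2   # compute-node workers


def producer(pid, out_q):
    """Emit this producer's sector of every frame, then a None sentinel."""
    for frame in range(N_FRAMES):
        out_q.put((frame, pid, bytes([pid]) * 4))
    out_q.put(None)


def aggregator(in_q, worker_qs, kv):
    """Group sectors by frame number; forward a frame once it is complete."""
    pending = defaultdict(dict)
    finished = 0
    while finished < N_PRODUCERS:
        msg = in_q.get()
        if msg is None:
            finished += 1
            continue
        frame, pid, payload = msg
        pending[frame][pid] = payload
        if len(pending[frame]) == N_PRODUCERS:
            sectors = pending.pop(frame)
            full = b"".join(sectors[p] for p in sorted(sectors))
            # Round-robin distribution of complete frames to the workers.
            worker_qs[frame % N_CONSUMERS].put((frame, full))
            # The dict is a stand-in for shared coordination state.
            kv["frames_forwarded"] = kv.get("frames_forwarded", 0) + 1
    for q in worker_qs:
        q.put(None)  # tell each worker that the stream has ended


def consumer(in_q, results, lock):
    """Process frames in memory; here the 'analysis' is just a byte sum."""
    while True:
        msg = in_q.get()
        if msg is None:
            break
        frame, full = msg
        with lock:
            results[frame] = sum(full)


def run_pipeline():
    agg_q = queue.Queue()
    worker_qs = [queue.Queue() for _ in range(N_CONSUMERS)]
    kv, results, lock = {}, {}, threading.Lock()
    threads = [threading.Thread(target=producer, args=(p, agg_q))
               for p in range(N_PRODUCERS)]
    threads.append(threading.Thread(target=aggregator,
                                    args=(agg_q, worker_qs, kv)))
    threads += [threading.Thread(target=consumer, args=(q, results, lock))
                for q in worker_qs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, kv
```

Calling `run_pipeline()` returns the per-frame results and the toy coordination state. In a real deployment the queues would be replaced by network sockets and the dict by an actual key-value service, neither of which this sketch attempts to model.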