
Accelerating Time-to-Science by Streaming Detector Data Directly into Perlmutter Compute Nodes (2403.14352v3)

Published 21 Mar 2024 in cs.NI and cs.DC

Abstract: Recent advancements in detector technology have significantly increased the size and complexity of experimental data, and high-performance computing (HPC) provides a path towards more efficient and timely data processing. However, movement of large data sets from acquisition systems to HPC centers introduces bottlenecks owing to storage I/O at both ends. This manuscript introduces a streaming workflow designed for a high-data-rate electron detector that streams data directly to compute node memory at the National Energy Research Scientific Computing Center (NERSC), thereby avoiding storage I/O. The new workflow deploys ZeroMQ-based services for data production, aggregation, and distribution for on-the-fly processing, all coordinated through a distributed key-value store. The system is integrated with the detector's science gateway and utilizes the NERSC Superfacility API to initiate streaming jobs through a web-based frontend. Our approach achieves up to a 14-fold increase in data throughput and enhances predictability and reliability compared to an I/O-heavy file-based transfer workflow. Our work highlights the transformative potential of streaming workflows to expedite data analysis for time-sensitive experiments.
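
The production, aggregation, and distribution stages named in the abstract can be pictured with a small in-memory sketch. This is not the authors' implementation: Python's standard-library `queue.Queue` and threads stand in for the ZeroMQ sockets and services, the key-value store coordination is not modeled, and every name here (`producer`, `aggregator`, `consumer`, `run_pipeline`, the idea of a frame arriving in several parts) is a hypothetical chosen for illustration.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def producer(part_id, n_frames, out_q):
    """Emit one part of every frame, tagged with its frame number."""
    for frame in range(n_frames):
        payload = bytes([part_id]) * 4  # 4 dummy bytes per part
        out_q.put((frame, part_id, payload))
    out_q.put(SENTINEL)


def aggregator(in_q, n_producers, out_q, n_consumers):
    """Collect parts by frame number and forward each frame once complete."""
    pending = {}
    finished = 0
    while finished < n_producers:
        msg = in_q.get()
        if msg is SENTINEL:
            finished += 1
            continue
        frame, part_id, payload = msg
        parts = pending.setdefault(frame, {})
        parts[part_id] = payload
        if len(parts) == n_producers:
            data = b"".join(parts[p] for p in sorted(parts))
            out_q.put((frame, data))
            del pending[frame]
    # Tell every consumer that the stream has ended.
    for _ in range(n_consumers):
        out_q.put(SENTINEL)


def consumer(in_q, results, lock):
    """Process frames straight from memory; here we only record their size."""
    while True:
        msg = in_q.get()
        if msg is SENTINEL:
            break
        frame, data = msg
        with lock:
            results[frame] = len(data)


def run_pipeline(n_producers=4, n_frames=8, n_consumers=2):
    raw_q, frame_q = queue.Queue(), queue.Queue()
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=producer, args=(p, n_frames, raw_q))
        for p in range(n_producers)
    ]
    threads.append(
        threading.Thread(
            target=aggregator, args=(raw_q, n_producers, frame_q, n_consumers)
        )
    )
    threads += [
        threading.Thread(target=consumer, args=(frame_q, results, lock))
        for _ in range(n_consumers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    sizes = run_pipeline()
    print(f"processed {len(sizes)} frames without touching storage")
```

The point of the sketch is only the data path: frames move from producers to consumers through memory, so no stage waits on file writes or reads. In a networked deployment the queues would be replaced by sockets between hosts.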
