PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems (2308.00891v1)

Published 2 Aug 2023 in cs.DC

Abstract: Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative scientific workflows in collaboration with the domain scientists to identify concrete provenance needs. Based on the first-hand analysis, we propose a provenance framework called PROV-IO+, which includes an I/O-centric provenance model for describing scientific data and the associated I/O operations and environments precisely. Moreover, we build a prototype of PROV-IO+ to enable end-to-end provenance support on real HPC systems with little manual effort. The PROV-IO+ framework can support both containerized and non-containerized workflows on different HPC platforms with flexibility in selecting various classes of provenance. Our experiments with realistic workflows show that PROV-IO+ can address the provenance needs of the domain scientists effectively with reasonable performance (e.g., less than 3.5% tracking overhead for most experiments). Moreover, PROV-IO+ outperforms a state-of-the-art system (i.e., ProvLake) in our experiments.

Authors (11)
  1. Runzhou Han (4 papers)
  2. Mai Zheng (8 papers)
  3. Suren Byna (15 papers)
  4. Houjun Tang (3 papers)
  5. Bin Dong (111 papers)
  6. Dong Dai (17 papers)
  7. Yong Chen (299 papers)
  8. Dongkyun Kim (5 papers)
  9. Joseph Hassoun (7 papers)
  10. David Thorsley (4 papers)
  11. Matthew Wolf (12 papers)
Citations (2)