
Recording provenance of workflow runs with RO-Crate (2312.07852v2)

Published 13 Dec 2023 in cs.DL

Abstract: Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain. A corresponding RO-Crate for this article is at https://w3id.org/ro/doi/10.5281/zenodo.10368989
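To make the idea of capturing a run together with its associated objects more concrete, the following is a minimal sketch of how one execution could be described as RO-Crate-style JSON-LD using Schema.org terms. It assumes the general pattern of a `CreateAction` that links the executed tool (`instrument`) to its inputs (`object`) and outputs (`result`). The function name `build_run_crate`, the identifiers `#tool` and `#run-1`, and the file names are hypothetical, and the result is a simplified illustration, not a crate validated against any Workflow Run RO-Crate profile.

```python
import json


def build_run_crate(tool_name, input_path, output_path):
    """Return a simplified RO-Crate-style JSON-LD document for one run.

    All identifiers below are illustrative assumptions, not values
    taken from the paper or from a profile specification.
    """
    # The application that was executed.
    tool = {"@id": "#tool", "@type": "SoftwareApplication", "name": tool_name}
    # Data entities consumed and produced by the run.
    inp = {"@id": input_path, "@type": "File"}
    out = {"@id": output_path, "@type": "File"}
    # The execution itself: tool -> instrument, inputs -> object, outputs -> result.
    action = {
        "@id": "#run-1",
        "@type": "CreateAction",
        "name": f"Run of {tool_name}",
        "instrument": {"@id": tool["@id"]},
        "object": [{"@id": inp["@id"]}],
        "result": [{"@id": out["@id"]}],
    }
    # The root dataset bundles the files and points at the run.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "hasPart": [{"@id": inp["@id"]}, {"@id": out["@id"]}],
        "mentions": [{"@id": action["@id"]}],
    }
    # The metadata descriptor identifies which entity is the root dataset.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": root["@id"]},
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, tool, inp, out, action],
    }


if __name__ == "__main__":
    crate = build_run_crate("head", "input.txt", "output.txt")
    print(json.dumps(crate, indent=2))
```

Because every entity is a flat node in `@graph` and relationships are expressed by `@id` references, runs recorded by different workflow systems can be compared by reading the same properties, which is the interoperability the abstract emphasizes.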
