
Formal Definition and Implementation of Reproducibility Tenets for Computational Workflows (2406.01146v1)

Published 3 Jun 2024 in cs.DC

Abstract: Computational workflow management systems power contemporary data-intensive sciences. The slowly resolving reproducibility crisis presents both a sobering warning and an opportunity to iterate on what science and data processing entail. The Square Kilometre Array (SKA), the world's largest radio telescope, is among the most extensive scientific projects underway and presents grand scientific collaboration and data-processing challenges. This work presents a scale- and system-agnostic computational workflow model and extends five well-known reproducibility tenets into seven defined for our workflow model. Subsequent implementation of these definitions, powered by blockchain primitives, into the Data Activated Flow Graph Engine (DALiuGE), a workflow management system for the SKA, demonstrates the possibility of facilitating automatic formal verification of scientific quality in amortized constant time. We validate our approach with a simple yet representative astronomical processing task: filtering a noisy signal with a lowpass filter using both CPU and GPU methods. Our framework illuminates otherwise obscure scientific discrepancies and similarities between principally identical workflow executions.
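
The idea in the abstract of comparing "principally identical" workflow executions through hash-based signatures can be illustrated with a minimal sketch. This is not DALiuGE's API or the paper's actual tenet definitions. The function names (`moving_average`, `leaf_hash`, `merkle_root`, `run_signature`), the moving-average filter, the "logical" versus "physical" split, and the `backend` label are all hypothetical choices made for illustration, and only the Python standard library is used. Each run hashes its task description, input seed, and filtered output, then folds those hashes into a Merkle-style root, so two runs can be compared by checking two fixed-size digests.

```python
import hashlib
import json
import math
import random


def moving_average(signal, width):
    """Simple lowpass filter: average each sample with its trailing neighbours."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def leaf_hash(record):
    """Hash a JSON-serialisable record deterministically (keys are sorted)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def merkle_root(hashes):
    """Fold leaf hashes pairwise into one root, duplicating an odd tail."""
    if not hashes:
        raise ValueError("at least one leaf hash is required")
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            hashlib.sha256((a + b).encode("utf-8")).hexdigest()
            for a, b in zip(level[::2], level[1::2])
        ]
    return level[0]


def run_signature(seed, width, backend):
    """Run the toy workflow and return (logical_root, physical_root).

    The logical root covers what was computed; the physical root also
    covers the (hypothetical) backend label describing how it was run.
    """
    rng = random.Random(seed)
    noisy = [math.sin(2 * math.pi * t / 50) + rng.gauss(0, 0.3) for t in range(200)]
    filtered = moving_average(noisy, width)
    logical = [
        leaf_hash({"task": "lowpass", "width": width}),
        leaf_hash({"seed": seed}),
        leaf_hash([round(x, 6) for x in filtered]),
    ]
    physical = logical + [leaf_hash({"backend": backend})]
    return merkle_root(logical), merkle_root(physical)


if __name__ == "__main__":
    cpu = run_signature(seed=7, width=5, backend="cpu")
    gpu = run_signature(seed=7, width=5, backend="gpu")
    print("same logical signature: ", cpu[0] == gpu[0])
    print("same physical signature:", cpu[1] == gpu[1])
```

In this sketch the two runs differ only in their backend label, so their logical roots match while their physical roots do not. A real CPU and GPU filter could also differ numerically, which is the kind of discrepancy such signatures are meant to surface.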
