
WorkflowHub: a registry for computational workflows (2410.06941v1)

Published 9 Oct 2024 in cs.DL and cs.SE

Abstract: The rising popularity of computational workflows is driven by the need for repetitive and scalable data processing, sharing of processing know-how, and transparent methods. As both combined records of analysis and descriptions of processing steps, workflows should be reproducible, reusable, adaptable, and available. Workflow sharing presents opportunities to reduce unnecessary reinvention, promote reuse, increase access to best practice analyses for non-experts, and increase productivity. In reality, workflows are scattered and difficult to find, in part due to the diversity of available workflow engines and ecosystems, and because workflow sharing is not yet part of research practice. WorkflowHub provides a unified registry for all computational workflows that links to community repositories, and supports both the workflow lifecycle and making workflows findable, accessible, interoperable, and reusable (FAIR). By interoperating with diverse platforms, services, and external registries, WorkflowHub adds value by supporting workflow sharing, explicitly assigning credit, enhancing FAIRness, and promoting workflows as scholarly artefacts. The registry has a global reach, with hundreds of research organisations involved, and more than 700 workflows registered.

Authors (15)
  1. Ove Johan Ragnar Gustafsson (2 papers)
  2. Sean R. Wilkinson (11 papers)
  3. Finn Bacall (2 papers)
  4. Luca Pireddu (4 papers)
  5. Stian Soiland-Reyes (19 papers)
  6. Simone Leo (4 papers)
  7. Stuart Owen (3 papers)
  8. Nick Juty (2 papers)
  9. Björn Grüning (4 papers)
  10. Tom Brown (74 papers)
  11. Hervé Ménager (3 papers)
  12. Salvador Capella-Gutierrez (2 papers)
  13. Frederik Coppens (6 papers)
  14. Carole Goble (24 papers)
  15. José M. Fernández (7 papers)
Citations (1)

Summary

  • The paper introduces WorkflowHub as a novel platform that registers, shares, and manages computational workflows based on FAIR principles.
  • It details the integration of Git repositories, RO-Crate metadata, and GA4GH TRS API to ensure interoperability and continuous workflow updates.
  • The paper reports the registry's uptake: by 2024 it indexed over 760 workflows, with contributions from 840 users across 35 countries.

WorkflowHub: A Registry for Computational Workflows

The paper "WorkflowHub: a registry for computational workflows" presents WorkflowHub, a platform for sharing and managing computational workflows across scientific disciplines. Written by a consortium of researchers from several institutions, the paper describes the infrastructure, design, and implementation of WorkflowHub, which supports the workflow life cycle and aims to make workflows findable, accessible, interoperable, and reusable, in line with the FAIR principles.

Motivation and Objectives

The advent of Big Data and the growing reliance on computational workflows for data processing and analysis have made robust, scalable, and reproducible methods a necessity in scientific research. Existing sharing mechanisms are numerous but often lack standardization and interoperability, which hinders the dissemination and reuse of workflows. WorkflowHub addresses this by providing a central registry that builds on widely recognized standards to enable the sharing and discovery of workflows while assigning credit to their developers and contributors.

Features and Capabilities

WorkflowHub is designed to be platform-agnostic: it accepts workflows regardless of scientific domain, workflow language, or development environment. It integrates with services and platforms across the workflow ecosystem to make sharing and collaboration easier. Key features of WorkflowHub include:

  • Integration with Git and Other Repositories: Workflows can be registered and updated automatically from Git repositories, so they stay in their native development environments and continuous development is not disrupted.
  • FAIR Metadata and RO-Crate Standards: Bioschemas markup and RO-Crate packaging, combined with the metadata model of the FAIRDOM-SEEK platform on which WorkflowHub is built, provide structured, rich metadata that improves findability and interoperability.
  • Community Engagement and Support: Designed for both large consortia and individual developers, WorkflowHub organizes contributors into Spaces and Teams that mirror real-world collaborations and make credit assignment explicit.
  • Comprehensive Interoperability: An implementation of the GA4GH Tool Registry Service (TRS) API lets execution platforms such as Galaxy and Nextflow discover, retrieve, and execute workflows registered in WorkflowHub.
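To make the RO-Crate bullet more concrete, the following is a minimal sketch of the `ro-crate-metadata.json` file for a workflow crate, built as a Python dict. It assumes the RO-Crate 1.1 context and the Workflow RO-Crate 1.0 profile URI; the file name `main.cwl`, the workflow name, and the `#cwl` language identifier are hypothetical placeholders, and crates registered in WorkflowHub typically carry more metadata (authors, licence, version) than shown here.

```python
import json
from typing import Any, Dict

# Assumed identifiers: RO-Crate 1.1 and the Workflow RO-Crate 1.0 profile.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"
WORKFLOW_PROFILE = "https://w3id.org/workflowhub/workflow-ro-crate/1.0"


def build_workflow_crate(workflow_path: str, name: str,
                         language_id: str, language_name: str) -> Dict[str, Any]:
    """Return a JSON-LD dict describing a crate whose main entity is a workflow."""
    return {
        "@context": RO_CRATE_CONTEXT,
        "@graph": [
            {
                # Metadata descriptor: states which specs this file conforms to.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": [{"@id": RO_CRATE_SPEC}, {"@id": WORKFLOW_PROFILE}],
                "about": {"@id": "./"},
            },
            {
                # Root data entity: the crate as a whole; mainEntity marks the workflow.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_path}],
                "mainEntity": {"@id": workflow_path},
            },
            {
                # The workflow file itself, typed as a computational workflow.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": language_id},
            },
            {
                # Contextual entity describing the workflow language (hypothetical id).
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language_name,
            },
        ],
    }


if __name__ == "__main__":
    crate = build_workflow_crate("main.cwl", "Example workflow",
                                 "#cwl", "Common Workflow Language")
    print(json.dumps(crate, indent=2))
```

The key field is `mainEntity` on the root dataset: it tells a registry which file in the crate is the workflow, so tools do not have to guess from file names or extensions.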

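The TRS interoperability described in the feature list can be illustrated with a small URL builder. The endpoint layout (`/tools/{id}`, `/tools/{id}/versions/{version_id}`, `/tools/{id}/versions/{version_id}/{type}/descriptor`) follows GA4GH TRS v2; the base URL `https://workflowhub.eu/ga4gh/trs/v2`, the helper names, and the example identifiers are assumptions for illustration, not details taken from the paper.

```python
import json
from typing import Any, Optional
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL: WorkflowHub serving GA4GH TRS v2 under /ga4gh/trs/v2.
DEFAULT_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def trs_url(base: str, tool_id: str, version_id: Optional[str] = None,
            descriptor_type: Optional[str] = None) -> str:
    """Build a TRS v2 URL for a tool, one of its versions, or a version's descriptor."""
    # TRS ids may contain '/' or '#', so percent-encode every reserved character.
    url = f"{base.rstrip('/')}/tools/{quote(tool_id, safe='')}"
    if version_id is None:
        return url
    url += f"/versions/{quote(version_id, safe='')}"
    if descriptor_type is None:
        return url
    # descriptor_type is a TRS descriptor type such as "CWL", "GALAXY" or "NFL".
    return f"{url}/{descriptor_type}/descriptor"


def fetch_json(url: str) -> Any:
    """Fetch a TRS endpoint and decode its JSON body (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)
```

For example, `trs_url(DEFAULT_BASE, "123", "1", "CWL")` yields `https://workflowhub.eu/ga4gh/trs/v2/tools/123/versions/1/CWL/descriptor`. Keeping URL construction separate from `fetch_json` lets the path logic be checked offline.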
Results and Impact

Since its launch in 2020, WorkflowHub has grown into a FAIR-oriented ecosystem for workflows. By October 2024, it indexed over 760 workflows, with contributions from 840 registered users across 35 countries. Partnerships with EOSC-Life, Australian BioCommons, and various domain-specific communities reflect its role as shared infrastructure for diverse scientific communities.

The registry supports key tasks such as assigning DOIs to workflows, making them citable within scholarly credit systems, and maintaining their FAIR status. These capabilities improve the quality and reproducibility of research outputs and support open science and collaboration.

Future Perspectives

Looking forward, WorkflowHub plans to refine the support it offers users, expand its integrations, and engage new communities. It aims to improve its GUI, raise metadata accuracy through automated tools, and increase compliance with the FAIR4RS principles and other standards. Working with publishers to standardize how workflows are cited in the scientific literature should further raise the recognition of workflow developers.

Conclusion

WorkflowHub is a versatile, inclusive platform that tackles the many challenges of sharing and managing workflows in scientific research. By embracing the FAIR principles and promoting community collaboration, it advances the reproducibility, interoperability, and reuse of computational workflows across disciplines.
