Biomedical Open Source Software: Crucial Packages and Hidden Heroes (2404.06672v4)

Published 10 Apr 2024 in cs.SE and cs.CY

Abstract: Despite its importance for research, scientific software is often not formally recognized or rewarded. This is especially true for foundation libraries, which are used by the software packages visible to users while remaining "hidden" themselves. Funders and other organizations need to understand the complex network of computer programs that modern research relies upon. In this work we used the CZ Software Mentions Dataset to map the dependencies of the software used in biomedical papers and to find the packages critical to the software ecosystems. We propose centrality metrics for the network of software dependencies, analyze three ecosystems (PyPI, CRAN, Bioconductor), and determine the packages with the highest centrality.
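
To make the idea of centrality in a dependency network concrete, here is a minimal sketch in Python. The abstract does not say which centrality metrics the paper defines, so the score below (the number of packages that rely on a given package directly or indirectly) is an assumption chosen for illustration, not the authors' method. The edge list `TOY_EDGES` and the function names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical toy edges, written as (package, dependency it requires).
# These are illustrative only and are not taken from the paper's dataset.
TOY_EDGES = [
    ("scanpy", "numpy"),
    ("scanpy", "pandas"),
    ("pandas", "numpy"),
    ("seaborn", "pandas"),
    ("seaborn", "matplotlib"),
    ("matplotlib", "numpy"),
]


def reverse_graph(edges):
    """Map each package to the set of packages that directly require it."""
    dependents = defaultdict(set)
    for package, dependency in edges:
        dependents[dependency].add(package)
        dependents.setdefault(package, set())  # keep leaf packages in the map
    return dict(dependents)


def transitive_dependents(dependents, package):
    """Return every package that relies on `package`, directly or indirectly."""
    seen = {package}  # start with the package itself so cycles cannot re-add it
    stack = [package]
    while stack:
        current = stack.pop()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen - {package}


def rank_by_reach(edges):
    """Rank packages by how many others depend on them (assumed score)."""
    dependents = reverse_graph(edges)
    scores = {
        pkg: len(transitive_dependents(dependents, pkg)) for pkg in dependents
    }
    # Highest reach first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_reach(TOY_EDGES)
for name, reach in ranking:
    print(f"{name}: {reach}")
```

In this toy graph, `numpy` ranks first even though an end user would mostly interact with `scanpy` or `seaborn`, which is the "hidden" foundation-library effect the abstract describes.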
