Cited But Not Archived: Analyzing the Status of Code References in Scholarly Articles (2401.04887v1)

Published 10 Jan 2024 in cs.DL

Abstract: One in five arXiv articles published in 2021 contained a URI to a Git Hosting Platform (GHP), which demonstrates the growing prevalence of GHP URIs in scholarly publications. However, GHP URIs are vulnerable to the same reference rot that plagues the Web at large. The disappearance of software hosting platforms, like Gitorious and Google Code, and the source code they contain threatens research reproducibility. Archiving the source code and development history available in GHPs enables the long-term reproducibility of research. Software Heritage and Web archives contain archives of GHP URI resources. However, are the GHP URIs referenced by scholarly publications contained within the Software Heritage and Web archive collections? We analyzed a dataset of GHP URIs extracted from scholarly publications to determine whether each URI (1) is still publicly available on the live Web, (2) has been archived by Software Heritage, and (3) has been archived by Web archives. Of all GHP URIs, we found that 93.98% were still publicly available on the live Web, 68.39% had been archived by Software Heritage, and 81.43% had been archived by Web archives.
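
The three checks described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `UriStatus` record, the helper names, and the choice of lookup endpoints (the Internet Archive Wayback availability endpoint and the Software Heritage origin lookup endpoint) are assumptions made for illustration, and the sketch only builds lookup URLs and aggregates already-collected results rather than issuing network requests.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode


@dataclass
class UriStatus:
    """Hypothetical record holding the outcome of the three checks for one GHP URI."""
    uri: str
    live: bool                  # (1) still publicly available on the live Web
    in_software_heritage: bool  # (2) archived by Software Heritage
    in_web_archive: bool        # (3) archived by a Web archive


def wayback_lookup_url(uri: str) -> str:
    # Assumption: the Wayback availability endpoint stands in for "Web archives";
    # the paper may have queried a broader set of archives.
    return "https://archive.org/wayback/available?" + urlencode({"url": uri})


def software_heritage_lookup_url(uri: str) -> str:
    # Assumption: the origin lookup endpoint is used to test whether a
    # repository URL is known to Software Heritage.
    return (
        "https://archive.softwareheritage.org/api/1/origin/"
        + quote(uri, safe=":/")
        + "/get/"
    )


def summarize(statuses: list[UriStatus]) -> dict[str, float]:
    """Return the percentage of URIs that passed each of the three checks."""
    total = len(statuses)
    if total == 0:
        return {"live": 0.0, "software_heritage": 0.0, "web_archive": 0.0}

    def pct(count: int) -> float:
        return round(100.0 * count / total, 2)

    return {
        "live": pct(sum(s.live for s in statuses)),
        "software_heritage": pct(sum(s.in_software_heritage for s in statuses)),
        "web_archive": pct(sum(s.in_web_archive for s in statuses)),
    }
```

Given one `UriStatus` per extracted URI, `summarize` produces percentages in the same form as the figures reported in the abstract; the sample records used to exercise it would be placeholders, not data from the paper.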

Authors (6)
  1. Emily Escamilla (6 papers)
  2. Martin Klein (34 papers)
  3. Talya Cooper (2 papers)
  4. Vicky Rampin (2 papers)
  5. Michele C. Weigle (55 papers)
  6. Michael L. Nelson (92 papers)
Citations (1)
