
The availability of research data declines rapidly with article age (1312.5670v1)

Published 19 Dec 2013 in cs.DL, physics.soc-ph, and q-bio.PE

Abstract: Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2-4], and journal [5,6] level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term [7], and indeed many studies have found that authors are often unable or unwilling to share their data [8-11]. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested datasets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a dataset being extant fell by 17% per year. In addition, the odds that we could find a working email address for the first, last or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.

Citations (410)

Summary

  • The paper finds that the odds of a dataset still being extant fall by 17% per year of article age, with direct consequences for the reproducibility of research.
  • It requests datasets from 516 articles that applied Discriminant Function Analysis to morphological data from biological specimens, and tracks how data availability changes with article age.
  • It identifies a 7% annual decline in the odds of finding a working author email address, along with problems caused by obsolete data storage, underscoring the need for robust archival policies.

Decline in Research Data Availability with Article Aging

The paper investigates how data availability changes over time for 516 research articles published between 2 and 22 years before the study. It highlights a critical concern: the availability of research data declines significantly with article age. To keep the sample relatively homogeneous, the authors requested datasets only from articles that applied Discriminant Function Analysis (DFA) to morphological data from biological specimens.

Key Findings

The paper presents several noteworthy findings regarding data availability as articles age:

  • Data Availability: Among papers whose authors reported the status of their data, the odds of a dataset being extant (either shared, or existing but not shared) fall by 17% per year. This decline underscores how difficult long-term data retention and stewardship are for individual researchers.
  • Contact Information Decay: The odds of finding a working email address for the first, last, or corresponding author fall by 7% per year, so authors become harder to contact as articles age.
  • Data Sharing Obstacles: Non-functional email addresses and outdated data storage media were prominent barriers to data sharing. Authors of older and newer articles were about equally likely to respond, but reports of data being lost or stored on obsolete media increased with article age.
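
Note that the 17% and 7% figures describe odds, not probabilities, so they do not translate into a fixed percentage-point drop per year. The following is a minimal sketch of the arithmetic, assuming the proportional decline is constant across all years. The function names and the baseline odds of 4.0 (an 80% probability at year 0) are hypothetical values chosen for illustration and do not come from the paper.

```python
def odds_after(baseline_odds: float, annual_decline: float, years: float) -> float:
    """Odds after `years`, assuming a constant proportional decline per year."""
    return baseline_odds * (1.0 - annual_decline) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


# Hypothetical baseline, NOT from the paper: odds of 4.0 (80% probability).
BASELINE_ODDS = 4.0
DATA_DECLINE = 0.17   # odds of the dataset being extant, per the abstract
EMAIL_DECLINE = 0.07  # odds of finding a working email, per the abstract

for years in (0, 5, 10, 20):
    data_odds = odds_after(BASELINE_ODDS, DATA_DECLINE, years)
    email_odds = odds_after(BASELINE_ODDS, EMAIL_DECLINE, years)
    print(
        f"{years:>2} years: "
        f"P(data extant) = {odds_to_probability(data_odds):.3f}, "
        f"P(working email) = {odds_to_probability(email_odds):.3f}"
    )
```

Because the decline compounds multiplicatively on the odds, the implied probability changes by different amounts from year to year. The actual probabilities depend on the real baseline, which this sketch does not attempt to reproduce.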

Implications

The findings have substantial theoretical and practical implications, particularly for biodiversity research and ecology, where historical data can be unique and irreplaceable. The observed attrition in data availability exposes the limits of relying on authors for long-term data preservation. The paper strengthens the argument for mandatory data archiving in public repositories at the time of publication, a strategy that would keep valuable datasets accessible into the future.

Future Directions

To address the decline in data availability, future efforts should focus on strengthening policies that mandate data archiving at the point of publication. Initiatives such as ORCID and platforms like ResearchGate offer potential ways to keep researcher contact information stable, which would mitigate some of the challenges identified above. The findings also suggest exploring more robust archival standards across disciplines, so that varying data types can be preserved and reused.

Conclusion

This paper documents a measurable decline in data availability as research articles age and urges a reevaluation of data stewardship policies in academia. By adopting comprehensive data archiving mandates, the scientific community can keep research data accessible over the long term, improving reproducibility and supporting continued discovery across multiple domains.