Mandated data archiving greatly improves access to research data (1301.3744v1)

Published 16 Jan 2013 in cs.DL, physics.soc-ph, and q-bio.QM

Abstract: The data underlying scientific papers should be accessible to researchers both now and in the future, but how best can we ensure that these data are available? Here we examine the effectiveness of four approaches to data archiving: no stated archiving policy, recommending (but not requiring) archiving, and two versions of mandating data deposition at acceptance. We control for differences between data types by trying to obtain data from papers that use a single, widespread population genetic analysis, STRUCTURE. At one extreme, we found that mandated data archiving policies that require the inclusion of a data availability statement in the manuscript improve the odds of finding the data online almost a thousand-fold compared to having no policy. However, archiving rates at journals with less stringent policies were only very slightly higher than those with no policy at all. We also assessed the effectiveness of asking for data directly from authors and obtained over half of the requested datasets, albeit with about 8 days delay and some disagreement with authors. Given the long term benefits of data accessibility to the academic community, we believe that journal based mandatory data archiving policies and mandatory data availability statements should be more widely adopted.

Citations (160)

Summary

  • The paper demonstrates that journals with mandatory archiving and explicit data availability statements raise the odds of finding data online 974-fold compared with journals that have no policy.
  • It compares four policy types and finds that merely recommending archiving yields only a 3.6-fold increase in those odds, a difference that is not statistically significant.
  • The study advocates comprehensive archiving mandates to enhance reproducibility and reduce delays and friction from direct data requests.

Mandatory Data Archiving Policies and Research Data Accessibility

This paper evaluates how effectively mandatory journal-based data archiving policies improve access to research data. By comparing data accessibility across journals with different archiving requirements, the authors provide an empirical assessment that can inform policy decisions in scientific publishing.

The paper categorizes journal policies into four distinct groups: no archiving policy, recommending data archiving without mandate, mandatory archiving without a data statement requirement, and mandatory archiving with a data statement requirement. The primary analytical focus rests on the availability of datasets from papers utilizing the population genetic program STRUCTURE, which simplifies assessment through standardized data types such as microsatellite and SNP genotypes.

The results show a stark contrast in data accessibility across policy types. At journals that mandate archiving and also require a data availability statement, the odds of finding the data online are 974 times higher than at journals with no policy. Recommending archiving, by contrast, produces only a 3.6-fold increase in those odds, and the overlapping confidence intervals mean this difference cannot be considered statistically significant. Comprehensive mandates are therefore far more effective than recommendations.
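The fold-changes reported here are ratios of odds, not ratios of archiving rates, which is why they can be so large. The sketch below shows how an odds ratio and an approximate 95% confidence interval can be computed from a 2x2 table of found and missing datasets. It is only an illustration: the counts are hypothetical and are not taken from the paper, the function name `odds_ratio_with_ci` is invented for this example, and the Wald interval on the log scale is a common textbook approximation that is not necessarily the method the authors used.

```python
import math


def odds_ratio_with_ci(found_a, missing_a, found_b, missing_b, z=1.96):
    """Odds ratio of group A vs. group B with a Wald interval on the log scale.

    found_* / missing_* are counts of papers whose data were or were not
    found online under each policy. z=1.96 gives an approximate 95% interval.
    """
    cells = [float(found_a), float(missing_a), float(found_b), float(missing_b)]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so that neither the ratio nor the standard error divides by zero.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells

    ratio = (a / b) / (c / d)
    # Standard error of log(odds ratio) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts, NOT the paper's data:
# policy A: 40 datasets found, 10 missing; policy B: 5 found, 45 missing.
ratio, lower, upper = odds_ratio_with_ci(40, 10, 5, 45)
print(f"odds ratio = {ratio:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

When the interval excludes 1, the two policies differ at roughly the 5% level under this approximation; an interval that includes 1 corresponds to the "not statistically significant" situation described for the recommend-only policy.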

Requesting data directly from authors offered a complementary result: 59% of the requested datasets were obtained, with an average delay of 7.7 days. These requests occasionally caused friction with authors, which illustrates the difficulty of relying on author-based data sharing.

The implications of this research are both practical and theoretical. Practically, the paper advocates broader adoption of mandatory data archiving policies to improve reproducibility and transparency. The success of mandatory policies, particularly those requiring explicit data availability statements, could encourage similar policies and greater standardization across peer-reviewed journals. The authors suggest that journals review their own policies against best practices to reduce the accessibility problems that persist when authors do not disclose data or disagreements with authors arise.

Theoretically, this research deepens the understanding of policy-induced behavior changes within scientific publishing structures and supports further exploration into the socio-political dynamics of open data sharing in academia. Notably, the paper recognizes potential impediments such as financial considerations; however, it underscores the cost-effectiveness of investing in archival infrastructure, considering the long-term benefits in facilitating meta-analyses and preventing redundant data collection efforts.

This paper contributes substantial evidence supporting strict archiving mandates in journal policy frameworks, further encouraging stakeholders to prioritize data availability for advancing scientific research integrity and reproducibility.