Journal Data Sharing Policies
- Journal data-sharing policies are formal guidelines set by publishers to ensure data transparency, reproducibility, and enhanced research impact.
- Policies range from no requirement at all to strict mandates that combine a data availability statement (DAS) with repository deposition; the stricter mandates are associated with markedly higher compliance and citation rates.
- Effective implementation integrates FAIR principles, deidentification standards, and code sharing while addressing challenges like enforcement, data decay, and resource constraints.
Journal data-sharing policies are formalized requirements or guidelines issued by scholarly publishers and academic journals to regulate the availability, access, and preservation of datasets underlying published research. Such policies aim to ensure reproducibility, facilitate secondary analysis, increase transparency, and enhance scientific impact. The development, enforcement, and effectiveness of these policies vary considerably across disciplines, journals, and regions. The sections below survey core policy typologies, compliance statistics, evidence on scholarly impact, implementation workflows, open science integration, and policy challenges, drawing upon quantitative and qualitative analyses of major journals such as PLOS, BMC, and leading domain-specific publications.
1. Typologies and Evolution of Data-Sharing Policies
Journals employ a spectrum of data-sharing policies, ranging from no formal requirements to stringent mandates coupled with compliance checks. Canonical categories, exemplified in ecology, phylogenetics, and biomedical publishing, include:
| Policy Category | Description | Representative Example |
|---|---|---|
| No policy | No mention of data sharing or archiving | Early BMC Evolutionary Biology |
| Available on request | Authors must share data upon reasonable request | Legacy policy in several biology journals |
| Data availability statement (DAS) | Manuscript must state where data are available (repository/DOI) | PLOS and BMC post-2014/2015 |
| Mandated repository deposit | Data must be archived in a domain repository before publication | JDAP consortium journals |
| Peer-reviewed datasets | Standalone datasets receive formal peer review | Scientific Data (Nature Portfolio) |
The transition from “available upon request” to mandated repository deposition has accelerated following high-profile reproducibility crises, funder mandates (e.g., NIH GREI, White House OSTP), and advocacy for open science (Sholler et al., 2018, Magee et al., 2014, Colavizza et al., 2019, Horton et al., 28 May 2024).
2. Implementation, Enforcement, and Compliance Statistics
Mandated data-sharing policies, especially those coupled with explicit Data Availability Statements (DAS) and repository requirements, yield sharply higher compliance rates. For example, after PLOS mandated DAS in 2014, the proportion of articles with any DAS rose to 93.7% in 2018; BMC achieved 88.2% following its 2015 mandate (Colavizza et al., 2019). However, only 20.8% (PLOS) and 12.2% (BMC) of 2017–2018 articles included explicit repository links (Category 3 in DAS classification).
Statistical analyses controlling for confounds demonstrate that:
- Mere recommendations are minimally effective: journals that only suggest archiving have data availability rates near those with no policy (11% vs. 0%) (Vines et al., 2013).
- Mandates that do not require a DAS achieve ~88% data availability; mandates that require both repository deposition and a DAS approach 100% (Vines et al., 2013).
- Bayesian and OLS regression models quantify these effects, e.g., JDAP membership multiplies the odds of both alignment and tree data deposit by 8.58x (95% CI: [1.87, 54.2]) versus a weak-policy baseline (Magee et al., 2014).
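Because an odds ratio scales odds rather than probabilities, the practical size of an effect such as the 8.58x figure above depends on the baseline rate. The following Python sketch converts a baseline probability into the probability implied by a given odds ratio. The baseline rates in the loop are hypothetical values chosen for illustration and are not reported in the cited studies; the function name `apply_odds_ratio` is likewise an assumption of this example.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)  # probability -> odds
    new_odds = baseline_odds * odds_ratio                  # apply the odds ratio
    return new_odds / (1.0 + new_odds)                     # odds -> probability


if __name__ == "__main__":
    # Hypothetical weak-policy deposit rates (illustrative, not from the source).
    for baseline in (0.05, 0.20, 0.50):
        boosted = apply_odds_ratio(baseline, 8.58)
        print(f"baseline {baseline:.0%} -> {boosted:.1%}")
```

The same odds ratio yields a much larger absolute gain at low baselines than at high ones, which is why the wide confidence interval reported for this estimate matters when comparing journals.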
Enforcement is typically distributed across authors, reviewers, editors, and production staff, but responsibility is often diffuse, and substantive review of dataset content is rare (Sholler et al., 2018).
3. Impact Assessment: Reproducibility, Citations, and Data Longevity
Empirically, robust journal data-sharing policies yield multifaceted benefits:
- A 25.36% (±1.07%) increase in three-year citation counts is associated with articles citing a public repository in their DAS, compared to those providing no data (Colavizza et al., 2019). Smaller positive effects accrue for “available on request” (+8.5%) and “in paper/SI” (+5.9%) data sharing.
- Data retention is at risk without repository archiving: the odds that a dataset is still extant fall by 17% for each year since publication, owing to author unavailability, obsolete storage media, and data loss (logistic regression, OR = 0.83 per year, 95% CI [0.79, 0.90]) (Vines et al., 2013). Repository mandates counteract this exponential decay in the odds.
- Open repository-based data sharing is essential for reproducibility, meta-analysis, and rapid integration into aggregators and community archives (Chen et al., 2021).
A summary of citation effects is given below:
| DAS Category | Mean Citation Increase | Reported Uncertainty |
|---|---|---|
| Repository link (Cat. 3) | +25.36% | ±1.07% |
| On request (Cat. 1) | +8.5% | ±2.4% |
| In paper/SI (Cat. 2) | +5.9% | ±1.9% |
| None (Cat. 0) | 0 | Baseline |
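The data-decay estimate cited in this section (OR = 0.83 per year) compounds multiplicatively, which is easy to underestimate. The sketch below, in Python, computes the cumulative odds multiplier after a number of years, the time for the odds to halve, and the implied probability of a dataset being extant given a starting probability. The starting probability is a hypothetical input, not a figure from the cited study, and the helper names are assumptions of this example.

```python
import math

ODDS_RATIO_PER_YEAR = 0.83  # point estimate cited above (Vines et al., 2013)


def odds_multiplier(years: float, or_per_year: float = ODDS_RATIO_PER_YEAR) -> float:
    """Cumulative factor applied to the odds of extancy after `years` years."""
    return or_per_year ** years


def odds_half_life(or_per_year: float = ODDS_RATIO_PER_YEAR) -> float:
    """Years until the odds of extancy fall to half their initial value."""
    return math.log(0.5) / math.log(or_per_year)


def extant_probability(initial_prob: float, years: float,
                       or_per_year: float = ODDS_RATIO_PER_YEAR) -> float:
    """Probability a dataset is extant after `years`, from a hypothetical start."""
    odds = initial_prob / (1.0 - initial_prob) * odds_multiplier(years, or_per_year)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    print(f"odds half-life: {odds_half_life():.2f} years")
    for years in (5, 10, 20):
        # 0.9 is an illustrative starting probability, not a reported value.
        print(years, round(extant_probability(0.9, years), 3))
```

Note that the decay applies to odds, not directly to probability, so the probability curve flattens as it approaches zero rather than falling by a fixed 17 percentage points per year.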
4. Core Policy Elements and Best Practices
Policy frameworks in major journals, including PLOS, JSDSE, and astronomy publications, are converging on a shared set of key features:
- “Minimal dataset” requirement: Sufficient data and metadata to replicate all analyses, tables, and figures (Horton et al., 28 May 2024).
- Deidentification standards: Remove direct and indirect identifiers from human-subjects data; provide aggregated or synthetic data where appropriate.
- Full code sharing and versioning: Scripts must be posted; version control and environment documentation are standard (Horton et al., 28 May 2024).
- Repository and file format standards: Authors must use FAIR-compliant archives issuing DOIs (e.g., OSF, Zenodo, Dryad). Non-proprietary data formats (CSV, TSV) are enforced; rich metadata (README, data dictionary) are mandatory (Chen et al., 2021).
- Persistent identifiers: Datasets, code, and software require DOI-level citation in manuscripts and DAS (Chen et al., 2021, Colavizza et al., 2019).
- Access control and waivers: Sensitive data may allow controlled access with clear IRB processes (Horton et al., 28 May 2024).
- Machine-readable DAS: Structured, standardized statements facilitate large-scale compliance auditing (Colavizza et al., 2019).
- Peer-review integration: Referees or technical editors are instructed to check for data presence, link validity, and consistency; however, deep curation remains rare (Sholler et al., 2018).
Sample DAS language and checklists are published, e.g., “The deidentified data and analysis code are available at OSF: <URL>” (Horton et al., 28 May 2024).
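To illustrate how structured statements support large-scale auditing, the following Python sketch assigns a DAS to one of the four categories used in the citation table above (0 = none, 1 = on request, 2 = in paper/SI, 3 = repository link). It is a keyword heuristic written for this article, not the classification method of the cited study; the patterns, the rule that the highest matching category wins, and the function name `classify_das` are all assumptions.

```python
import re
from typing import Optional

# Illustrative patterns only; a production audit would need a broader list.
REPOSITORY_PATTERN = re.compile(
    r"(doi\.org/|\b10\.\d{4,9}/\S+|zenodo|dryad|osf\.io|figshare|github\.com)",
    re.IGNORECASE,
)
IN_PAPER_PATTERN = re.compile(
    r"(within the (paper|article|manuscript)|supporting information|supplementary)",
    re.IGNORECASE,
)
ON_REQUEST_PATTERN = re.compile(
    r"\b(up)?on\s+(reasonable\s+)?request\b", re.IGNORECASE
)


def classify_das(statement: Optional[str]) -> int:
    """Return a DAS category 0-3; the highest matching category wins."""
    if not statement or not statement.strip():
        return 0  # no statement at all
    if REPOSITORY_PATTERN.search(statement):
        return 3  # public repository or DOI mentioned
    if IN_PAPER_PATTERN.search(statement):
        return 2  # data in the paper or supporting information
    if ON_REQUEST_PATTERN.search(statement):
        return 1  # available on request
    return 0      # statement present but no recognizable availability route


if __name__ == "__main__":
    examples = [
        # Placeholder URL, in the spirit of the sample statement above.
        "The deidentified data and analysis code are available at OSF: https://osf.io/xxxxx",
        "All relevant data are within the paper and its Supporting Information files.",
        "Data are available from the corresponding author upon reasonable request.",
    ]
    for text in examples:
        print(classify_das(text), "-", text)
```

A heuristic like this can only confirm that a statement names an availability route; it says nothing about whether the linked data are complete or usable.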
5. Enforcement, Review Workflows, and Incentivization
Enforcement is typically implemented through a multistage workflow:
- Submission: Author uploads data, code, and DAS. Checklists confirm deposition and metadata (Chen et al., 2021).
- Peer review: Reviewers may be prompted to verify data links and basic metadata, but substantive assessment is rare. Dedicated “data reviewers” are recommended, but seldom required (Sholler et al., 2018).
- Acceptance: Editors or production staff verify presence of DOIs or repository links; papers without compliant data are not published (Vines et al., 2013, Sholler et al., 2018).
- Post-publication: Journals conduct periodic audits and report on compliance to sustain accountability and inform policy revisions (Sholler et al., 2018).
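The acceptance-stage check in this workflow can be partly automated. The Python sketch below is one possible pre-publication gate: it confirms that the DAS contains a DOI or a URL, that a README accompanies the deposit, and that no file uses a format from a small list of proprietary extensions. The `Submission` structure, the extension list, and the specific checks are assumptions made for illustration and do not describe any journal's actual system; the check is purely syntactic and does not resolve links over the network.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePath

DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
URL_PATTERN = re.compile(r"https?://[^\s\"<>]+", re.IGNORECASE)
# Illustrative list of proprietary formats; adjust to the journal's own policy.
PROPRIETARY_SUFFIXES = {".xls", ".sav", ".dta", ".sas7bdat"}


@dataclass
class Submission:
    """Hypothetical record of what an author deposited."""
    das: str
    files: list[str] = field(default_factory=list)


def acceptance_problems(sub: Submission) -> list[str]:
    """Return a list of human-readable problems; an empty list means pass."""
    problems: list[str] = []
    if not sub.das.strip():
        problems.append("missing data availability statement")
    elif not (DOI_PATTERN.search(sub.das) or URL_PATTERN.search(sub.das)):
        problems.append("DAS contains no DOI or repository link")

    names = [PurePath(f).name.lower() for f in sub.files]
    if not any(name.startswith("readme") for name in names):
        problems.append("no README among deposited files")

    for f in sub.files:
        if PurePath(f).suffix.lower() in PROPRIETARY_SUFFIXES:
            problems.append(f"proprietary file format: {f}")
    return problems


if __name__ == "__main__":
    ok = Submission(
        das="Data and code: https://doi.org/10.5281/zenodo.0000000",  # placeholder DOI
        files=["README.md", "data/measurements.csv", "analysis.py"],
    )
    print(acceptance_problems(ok))
```

Such a gate addresses only the presence checks that editors or production staff already perform; it does not substitute for the substantive dataset review that the workflow above notes is rarely done.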
Incentive mechanisms include “Open Data Badges,” citation of datasets as independent scholarly outputs, and explicit impact metrics for data reuse (Sholler et al., 2018, Magee et al., 2014).
6. Open Science Integration and Interoperability
Journal data-sharing policies increasingly align with the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) and the broader open science movement:
- Mandates by funders (e.g., NIH GREI, European Union) reinforce journal requirements (Horton et al., 28 May 2024).
- Policies increasingly specify preferred repositories, metadata schemas, and technical interoperability standards (e.g., FITS, VOTable headers in astronomy) (Chen et al., 2021).
- Cross-references, persistent identifiers, and harmonized code/data packaging facilitate discoverability and secondary analysis, supporting reproducibility at scale.
Prominent journals periodically update guidance to remain congruent with federal and international mandates and domain-specific infrastructural advances (Horton et al., 28 May 2024, Chen et al., 2021).
7. Persistent Challenges and Controversies
Despite well-designed policies, several unresolved issues persist:
- Incomplete compliance: Even under strong mandates, only a minority of articles include public repository links (~12–21% in leading journals as of 2018) (Colavizza et al., 2019).
- Quality assurance: Quality and completeness of archived datasets are frequently unverified; repositories seldom curate content beyond metadata/formats, and journals rarely require deep peer review of datasets (Sholler et al., 2018, Magee et al., 2014).
- Role ambiguity: Editors, reviewers, and repository staff often lack consensus or formal training regarding responsibility for data quality checks (Sholler et al., 2018).
- Data decay: Without repository mandates, long-term access is undermined by declining author contactability (about 7% per year) and by obsolete storage and data loss (the odds that a dataset is still extant fall by 17% per year) (Vines et al., 2013).
- Sensitive data: Balancing openness and privacy, particularly for human subjects, requires nuanced waiver and access-control regimes with clear documentation (Horton et al., 28 May 2024).
- Over-anonymization: Excessive deidentification can hinder reuse by stripping datasets of necessary detail (Horton et al., 28 May 2024).
- Resource constraints: Volunteer peer review is limited; comprehensive technical checks are costly (Sholler et al., 2018, Vines et al., 2013).
A plausible implication is that next-generation journal policies will further formalize machine-checkable standards, data-certificate programs, and deeper integration of community curation protocols to bridge the gap between formal compliance and substantive data utility.
Key References
- "The citation advantage of linking publications to research data" (Colavizza et al., 2019)
- "Guidelines and Best Practices to Share Deidentified Data and Code" (Horton et al., 28 May 2024)
- "Enforcing public data archiving policies in academic publishing: A study of ecology journals" (Sholler et al., 2018)
- "The availability of research data declines rapidly with article age" (Vines et al., 2013)
- "The Dawn of Open Access to Phylogenetic Data" (Magee et al., 2014)
- "Mandated data archiving greatly improves access to research data" (Vines et al., 2013)
- "Best Practices for Data Publication in the Astronomical Literature" (Chen et al., 2021)