PRIMAD-LID: Enhancing Reproducibility
- The PRIMAD-LID Extension is a framework that enhances computational reproducibility by augmenting the six original PRIMAD dimensions with three modifiers: Lifespan, Interpretation, and Depth.
- It systematically documents key factors, including the execution platform, research objective, implementation, methods, actors, and data, together with precise temporal and interpretive metadata.
- The framework supports robust cross-disciplinary reproducibility audits and targeted diagnostic practices by standardizing metadata recording and validation workflows.
The PRIMAD-LID Extension defines an integrated and discipline-diagnostic framework for computational reproducibility by systematically augmenting the original PRIMAD model’s six core dimensions—Platform, Research objective, Implementation, Methods, Actors, and Data—with three cross-cutting modifiers: Lifespan, Interpretation, and Depth. This nine-facet structure formalizes all factors required to achieve, evaluate, and document reproducibility in computational research, enabling unambiguous specification, targeted diagnosis, and robust cross-disciplinary application (Aloqalaa et al., 5 Jan 2026).
1. PRIMAD: The Six Core Dimensions
The foundational PRIMAD framework addresses the longstanding ambiguity of reproducibility terminology by identifying six variables whose control or variation must be stated in any reproducibility attempt:
- P (Platform): The execution environment, covering hardware architecture, operating system, libraries, compilers, virtual machines, and containerization.
- R (Research objective): The specific scientific question or goal; e.g., tumor image classification.
- I (Implementation): Codebases, scripts, executables, or pipeline definitions operationalizing the method.
- M (Methods): Abstract algorithms or methodological protocols, e.g., “random forest with cross-validation,” not tied to code instantiation.
- A (Actors): The individuals or teams engaging with the experiment—developers, annotators, experimenters.
- D (Data): All input datasets, configuration parameters, and any data preprocessing transformations.
The PRIMAD formalism operationalizes reproducibility: for a given study, one specifies which components are held constant and which are varied—e.g., “holding P and I fixed, but varying M to perform method-agnostic validation.” This structure clarifies the definitions of repeatability, replicability, portability, and robustness in computational science.
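The fixed/varied specification can be made mechanical by comparing the recorded identifiers of two study configurations. The sketch below is a minimal illustration, assuming each study is described by a mapping from PRIMAD dimension to an identifier string; the `Dim` enum, the `split_dimensions` function, and all example identifiers are hypothetical and not part of the framework itself.

```python
from enum import Enum


class Dim(Enum):
    """The six PRIMAD dimensions."""
    P = "Platform"
    R = "Research objective"
    I = "Implementation"
    M = "Methods"
    A = "Actors"
    D = "Data"


def split_dimensions(original: dict, attempt: dict) -> tuple[set, set]:
    """Return (fixed, varied) dimensions by comparing two study descriptions.

    Each description maps a Dim to an identifier (e.g., a version or DOI).
    A dimension counts as fixed when both studies record the same identifier
    (or both omit it); otherwise it counts as varied.
    """
    fixed, varied = set(), set()
    for dim in Dim:
        if original.get(dim) == attempt.get(dim):
            fixed.add(dim)
        else:
            varied.add(dim)
    return fixed, varied


if __name__ == "__main__":
    # Hypothetical identifiers for an original study and a method-swap attempt.
    original = {Dim.P: "ubuntu-20.04", Dim.R: "tumor-classification",
                Dim.I: "repo@abc123", Dim.M: "random-forest+cv",
                Dim.A: "team-1", Dim.D: "dataset-v1"}
    attempt = {**original, Dim.M: "xgboost+cv"}
    fixed, varied = split_dimensions(original, attempt)
    print(sorted(d.name for d in varied))  # ['M']
```

Comparing identifiers rather than free-text descriptions keeps the classification deterministic, at the cost of requiring a stable identifier for every dimension.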
2. The LID Modifiers: Lifespan, Interpretation, Depth
The PRIMAD-LID extension systematically augments each PRIMAD component with three modifiers:
2.1 Lifespan (L)
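Lifespan qualifies each artifact temporally, recording its creation date ($t_c$), modification history ($\{t_m\}$), last access ($t_a$), and predicted end-of-life ($t_e$). For an artifact $X$ drawn from any of the six PRIMAD dimensions:

$$L(X) = \big(t_c,\ \{t_m\},\ t_a,\ t_e\big)$$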
This enables temporal auditability and supports long-term usability assessments; research artifacts become effectively “expired” unless Lifespan is actively managed.
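A Lifespan record maps naturally onto a small data structure. The following is a sketch under stated assumptions: the `Lifespan` class, its field names, and the rule that an artifact counts as expired strictly after its predicted end-of-life date are illustrative choices, not definitions taken from the framework.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Lifespan:
    """Lifespan metadata for one artifact (hypothetical field names)."""
    created: date
    last_access: date
    end_of_life: date  # predicted end-of-life
    modified: list[date] = field(default_factory=list)

    def record_modification(self, when: date) -> None:
        """Append a modification date and keep the history in chronological order."""
        self.modified.append(when)
        self.modified.sort()

    def is_expired(self, today: date) -> bool:
        """Treat the artifact as expired once today is past its predicted end-of-life."""
        return today > self.end_of_life

    def days_since_access(self, today: date) -> int:
        """Days elapsed since the artifact was last accessed."""
        return (today - self.last_access).days
```

A periodic audit could then iterate over all artifacts and flag those where `is_expired` is true or `days_since_access` exceeds a project-specific limit.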
2.2 Interpretation (I)
Interpretation captures the reasoning, heuristic, or contextual logic that mediates between raw numerical outputs and scientific conclusions:
- Interpretation may include statistical tests, visualization standards, or significance thresholds.
- It also documents the rationale for algorithm selection and for empirical parameter-search strategies.
Recording this metadata separates the interpretive layer from the computed outputs, so that the path from results to scientific insight is itself subject to reproducibility scrutiny.
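Storing the decision rule next to its rationale lets a reproducer re-derive a conclusion from the raw statistic. The sketch below assumes a simple hypothesis-test setting; the `InterpretationRecord` class and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterpretationRecord:
    """Documents how a raw statistic is turned into a scientific claim."""
    test_name: str   # e.g., "Welch t-test" (illustrative)
    alpha: float     # significance threshold fixed before the analysis
    rationale: str   # why this test and threshold were chosen

    def conclude(self, p_value: float) -> str:
        """Apply the documented decision rule to a p-value."""
        if not 0.0 <= p_value <= 1.0:
            raise ValueError("p-value must lie in [0, 1]")
        verdict = "reject H0" if p_value < self.alpha else "fail to reject H0"
        return f"{self.test_name}: p={p_value:.3g} vs alpha={self.alpha} -> {verdict}"
```

If a reproduction reaches a different conclusion, comparing the two records shows whether the change comes from the data or from a different interpretive choice, such as a different `alpha`.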
2.3 Depth (D)
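Depth denotes the required granularity of artifact description, parameterized by a field-specific attribute vector:

$$D(X) = (a_1, a_2, \dots, a_k),$$

where each $a_j$ is a required metadata attribute and the length $k$ depends on the field.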
The number and type of required attributes are context-dependent; for example, bioinformatics pipelines typically require ten distinct metadata fields, while ML experiments may use a different checklist. Formalizing Depth improves comparability and completeness while allowing adaptation to community standards (e.g., the FAIR principles).
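Treating Depth as a field-specific checklist makes completeness measurable. This is a minimal sketch, assuming a completeness score equal to the fraction of required attributes that are present; the checklists in `DEPTH_CHECKLISTS` are abbreviated, hypothetical examples rather than any community's official list.

```python
# Abbreviated, hypothetical checklists; real ones should follow community standards.
DEPTH_CHECKLISTS: dict[str, tuple[str, ...]] = {
    "ml": ("dataset_version", "random_seed", "hyperparameters",
           "hardware", "framework_version"),
    "bioinformatics": ("reference_genome", "tool_versions",
                       "checksums", "sample_schema"),
}


def depth_completeness(discipline: str, record: dict) -> tuple[float, list[str]]:
    """Return (fraction of required attributes present, list of missing ones).

    An attribute counts as missing if it is absent, None, or an empty string,
    so legitimate falsy values such as a random seed of 0 still count as present.
    """
    required = DEPTH_CHECKLISTS[discipline]
    missing = [attr for attr in required if record.get(attr) in (None, "")]
    score = (len(required) - len(missing)) / len(required)
    return score, missing
```

Returning the missing attribute names alongside the score turns the check into an actionable to-do list for authors.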
3. Unified Conceptual Structure
The PRIMAD-LID framework is structured as a 6×3 matrix: rows represent the PRIMAD dimensions (P, R, I, M, A, D), and columns represent the LID modifiers (Lifespan, Interpretation, Depth). Each cell specifies the metadata and procedural requirements for the corresponding dimension-modifier pair.
| PRIMAD\LID | Lifespan | Interpretation | Depth |
|---|---|---|---|
| Platform (P) | Timestamps, modification and version history | Reason for platform choice, scalability explanation | Container digests, base image versions, resource needs |
| Research objective (R) | Research start and update times | Hypothesis framing, statistical test rationale | Precise statements, documentation completeness |
| Implementation (I) | Code commit dates, build environments | Justification of coding choices, optimization explanation | Source version, dependencies, source/compiled mapping |
| Methods (M) | Protocol versioning, updates | Algorithm selection rationale, parameter tuning methods | Algorithmic parameters, workflow schemas |
| Actors (A) | Team membership, access/control records | Decision logs, annotation/contribution standards | Roles, background, contribution specifications |
| Data (D) | Acquisition, preprocessing history | Data cleaning choices, statistical thresholds | Checksums, schemas, provenance, licences |
Together, the 18 (6×3) cells form the composite foundation for computational reproducibility; Figure 1 of Aloqalaa et al. (5 Jan 2026) depicts these dependencies.
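The table above can double as a machine-checkable coverage report. The sketch below assumes the matrix is stored as a dictionary keyed by (dimension, modifier) pairs; the `missing_cells` function and the storage layout are hypothetical.

```python
from itertools import product

PRIMAD = ("P", "R", "I", "M", "A", "D")
LID = ("Lifespan", "Interpretation", "Depth")


def missing_cells(matrix: dict[tuple[str, str], str]) -> list[tuple[str, str]]:
    """List (dimension, modifier) cells that are absent or blank, in table order."""
    return [cell for cell in product(PRIMAD, LID)
            if not matrix.get(cell, "").strip()]
```

Because `product` iterates rows first, the missing cells come back in the same order as the table, which makes reports easy to read against it.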
4. Application Scenarios
Example 1: High-throughput Sequencing (HTS) Pipeline
- Platform: Nextflow 20.10.0 in an Ubuntu 18.04 container.
  - Lifespan: Platform timestamps, modification and version history
  - Interpretation: Nextflow chosen for scalability
  - Depth: Container digest, base image version, resource limits
- Data: Raw FASTQ files (v1.2) and reference genome build 38.
  - Lifespan: Data acquisition and update timeline
  - Interpretation: Justification for sequence trimming thresholds
  - Depth: Checksums, sample schema
Controlling these facets maintained 98% workflow “wholeness” across major platform updates and over long time horizons.
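The Data Depth entry "Checksums" in this example can be enforced with a streaming hash check. A minimal sketch, assuming SHA-256 and a manifest that maps relative file paths to expected hex digests; the `sha256_of` and `verify` helpers are hypothetical names, while `hashlib.sha256` is the standard-library hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large FASTQ files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths whose current checksum differs from the recorded one."""
    return [name for name, expected in manifest.items()
            if sha256_of(root / name) != expected]
```

Running `verify` before each pipeline execution detects silent changes to inputs that would otherwise masquerade as platform or method effects.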
Example 2: Cross-platform IR Portability
- Platform: Lucene 8.5 on Ubuntu 20.04 vs. Windows 10
  - Lifespan: OS and Java runtime patch tracking
  - Interpretation: File-system differences as I/O confounders
  - Depth: Java version, heap size, path-separator specifics
Systematic documentation across all nine facets localizes sources of variation, separating genuine platform effects from methodological inconsistencies.
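Platform Depth fields like those listed in this example can be captured automatically and diffed between machines. The sketch below is a Python analog of the Java-specific details (it records the Python runtime rather than the JVM); the snapshot keys and the `platform_snapshot`/`diff_snapshots` helpers are hypothetical, while `platform.system`, `platform.release`, `platform.machine`, `platform.python_version`, `os.sep`, and `os.linesep` are standard-library calls.

```python
import os
import platform


def platform_snapshot() -> dict[str, str]:
    """Record platform details that commonly confound cross-OS comparisons."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "path_sep": os.sep,          # "/" on POSIX, "\\" on Windows
        "line_sep": os.linesep,      # "\n" on POSIX, "\r\n" on Windows
    }


def diff_snapshots(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return fields that differ between two snapshots as {field: (a_value, b_value)}."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
```

Storing one snapshot per run and diffing them narrows a cross-platform discrepancy down to concrete candidates such as path or line separators.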
Example 3: Method-Independent Validation
The Research objective, Actors, and Data are held constant while the Methods dimension is swapped (random forest → XGBoost), with Lifespan, Interpretation, and Depth documented for both configurations. Consistent results under this controlled method variation establish reproducibility at the Interpretation layer, supporting methodological robustness.
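Judging "consistent results" requires a documented comparison rule. This sketch assumes both methods were scored on identical cross-validation folds and uses a simple tolerance on the mean paired difference; the `method_swap_report` function and the tolerance criterion are hypothetical stand-ins, and a formal paired statistical test could be substituted.

```python
from statistics import mean


def method_swap_report(scores_a: list[float], scores_b: list[float],
                       tolerance: float) -> dict:
    """Compare per-fold scores of two methods evaluated on identical folds.

    R, A, and D are assumed fixed by the caller; only M differs between a and b.
    The result is flagged consistent if the mean paired difference is within tolerance.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both methods must be scored on the same non-empty set of folds")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    return {
        "mean_a": mean(scores_a),
        "mean_b": mean(scores_b),
        "mean_diff": mean(diffs),
        "max_abs_diff": max(abs(d) for d in diffs),
        "consistent": abs(mean(diffs)) <= tolerance,
    }
```

The tolerance itself belongs in the Interpretation metadata, since it encodes the judgment of what counts as "consistent".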
5. Formal PRIMAD-LID Reproducibility Predicate
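PRIMAD-LID recasts reproducibility as a formal predicate. Let $S$ be the original study, $S'$ the reproduction attempt, and $F$ and $V$ the sets of PRIMAD dimensions held fixed and varied, with $F \cup V = \{P, R, I, M, A, D\}$ and $F \cap V = \emptyset$. Then

$$\mathrm{Reproducible}(S, S') \iff \Phi(S, S' \mid F, V),$$

or more explicitly,

$$\mathrm{Reproducible}(S, S') \iff \Big(\bigwedge_{X \in F \cup V} \ \bigwedge_{m \in \{L, I, D\}} \mathrm{documented}(X, m)\Big) \wedge \Phi(S, S' \mid F, V),$$

where $\Phi$ is an assessment of consistency, transparency, and coverage given the fixed and varied dimensions of a specific reproducibility study.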
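The predicate can be sketched as a boolean function that first checks documentation coverage over all dimension-modifier pairs and then defers to a study-specific assessment. This is an illustration under assumptions: the `reproducible` function, the representation of documented cells as a set of pairs, and the caller-supplied `assess` callback standing in for the consistency/transparency/coverage assessment are all hypothetical.

```python
from itertools import product
from typing import Callable

PRIMAD = ("P", "R", "I", "M", "A", "D")
LID = ("Lifespan", "Interpretation", "Depth")


def reproducible(fixed: set[str], varied: set[str],
                 documented: set[tuple[str, str]],
                 assess: Callable[[set[str], set[str]], bool]) -> bool:
    """Schematic PRIMAD-LID predicate.

    fixed/varied: PRIMAD dimensions held constant or varied (must partition PRIMAD).
    documented: (dimension, modifier) cells that carry metadata.
    assess: stand-in for the study-specific consistency assessment.
    """
    if (fixed & varied) or (fixed | varied) != set(PRIMAD):
        raise ValueError("fixed and varied must partition the six PRIMAD dimensions")
    coverage = all(cell in documented for cell in product(PRIMAD, LID))
    return coverage and assess(fixed, varied)
```

Checking coverage before calling `assess` means an undocumented facet fails the predicate regardless of how well the numerical results agree.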
6. Guidelines and Best Practices
Authors and reviewers are advised to:
- Version all artifacts: Employ commit hashes, container tags, dataset DOIs, or checksums.
- Explicitly record Lifespan metadata for all components.
- Publish Interpretation: Document decision rationales, hypothesis tests, and analysis conventions.
- Parameterize Depth: Adopt domain-relevant schemas and specify resource profiles, provenance chains, licences.
- Use open, persistent repositories (e.g., Zenodo, Figshare) for all artifacts.
- Modularize components: Decouple data ingress, analysis, and reporting to increase maintainability.
- Apply environment control (e.g., CI/CD pipelines) for proactive reproducibility assurance.
- Deliberately vary individual dimensions to test reproducibility coverage empirically.
- Provide an explicit mapping of which of the nine facets are held constant or varied in every replication attempt.
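The first guideline, versioning every artifact, can be checked automatically before submission. A minimal sketch, assuming three accepted identifier formats (a 40-character Git commit hash, a `sha256:`-prefixed digest, and a DOI); the patterns, the `version_kind`/`unversioned` helpers, and the example identifiers are hypothetical and should be adapted to the repositories actually used.

```python
import re
from typing import Optional

# Hypothetical accepted identifier formats; adapt to your repositories.
VERSION_PATTERNS = {
    "git_commit": re.compile(r"[0-9a-f]{40}"),
    "sha256": re.compile(r"sha256:[0-9a-f]{64}"),
    "doi": re.compile(r"10\.\d{4,9}/\S+"),
}


def version_kind(identifier: str) -> Optional[str]:
    """Return the first pattern name that fully matches the identifier, else None."""
    for kind, pattern in VERSION_PATTERNS.items():
        if pattern.fullmatch(identifier):
            return kind
    return None


def unversioned(artifacts: dict[str, str]) -> list[str]:
    """List artifacts whose recorded identifier matches no accepted format."""
    return [name for name, ident in artifacts.items() if version_kind(ident) is None]
```

Mutable labels such as `latest` are rejected by design, since they do not pin an artifact to a single state.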
Consistent application of these best practices operationalizes PRIMAD-LID as both a planning mechanism and an audit checklist, supporting discipline-diagnostic reproducibility coverage and transparent evaluation for computational studies (Aloqalaa et al., 5 Jan 2026).