IIR Data Reuse Practices
- The paper synthesizes mixed-method findings to propose a reproducibility framework for data reuse in interactive information retrieval.
- IIR data reuse practices hinge on detailed documentation, provenance signals, and community standards that enhance comparability and ethical compliance.
- The study recommends strategies to overcome documentation gaps, fragmented repositories, and legal uncertainties for sustainable, open data sharing.
Interactive Information Retrieval (IIR) researchers’ data reuse practices are governed by the increasing imperative for reproducibility, comparability, and efficient use of resources in human-centered IR experiments. Data reuse in IIR is characterized by multifaceted evaluation criteria, ad-hoc yet community-mediated information-seeking strategies, domain-specific legal and ethical barriers, and ongoing efforts toward standards and infrastructural support. This article synthesizes findings from semi-structured interview studies, quantitative audits, and design-oriented investigations to elucidate the state of data reuse within IIR, provide a conceptual model for reusability evaluation, and outline the ingredients required to move the field toward a sustainable, open, and trustworthy sharing ecosystem (Jiang et al., 20 Dec 2025, Jiang et al., 23 Nov 2024, Rousi, 2021, Dempsey et al., 2022, Gregory et al., 2019, Feger, 2020, Koesten et al., 2019).
1. Data Reuse Motivations and Roles in IIR
Data reuse in IIR is driven by objectives that span exploration, ground-truthing, efficiency, and collaboration. Researchers cite the following primary motives:
- Reducing cost and time: Reusing existing behavioral or interaction datasets obviates duplicative user studies, streamlining research pipelines.
- Ensuring comparability and reliability: Benchmarks and repeated use of canonical datasets enhance validity and facilitate cross-study comparisons.
- Enabling research otherwise impossible: Experimental and user log data are often labor- or resource-intensive to collect; reuse provides access for teams lacking such capacity.
- Fostering collaboration: Contact with data creators often leads to collaboration, knowledge exchange, and community trust.
System-oriented IIR researchers, in particular, value data reuse for developing, testing, and benchmarking algorithms, while user-oriented researchers focus on expanding qualitative inquiry and facilitating pilot work that would not be viable otherwise (Jiang et al., 23 Nov 2024).
2. Criteria and Processes for Assessing Data Reusability
IIR researchers do not treat reusability as a binary attribute but rather as a context-specific judgment composed of several interlocking properties (Jiang et al., 20 Dec 2025).
Principal Criteria
- Context and Methods for Data Production: Detailed protocols describing how, why, and under what conditions data were collected are critical. Judgments of trustworthiness, bias, and alignment to study aims depend on epistemic transparency (“If I don’t know how it was collected and cleaned, I can’t trust the data.”).
- Data Documentation: Comprehensive README files, metadata schemas, and method appendices are expected. Documentation serves three functions:
  - Epistemic transparency (data production)
  - Pragmatic usability (variable formats, manipulation)
  - Social trust signaling (professionalism, accountability)
- Creator and Community Provenance: Datasets from recognized labs, projects, or long-running benchmark initiatives (e.g., TREC) carry built-in reputational trust. For bespoke, lab-based behavioral datasets, trust is often tied to direct interpersonal familiarity.
- Legal and Ethical Constraints: Explicit documentation of consent, privacy, licensing, and permitted use is a gatekeeping step. Absent or ambiguous ethical documentation blocks downstream reuse (Jiang et al., 20 Dec 2025, Jiang et al., 23 Nov 2024, Rousi, 2021).
Reusability Judgment (Conceptual Model)
One proposed formalization defines the reusability of a dataset $d$ as a weighted combination of the four principal criteria:

$$R(d) = w_C \, C(d) + w_D \, D(d) + w_P \, P(d) + w_E \, E(d)$$

Where:
- $C(d)$: Clarity of context/methods metadata
- $D(d)$: Completeness/quality of documentation
- $P(d)$: Provenance and reputation
- $E(d)$: Ethical and legal compliance
- $w_C, w_D, w_P, w_E$: Researcher-specific weights
Although not empirically estimated, this structure encapsulates reported decision logic and supports future operationalization (Jiang et al., 20 Dec 2025).
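As a purely illustrative operationalization (the source reports no empirical weights), the judgment could be sketched as a weighted sum with an ethical hard gate, reflecting the finding that absent consent or licensing documentation blocks reuse outright. All names, default weights, and the gating rule below are assumptions, not the paper's method:

```python
from dataclasses import dataclass

@dataclass
class ReusabilityJudgment:
    """Criterion scores in [0, 1]; fields mirror the conceptual model R(d)."""
    context_clarity: float    # C(d): clarity of context/methods metadata
    documentation: float      # D(d): completeness/quality of documentation
    provenance: float         # P(d): provenance and reputation
    ethics_compliance: float  # E(d): ethical and legal compliance

def reusability_score(j: ReusabilityJudgment,
                      w_c: float = 0.3, w_d: float = 0.3,
                      w_p: float = 0.2, w_e: float = 0.2) -> float:
    """Weighted-sum sketch of R(d); ethics acts as a hard gate because
    missing or ambiguous consent/licensing reportedly blocks reuse."""
    if j.ethics_compliance == 0.0:
        return 0.0  # gatekeeping step: no ethical clearance, no reuse
    return (w_c * j.context_clarity + w_d * j.documentation
            + w_p * j.provenance + w_e * j.ethics_compliance)

# Example: a well-documented benchmark dataset from a recognized lab.
print(reusability_score(ReusabilityJudgment(0.9, 0.8, 1.0, 1.0)))  # 0.91
```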
3. Information-Seeking and Discovery Behaviors
IIR researchers operate in a fragmented ecosystem that lacks discipline-specific repositories for behavioral data. As such, data discovery and context acquisition are predominantly ad-hoc, mediated by community conventions and personal networks (Gregory et al., 2019, Jiang et al., 20 Dec 2025, Jiang et al., 23 Nov 2024).
Key Information Sources
| Source | Typical Use and Frequency |
|---|---|
| Academic literature | Primary: methods/appendices provide context; citation chasing |
| Personal/professional network | Direct contact with creators, colleagues; validator of credibility |
| Community venues | Conference workshops, tracks (BIIRRR), poster sessions |
| Field-specific repositories | Trusted by system-oriented researchers for standardized data (TREC) |
| Lab/personal web pages | Common for non-human data (e.g., logs); rare for user studies |
Researchers often combine literature search with direct communication, reporting a two-step pattern: initial awareness via literature/discussions, followed by “hunting down” the known dataset through repositories or personal channels (Gregory et al., 2019, Jiang et al., 20 Dec 2025).
4. Barriers and Enablers of Effective Data Reuse
Barriers
- Uneven documentation: Missing, inconsistent, or nonstandard documentation impedes interpretation and reanalysis.
- Fragmented landscape: Absence of a unified repository for user-study and behavioral data results in scattered discovery and frequent context loss.
- Legal/ethical uncertainty: Privacy, consent, or licensing ambiguities—especially for human-subjects data—restrict or chill reuse.
- Social norms: There is reluctance to reanalyze “exhausted” datasets, concern over perceived lack of originality in reuse-driven research, and dependence on producer reputation instead of transparent metadata (Jiang et al., 23 Nov 2024, Rousi, 2021).
Enablers and Best Practices
- Standardized, complete documentation: Well-defined templates and metadata schemas facilitate immediate sensemaking and reduce the cognitive load of context recovery.
- Community vetting mechanisms: Citation counts, public commentary, and reuse tallies serve as emergent indicators of dataset reliability.
- Clear licensing and explicit consent: Accompanying legal and ethical statements resolve ambiguity before it can block downstream reuse.
- Infrastructural support modeled on established exemplars (e.g., TREC): Central repositories and persistent identifiers increase durability and discoverability (Jiang et al., 20 Dec 2025, Jiang et al., 23 Nov 2024, Dempsey et al., 2022).
5. Cultural and Organizational Practices
The landscape of IIR research is marked by divergent “data cultures.” Computational and life-sciences subfields exhibit mature data reuse and sharing norms, buttressed by established community repositories and strong traditions of open code. In contrast, empirical and experimental subfields—especially those dependent on bespoke human-interaction data—prioritize original data collection and show resistance or difficulty in open sharing due to ethical, legal, or institutional factors (Rousi, 2021, Jiang et al., 23 Nov 2024).
Adoption of continuous FAIR (Findable, Accessible, Interoperable, Reusable) practices embedded from project inception—rather than as an afterthought—has been advocated as both feasible and impactful, with tools such as Minids (lightweight identifiers), BDBags/RO-Crates (packaged data/metadata with checksums), and schema evolution toolkits supporting continuous provenance and integrity checks (Dempsey et al., 2022). This approach is complementary to but not yet common in IIR, which remains largely reliant on post hoc documentation and contextualization.
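A minimal sketch of the packaged-data-with-checksums idea behind BDBags/RO-Crates, assuming only a local directory of study files; this illustrates the underlying integrity-check pattern, not the actual BDBag or RO-Crate serialization formats:

```python
import hashlib
import json
from pathlib import Path

def build_checksum_manifest(package_dir: str) -> dict[str, str]:
    """Record a SHA-256 digest for every file in a data package so a
    later reuser can verify that nothing changed since deposit."""
    root = Path(package_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

# Ship the manifest with the package; rerun and diff it to audit integrity.
# manifest = build_checksum_manifest("study_data/")
# Path("study_data/manifest.json").write_text(json.dumps(manifest, indent=2))
```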
Community-building efforts (e.g., BIIRRR workshops) and emerging policies mandating data deposition, transparent metadata, and ethical clarity are recognized as essential to shifting norms toward a culture where high-quality data curation receives professional recognition (Jiang et al., 20 Dec 2025).
6. Practical Recommendations and Future Directions
Institutional and lab-level actions are oriented around the following practices (Jiang et al., 20 Dec 2025, Jiang et al., 23 Nov 2024, Rousi, 2021, Dempsey et al., 2022):
- Develop IIR-specific minimal metadata standards: Explicit context, consent, sampling frames, and versioning fields (a schema sketch follows this list).
- Extend repositories to incorporate provenance and citation metrics: Integration of code, data, and documentation, with visible acknowledgment of dataset reuses.
- Mandate and incentivize data sharing: Conference and funder policies that require or reward open, well-documented data publication.
- Mentor and train researchers in data management and reuse evaluation: Embedding these skills in graduate education.
- Implement “buddy systems” and contact points: Ensuring data reusers have access to creators for context clarification.
- Automate tracking and notification for data and code dependencies: Reduce risk of context loss and assure up-to-date reusability (e.g., via RDM tools with dependency graphs and notification systems) (Feger, 2020).
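To make the first recommendation concrete, a minimal machine-readable metadata record might look like the sketch below; the field names are illustrative assumptions rather than an adopted community standard:

```python
from dataclasses import dataclass, field

@dataclass
class IIRDatasetMetadata:
    """Illustrative minimal metadata record for an IIR user-study dataset."""
    title: str
    creators: list[str]
    collection_context: str    # how, why, and under what conditions collected
    sampling_frame: str        # participant population and recruitment method
    consent_terms: str         # what participants agreed to
    permitted_uses: str        # scope of consented downstream reuse
    license: str               # e.g., "CC BY 4.0"
    version: str               # explicit versioning field
    related_publications: list[str] = field(default_factory=list)
```

Serializing such records to JSON alongside deposited data would also give repositories a hook for the provenance and citation metrics recommended above.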
A plausible implication is that sustained cultural change in IIR hinges on coordination among standards development, tool deployment, policy evolution, and everyday practices of professional recognition.
7. Conceptual and Analytical Models in Data Reuse
The sensemaking process for evaluating data reusability in IIR is best characterized as a multi-stage, interpretive workflow. Three functional clusters organize this activity (Koesten et al., 2019):
- Inspecting data: Assess basic structure, variable types, missingness, and scope (see the sketch after this list).
- Engaging with content: Investigate encodings, perform initial analyses, flag data “strangeness.”
- Placing data in broader context: Connect dataset to experimental design, disciplinary standards, real-world representativeness, and provenance.
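A brief sketch of the first two stages using pandas; the file name and column names are hypothetical:

```python
import pandas as pd

# Hypothetical interaction-log dataset; columns are illustrative only.
df = pd.read_csv("session_logs.csv")

# Inspecting data: structure, variable types, missingness, and scope.
print(df.shape)   # scope: rows x columns
print(df.dtypes)  # variable types
print(df.isna().mean().sort_values(ascending=False))  # per-column missingness

# Engaging with content: summaries that help flag data "strangeness"
# (e.g., impossible dwell times or duplicated session identifiers).
print(df.describe(include="all").T)
print(df.duplicated(subset=["session_id"]).sum())
```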
High-functioning documentation and data-delivery tools support these stages with layered documentation, provenance tracking, and mechanisms for collaborative sensemaking and annotation.
Formal models for reusability ($R(d)$ above), and frameworks such as the Stage-Based Model of RDM Commitment or the URP (Ubiquitous Research Preservation) spectrum, provide analytical touchpoints for tool and workflow design, but, as yet, no standardized quantitative score is used in IIR practice (Jiang et al., 20 Dec 2025, Feger, 2020).
In summary, IIR researchers’ data reuse practices are complex, driven by epistemic, practical, and ethical considerations. The field is advancing toward more standardized, transparent, and sustainable data sharing and reuse, but faces persistent barriers in documentation quality, infrastructural fragmentation, and the nuanced demands of human-centric data. Emerging conceptual models and best-practice frameworks indicate a path forward toward a robust, FAIR-aligned data sharing culture in interactive information retrieval.