Behavior of IR Benchmarks Under Substantial Corpus Reorganization

Determine how information retrieval benchmarks behave when the underlying document corpus is substantially reorganized over time, focusing on the effects of corpus drift caused by document additions, deletions, updates, and restructuring.

Background

The paper discusses temporal drift in information retrieval test collections and notes that prior studies primarily address judgment variation or query drift rather than changes to the corpus itself. In dynamic domains like technical documentation, documents are frequently reorganized or migrated across repositories, creating a distinct form of drift that can affect benchmark validity.

The authors highlight a gap in existing research regarding how benchmarks respond when the underlying corpus undergoes substantial reorganization. This motivates their study of FreshStack across two temporal snapshots to assess whether queries remain grounded and whether model rankings are stable under such corpus changes.
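As a rough illustration of the two checks named above, the sketch below computes (1) the share of queries that still have at least one judged-relevant document in a newer snapshot and (2) Kendall's tau between model scores on the two snapshots. It is not the authors' procedure: the function names, the definition of "grounded", the document and model identifiers, and all scores are hypothetical values chosen for illustration.

```python
from itertools import combinations


def grounded_fraction(qrels, new_corpus_ids):
    """Share of queries with at least one judged-relevant doc still in the new snapshot.

    Assumption: a query counts as "grounded" if any of its relevant
    document IDs survives; the paper may define grounding differently.
    """
    if not qrels:
        return 0.0
    kept = sum(1 for docs in qrels.values() if docs & new_corpus_ids)
    return kept / len(qrels)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts keyed by the same model names."""
    models = sorted(scores_a)
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        # Same sign of the score difference in both snapshots means the pair agrees.
        sign = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 1.0


# Hypothetical relevance judgments: query -> set of relevant document IDs.
qrels = {"q1": {"d1", "d2"}, "q2": {"d3"}, "q3": {"d4"}}
# Hypothetical document IDs present in the newer snapshot.
new_ids = {"d1", "d4", "d9"}

# Hypothetical effectiveness scores for three retrieval models per snapshot.
scores_old = {"bm25": 0.30, "dense": 0.42, "hybrid": 0.47}
scores_new = {"bm25": 0.28, "dense": 0.45, "hybrid": 0.41}

print(f"grounded queries: {grounded_fraction(qrels, new_ids):.2f}")
print(f"Kendall tau: {kendall_tau(scores_old, scores_new):.2f}")
```

A tau near 1 would indicate that the models are ordered the same way on both snapshots, while a low grounded fraction would suggest that many queries lost their supporting documents during reorganization.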

References

What remains unclear is how benchmarks behave when the corpus undergoes substantial reorganization.

Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks (2603.04532 - Kuissi et al., 4 Mar 2026) in Section 2, Background and Related Work