Effective mitigation of benchmark data contamination

Develop effective strategies to mitigate data contamination in Large Language Model pre-training and evaluation, preventing inflated performance metrics and preserving evaluation integrity.

Background

The survey reviews extensive evidence that overlap between evaluation datasets and pre-training corpora can significantly inflate reported capabilities, especially for larger models and complex reasoning tasks. It catalogues recent detection methods, from embedding-space divergence measures to targeted probing protocols, and documents various forms of contamination (verbatim, approximate, and noisy leakage).
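To make the detection side concrete, here is a minimal sketch of verbatim-overlap detection via word n-grams, a common baseline alongside the embedding-space and probing methods the survey catalogues. The helper names (`ngrams`, `overlap_fraction`) and the 13-gram default are illustrative assumptions, not the survey's prescribed method; lowering `n` or fuzzy-matching tokens shifts the test toward the approximate and noisy leakage categories.

```python
from __future__ import annotations


def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Lowercased word n-grams; n=13 is a common decontamination choice (assumption here)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(eval_example: str, corpus_doc: str, n: int = 13) -> float:
    """Fraction of the evaluation example's n-grams found verbatim in a corpus document.

    A nonzero value flags verbatim leakage; smaller n (or fuzzy token matching)
    moves this toward detecting approximate/noisy leakage instead.
    """
    eval_grams = ngrams(eval_example, n)
    if not eval_grams:
        return 0.0
    return len(eval_grams & ngrams(corpus_doc, n)) / len(eval_grams)


if __name__ == "__main__":
    benchmark_item = "the quick brown fox jumps over the lazy dog " * 3
    training_doc = "unrelated prefix text " + benchmark_item + " unrelated suffix text"
    print(f"overlap: {overlap_fraction(benchmark_item, training_doc):.2f}")  # -> 1.00
```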

Despite progress in detection, the authors state that robust mitigation remains unsettled, especially post-RLHF, where the likelihood signals that many detectors rely on are altered. They therefore flag effective contamination mitigation as a major unresolved challenge for reliable model assessment and privacy.
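Because contamination must ultimately be removed rather than merely detected, one corpus-side mitigation is to filter pre-training documents against held-out benchmark n-grams before training; unlike likelihood-based probes, this step does not depend on signals that RLHF later distorts. The sketch below assumes simple whitespace tokenization, the same illustrative 13-gram window as above, and hypothetical helper names (`build_benchmark_index`, `decontaminate`); it is one plausible baseline, not the survey's method.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def build_benchmark_index(benchmark_items: Iterable[str], n: int = 13) -> set[tuple[str, ...]]:
    """Union of word n-grams across all held-out evaluation items."""
    index: set[tuple[str, ...]] = set()
    for item in benchmark_items:
        tokens = item.lower().split()
        index.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return index


def decontaminate(corpus: Iterable[str], index: set[tuple[str, ...]], n: int = 13) -> Iterator[str]:
    """Yield only documents sharing no n-gram with any benchmark item.

    Dropping whole documents is deliberately conservative: splicing out only
    the matched span can leave approximate (paraphrased) leakage behind.
    """
    for doc in corpus:
        tokens = doc.lower().split()
        grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        if not any(g in index for g in grams):
            yield doc
```

A filter like this addresses verbatim leakage only; the harder open problem the survey points to is mitigating approximate and noisy leakage, which survives exact n-gram matching.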

References

In response to these severe impacts, researchers have developed various methods for detection, though effective mitigation remains a significant open problem.

Beyond the Black Box: Theory and Mechanism of Large Language Models (2601.02907 - Gan et al., 6 Jan 2026) in Subsubsection Data Contamination, Section 2: Data Preparation Stage (Advanced Topics and Open Questions)