Empirical validation of PACIFIC’s contamination resistance
Determine, through systematic empirical evaluation across diverse benchmark configurations and usage scenarios, whether the PACIFIC benchmark-generation framework is in fact contamination-resistant, and quantify how far its mechanisms (e.g., seed-based resampling and representation diversity) mitigate training data contamination.
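One ingredient of such an evaluation could be a measurement of how much two benchmark instances generated from different seeds overlap. The sketch below is a toy illustration only: the instruction pool, `generate_benchmark`, and `exact_overlap` are hypothetical names invented here and are not PACIFIC's generator or API. It assumes a benchmark sample can be modeled as an ordered sequence of instructions drawn from a fixed pool.

```python
import random

# Hypothetical instruction pool, invented for illustration (not from PACIFIC).
INSTRUCTIONS = [
    "reverse the string",
    "convert to upper case",
    "remove all vowels",
    "append the length",
    "sort the characters",
    "duplicate every character",
    "drop the first character",
    "replace spaces with underscores",
]


def generate_benchmark(seed, n_samples=50, n_steps=4):
    """Build a toy benchmark: n_samples ordered instruction sequences.

    Each sample is drawn without replacement from INSTRUCTIONS using a
    random generator seeded with `seed`, so a given seed always yields
    the same benchmark.
    """
    rng = random.Random(seed)
    return [tuple(rng.sample(INSTRUCTIONS, n_steps)) for _ in range(n_samples)]


def exact_overlap(bench_a, bench_b):
    """Fraction of samples in bench_a that also appear verbatim in bench_b."""
    seen = set(bench_b)
    return sum(sample in seen for sample in bench_a) / len(bench_a)


if __name__ == "__main__":
    published = generate_benchmark(seed=0)   # stands in for a leaked benchmark
    resampled = generate_benchmark(seed=1)   # fresh benchmark from a new seed
    print("same seed overlap:     ", exact_overlap(published, published))
    print("different seed overlap:", exact_overlap(resampled, published))
```

Exact-match overlap is only the crudest proxy for contamination. A fuller study would also need to compare model scores on published versus freshly resampled benchmarks, and to account for partial or paraphrased overlap, neither of which this sketch attempts.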
References
Contamination resistance, a core design objective of PACIFIC, has not yet been empirically validated across all scenarios. While the framework’s construction is intended to minimize contamination risk, confirming this property through systematic evaluation remains an important direction for future work.
— PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code
(2512.10713 - Dreyfuss et al., 11 Dec 2025) in Section 7, Threats to validity