Detecting and preventing data contamination in LLM evaluation
Develop reliable, scalable techniques to detect and prevent data contamination in the pretraining and evaluation pipelines of large language models. Data contamination is defined here as any overlap between training data and test data that causes benchmark results to overestimate generalization performance.
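To make the "overlap between training data and test data" in this definition concrete, the following is a minimal sketch of one common heuristic: flagging a test example when it shares an exact word n-gram with any training document. It is not a method taken from the cited paper. The function names (`ngrams`, `contamination_rate`), the tokenization, and the default window of 8 words are all assumptions chosen for illustration. Exact matching of this kind can miss paraphrased or translated test items, which is part of why the problem remains open.

```python
import re


def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text` (hypothetical helper)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(test_examples, training_docs, n=8):
    """Flag test examples that share at least one exact n-gram with the training data.

    Returns (fraction_flagged, flagged_examples). The window size n=8 is an
    illustrative assumption, not a recommended or standard setting.
    """
    # Index every n-gram that occurs anywhere in the training documents.
    train_ngrams = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)

    # A test example is flagged if any of its n-grams appears in that index.
    flagged = [ex for ex in test_examples if ngrams(ex, n) & train_ngrams]
    rate = len(flagged) / len(test_examples) if test_examples else 0.0
    return rate, flagged


if __name__ == "__main__":
    training_docs = [
        "The quick brown fox jumps over the lazy dog near the river bank today",
    ]
    test_examples = [
        "Quick brown fox jumps over the lazy dog near the river",
        "An entirely different question about prime numbers and modular arithmetic here",
    ]
    rate, flagged = contamination_rate(test_examples, training_docs)
    print(f"flagged {len(flagged)} of {len(test_examples)} examples ({rate:.0%})")
```

Building an in-memory set of every training n-gram does not scale to pretraining corpora. A scalable variant would need hashed or approximate indexes, and even then it would catch only verbatim overlap.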
References
However, detecting and preventing data contamination is currently an open problem (Gunasekar et al., 2023; Yang et al., 2023; Golchin and Surdeanu, 2023).
Source: Training on the Test Task Confounds Evaluation and Emergence (arXiv:2407.07890, Dominguez-Olmedo et al., 10 Jul 2024), Section 7 (Related work), Data contamination paragraph