Develop comprehensive methods for tabular contamination detection beyond row-level deduplication
Develop comprehensive contamination detection and decontamination methods for the tabular datasets used to train and evaluate foundation models. These methods should go beyond row-level deduplication and account for column-name variations, multi-source duplication, and task-level leakage that can make evaluation tasks solvable through memorized associations rather than tabular reasoning.
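To illustrate one of these failure modes, the following minimal Python sketch shows how exact row matching can miss a duplicate whose column names, column order, and value formatting differ, while a schema-agnostic fingerprint catches it. This is not the method of the cited paper. The function names (normalize_cell, row_fingerprint, contaminated_rows) and the sample rows are hypothetical, chosen only for illustration.

```python
def normalize_cell(value):
    """Canonicalize a cell: numeric strings to one float form, text stripped and lowercased."""
    text = str(value).strip().lower()
    try:
        return repr(float(text))
    except ValueError:
        return text


def row_fingerprint(row):
    """Schema-agnostic fingerprint: sorted normalized values, ignoring column names and order."""
    return tuple(sorted(normalize_cell(v) for v in row.values()))


def contaminated_rows(train_rows, eval_rows):
    """Return indices of eval rows whose fingerprint also occurs among the training rows."""
    seen = {row_fingerprint(r) for r in train_rows}
    return [i for i, r in enumerate(eval_rows) if row_fingerprint(r) in seen]


# Hypothetical data: the first eval row repeats the training row under a different schema.
train = [{"age": 39, "education": "Bachelors", "hours-per-week": 40}]
evaluation = [
    {"Hours_Per_Week": "40", "Age": "39.0", "Education": " bachelors "},
    {"Age": 52, "Education": "HS-grad", "Hours_Per_Week": 45},
]

# Exact row-level matching finds nothing; the fingerprint flags eval row 0.
exact_hits = [i for i, r in enumerate(evaluation) if r in train]
fuzzy_hits = contaminated_rows(train, evaluation)
print(exact_hits, fuzzy_hits)
```

A fingerprint of this kind addresses only column-name and formatting variation. It can also produce false positives when different rows share the same set of values, and it does nothing about multi-source duplication across differently sampled tables or about task-level leakage, which remain part of the open problem.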
References
Contamination detection for tabular data lacks established best practices and row-level deduplication is insufficient (as we demonstrated), but comprehensive alternatives remain an open problem.
— The Illusion of Generalization: Re-examining Tabular Language Model Evaluation
(2602.04031 - Gorla et al., 3 Feb 2026) in Section: Limitations