Effect of training‑data contamination on LLM gaming performance
Determine how much pretraining-data contamination, specifically exposure to video game assets and solutions, affects large language model performance on video-game evaluation tasks, and whether high scores reflect memorization of contaminated content rather than genuine perception, reasoning, and planning.
References
It's also unclear the effect of data contamination on gaming performance since the models might have seen numerous gaming assets during pre-training.
— lmgame-Bench: How Good are LLMs at Playing Games?
(2505.15146 - Hu et al., 21 May 2025) in Section 1 (Introduction)