Evaluation of VLMs on complete, complex, real-time video games
Evaluate vision-language models (VLMs) on complete, complex, real-time video games, and establish assessment protocols that account for real-time interaction constraints and measure full-game completion, rather than relying on simplified environments or short tasks.
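One way to make the real-time constraint concrete is sketched below in Python. It is an illustrative toy, not a protocol from the cited paper: `ToyGame`, `evaluate`, `latency_ticks`, and the checkpoint-based score are all hypothetical names and assumptions. The sketch shows two things: the game clock keeps advancing while the model is deciding, and the score is the fraction of the whole game completed rather than success on a short task.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyGame:
    """Hypothetical stand-in for a real-time game with a fixed time budget."""
    checkpoints: int = 5   # assumed number of milestones in the full game
    max_ticks: int = 100   # assumed time budget, in game ticks
    tick: int = 0
    progress: int = 0

    def step(self, action: str) -> None:
        # The clock always advances, whether or not the agent acted usefully.
        self.tick += 1
        if action == "advance" and self.progress < self.checkpoints:
            self.progress += 1

    @property
    def done(self) -> bool:
        return self.progress >= self.checkpoints or self.tick >= self.max_ticks


def evaluate(agent: Callable[[int], str], latency_ticks: int, game: ToyGame) -> float:
    """Return the fraction of the full game completed under inference latency."""
    while not game.done:
        action = agent(game.progress)  # the observation here is just the progress
        # Real-time constraint: the game is not paused while the model "thinks".
        for _ in range(latency_ticks):
            if game.done:
                break
            game.step("noop")
        if not game.done:
            game.step(action)
    return game.progress / game.checkpoints
```

Under these assumptions, an agent that always returns "advance" completes the toy game when `latency_ticks` is 0, but completes only part of it when each decision costs nine ticks against a 20-tick budget.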
References
Despite this progress, evaluating VLMs on complete, complex, real-time video games remains an open challenge.
Source: VideoGameBench: Can Vision-Language Models complete popular video games? (Zhang et al., arXiv:2505.18134, 23 May 2025), Section 2, Related Works