KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation (2505.14552v2)
Abstract: Recent advances in large language models (LLMs) underscore the need for more comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing benchmarks are often domain-specific and thus cannot fully capture an LLM's general reasoning potential. To address this limitation, we introduce the Knowledge Orthogonal Reasoning Gymnasium (KORGym), a dynamic evaluation platform inspired by KOR-Bench and Gymnasium. KORGym offers over fifty games in either textual or visual formats and supports interactive, multi-turn assessments with reinforcement learning scenarios. Using KORGym, we conduct extensive experiments on 19 LLMs and 8 vision-language models (VLMs), revealing consistent reasoning patterns within model families and demonstrating the superior performance of closed-source models. Further analysis examines the effects of modality, reasoning strategies, reinforcement learning techniques, and response length on model performance. We expect KORGym to become a valuable resource for advancing LLM reasoning research and developing evaluation methodologies suited to complex, interactive environments.
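Since KORGym is inspired by Gymnasium, a minimal sketch of what an interactive, multi-turn game evaluation loop could look like is given below. The environment interface, the `max_turns` parameter, and the `query_llm` helper are illustrative assumptions for a Gymnasium-style setup, not KORGym's actual API.

```python
# Hypothetical sketch of a Gymnasium-style multi-turn evaluation loop.
# The env object, its reset/step signatures, and query_llm are
# illustrative assumptions, not KORGym's documented interface.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to the model under evaluation."""
    raise NotImplementedError

def evaluate_episode(env, max_turns: int = 20) -> float:
    """Play one interactive game episode and return the accumulated score."""
    observation, info = env.reset()
    total_reward = 0.0
    for _ in range(max_turns):
        # The model proposes its next move as text based on the current game state.
        action = query_llm(str(observation))
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```

In such a loop the per-turn reward signal is what would make the same environments reusable as reinforcement learning scenarios, not only as static benchmarks.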
- Jiajun Shi
- Jian Yang
- Jiaheng Liu
- Xingyuan Bu
- Jiangjie Chen
- Junting Zhou
- Kaijing Ma
- Zhoufutu Wen
- Bingli Wang
- Yancheng He
- Liang Song
- Hualei Zhu
- Shilong Li
- Xingjian Wang
- Wei Zhang
- Ruibin Yuan
- Yifan Yao
- Wenjun Yang
- Yunli Wang
- Siyuan Fang