KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation (2505.14552v2)

Published 20 May 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Recent advancements in LLMs underscore the need for more comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing benchmarks are often domain-specific and thus cannot fully capture an LLM's general reasoning potential. To address this limitation, we introduce the Knowledge Orthogonal Reasoning Gymnasium (KORGym), a dynamic evaluation platform inspired by KOR-Bench and Gymnasium. KORGym offers over fifty games in either textual or visual formats and supports interactive, multi-turn assessments with reinforcement learning scenarios. Using KORGym, we conduct extensive experiments on 19 LLMs and 8 VLMs, revealing consistent reasoning patterns within model families and demonstrating the superior performance of closed-source models. Further analysis examines the effects of modality, reasoning strategies, reinforcement learning techniques, and response length on model performance. We expect KORGym to become a valuable resource for advancing LLM reasoning research and developing evaluation methodologies suited to complex, interactive environments.

Authors (29)
  1. Jiajun Shi (9 papers)
  2. Jian Yang (505 papers)
  3. Jiaheng Liu (100 papers)
  4. Xingyuan Bu (24 papers)
  5. Jiangjie Chen (46 papers)
  6. Junting Zhou (11 papers)
  7. Kaijing Ma (12 papers)
  8. Zhoufutu Wen (10 papers)
  9. Bingli Wang (6 papers)
  10. Yancheng He (30 papers)
  11. Liang Song (60 papers)
  12. Hualei Zhu (3 papers)
  13. Shilong Li (25 papers)
  14. Xingjian Wang (13 papers)
  15. Wei Zhang (1489 papers)
  16. Ruibin Yuan (43 papers)
  17. Yifan Yao (11 papers)
  18. Wenjun Yang (5 papers)
  19. Yunli Wang (13 papers)
  20. Siyuan Fang (6 papers)

