Studying the Korean Word-Chain Game with RLVR: Mitigating Reward Conflicts via Curriculum Learning (2510.03394v1)
Published 3 Oct 2025 in cs.LG and cs.CL
Abstract: Reinforcement learning with verifiable rewards (RLVR) is a promising approach for training LLMs with stronger reasoning abilities. It has also been applied to a variety of logic puzzles. In this work, we study the Korean word-chain game using RLVR. We show that rule-derived rewards can naturally conflict, and demonstrate through experiments that a curriculum-learning scheme mitigates these conflicts. Our findings motivate further studies of puzzle tasks in diverse languages.
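The abstract does not spell out the paper's reward functions. As an illustration of what a rule-derived, verifiable reward for this game could look like, here is a minimal sketch. It assumes the common word-chain rule (each new word must start with the last syllable of the previous word), a hypothetical `dictionary` of valid words, and a no-repetition rule. The function name `word_chain_reward` and the exact rule set are hypothetical and are not the authors' implementation.

```python
def word_chain_reward(history: list[str], candidate: str, dictionary: set[str]) -> float:
    """Hypothetical binary reward for one move of the Korean word-chain game.

    Assumed rules (illustrative only, not taken from the paper):
      1. the candidate must be a known word (membership in `dictionary`),
      2. it must not repeat a word already played,
      3. its first syllable must equal the last syllable of the previous word.
    Real game variants may add further rules; this sketch ignores them.
    """
    # Rule 1: reject empty strings and words outside the assumed dictionary.
    if not candidate or candidate not in dictionary:
        return 0.0
    # Rule 2: a word may be used only once per game.
    if candidate in history:
        return 0.0
    # Rule 3: each Hangul syllable block is one Python character, so compare
    # the last character of the previous word with the first of the candidate.
    if history and history[-1][-1] != candidate[0]:
        return 0.0
    return 1.0


if __name__ == "__main__":
    words = {"기차", "차표", "표범", "범인"}
    print(word_chain_reward(["기차"], "차표", words))  # follows the chain
    print(word_chain_reward(["기차"], "범인", words))  # breaks the chain
```

Because every check is a deterministic rule, the reward can be verified without a learned judge. Combining several such rule checks into one signal is also where the conflicts the abstract mentions could arise; this sketch does not attempt to model them.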