Effects of policy entropy on agentic RL training
Characterize how policy entropy affects training effectiveness, stability, and final performance in GRPO-based agentic reinforcement learning, where language-model agents interleave tool calls with internal reasoning.
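To make the two quantities in this question concrete, here is a minimal, stdlib-only sketch of (a) the mean token-level policy entropy that is commonly tracked during RL training and (b) GRPO's group-relative advantage, (r - group mean) / (group std + eps). All function names are hypothetical. One further assumption, also noted in the code: tokens copied in from tool outputs are masked out of the entropy average because the policy did not sample them. A real trainer would compute these on logit tensors rather than Python lists, and implementations differ on details such as population versus sample standard deviation.

```python
import math
from statistics import mean, pstdev


def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for a single token position."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    log_z = m + math.log(z)  # log-sum-exp of the raw logits
    # H = -sum_i p_i * log p_i, where log p_i = x_i - log_z
    return -sum((e / z) * (x - log_z) for e, x in zip(exps, logits))


def mean_policy_entropy(logits_per_token, loss_mask):
    """Mean entropy over policy-generated tokens only.

    ASSUMPTION: tokens copied in from tool outputs carry loss_mask == 0 and are
    excluded, since the policy did not sample them.
    """
    vals = [token_entropy(lg) for lg, keep in zip(logits_per_token, loss_mask) if keep]
    return sum(vals) / len(vals) if vals else 0.0


def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in group_rewards]


if __name__ == "__main__":
    # Three token positions over a toy 4-token vocabulary; the middle one is a tool-output token.
    logits = [[2.0, 0.1, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0], [5.0, -5.0, -5.0, -5.0]]
    mask = [1, 0, 1]
    print("mean policy entropy:", mean_policy_entropy(logits, mask))
    # One GRPO group of 4 rollouts with binary task rewards.
    print("advantages:", grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Logging `mean_policy_entropy` once per training step is one way to follow entropy over a run. Note also that a group whose rollouts all receive the same reward yields all-zero advantages and therefore contributes no advantage-weighted gradient signal.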
References
For agentic RL, however, it remains unclear (i) what techniques work best for policy optimization, (ii) what is the relationship between the exploration (pass@k)-exploitation (average@k), and (iii) how does entropy affect training effectiveness, stability, and final performance.
— Demystifying Reinforcement Learning in Agentic Reasoning
(2510.11701 - Yu et al., 13 Oct 2025) in Section 4 (Algorithmic Design and Training Dynamics in Agentic RL), opening paragraph
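The exploration (pass@k) and exploitation (average@k) metrics named in the quotation above can be sketched per task as follows. This assumes the standard unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), over n sampled rollouts of which c are correct, and treats average@k as the mean per-rollout success rate. The cited work's exact computation may differ, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased estimate of pass@k from n samples, c of them correct (requires n >= k).

    pass@k = 1 - C(n - c, k) / C(n, k): the probability that at least one of k
    samples drawn without replacement from the n is correct.
    """
    if k > n:
        raise ValueError("need at least k samples")
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def average_at_k(outcomes):
    """Mean per-sample success rate over k sampled rollouts (1 = correct, 0 = wrong)."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    outcomes = [1, 0, 0, 0, 0, 0, 0, 0]  # 8 rollouts on one task, one succeeded
    n, c = len(outcomes), sum(outcomes)
    print("average@8:", average_at_k(outcomes))  # 0.125
    print("pass@4:", pass_at_k(n, c, 4))  # 0.5
```

In this toy case the policy solves the task only occasionally (average@8 = 0.125) but has even odds of solving it within four attempts (pass@4 = 0.5). A wide gap of this kind is one rough signal that the policy can reach a solution without yet doing so reliably.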