Optimal RL recipe for agentic reasoning
Develop the optimal reinforcement learning training recipe for agentic reasoning in large language model agents that integrate external tools, specifying the algorithmic components and settings that yield the best performance and stability.
References
Despite rapid progress in GRPO-based variants, the optimal RL recipe for agentic reasoning remains unclear.
— Demystifying Reinforcement Learning in Agentic Reasoning
(2510.11701 - Yu et al., 13 Oct 2025) in Introduction, Algorithm-wise paragraph