Impact of long chain-of-thought priors on multi-turn agentic reasoning
Determine how long chain-of-thought (long-CoT) predispositions in language models affect task performance and tool-use behavior during multi-turn agentic reasoning with external tools.
References
Open questions remain regarding the allocation of turn budgets, the trade-off between response length and tool-call efficiency, and the impact of long-CoT predispositions on multi-turn reasoning.
— Demystifying Reinforcement Learning in Agentic Reasoning
(2510.11701 - Yu et al., 13 Oct 2025) in Introduction, Reasoning Mode-wise paragraph