Impact of long chain-of-thought priors on multi-turn agentic reasoning

Ascertain how long chain-of-thought predispositions in language models affect performance and tool-use behavior during multi-turn agentic reasoning with external tools.

Background

Long-CoT models often rely on extended internal reasoning and may under-utilize tools, which can hinder agentic RL on reasoning-intensive tasks. The paper later documents this tendency empirically.

How Long-CoT priors interact with agentic tool use across tasks and training regimes has not been characterized systematically; the authors highlight this as an open question.

References

Open questions remain regarding the allocation of turn budgets, the trade-off between response length and tool-call efficiency, and the impact of long-CoT predispositions on multi-turn reasoning.

Demystifying Reinforcement Learning in Agentic Reasoning (2510.11701 - Yu et al., 13 Oct 2025) in Introduction, Reasoning Mode-wise paragraph