Explaining prioritized exploration dynamics in Coconut
Explain the training dynamics that cause models trained with Coconut and Coconut-BFS to converge to similar exploration strategies and to allocate disproportionate attention and representation weight to frontier and optimal nodes during continuous-thought reasoning.
References
We leave explaining this behavior from the perspective of training dynamics as future work.
— Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
(arXiv:2505.12514, Zhu et al., 18 May 2025), Section 5.4 (Exploration Priority)