Efficacy of multimodal interleaved chain-of-thought for surpassing mathematical performance limits
Determine whether multimodal interleaved verbal–visual chain-of-thought (CoT) reasoning can fundamentally surpass current performance limits in mathematical reasoning, given that symbolic mathematical representations are already largely complete and that mathematical reasoning has been extensively optimized in current large language models (LLMs).
References
However, as symbolic representations in mathematics are largely complete, and mathematical reasoning has been extensively optimized in modern LLMs, it remains unclear whether multimodal interleaved CoT can fundamentally break through the performance limit, warranting further investigation.
— Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models
(2601.19834 - Wu et al., 27 Jan 2026) in Section 6: Discussions — Limitations and future work