Conditions and Methods for Effective, Generalizable Interleaved Multimodal Chain-of-Thought
Determine the precise conditions under which multimodal Chain-of-Thought reasoning extends beyond text-only and image-only Chain-of-Thought, and develop principled techniques that achieve effective, generalizable interleaved reasoning in unified multimodal models across tasks and domains.
References
In summary, prior work highlights the potential of multimodal CoT. However, it leaves open the question of when multimodal CoT can extend beyond text-only and image-only CoT, specifically regarding how to achieve effective and generalizable interleaved reasoning.
— ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
(2510.27492 - Gu et al., 30 Oct 2025) in Section 6: Related Work — Multimodal Chain-of-Thought