Extending test-time scaling to unified multimodal models
Develop a test-time scaling approach for unified multimodal models that supports iterative chain-of-thought reasoning, verification, and refinement over multiple rounds of interleaved text and image processing. The goal is to extend the inference-time compute benefits already demonstrated for language models to architectures that perform both multimodal understanding and generation within a single model.
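To make the shape of the problem concrete, the reason, verify, refine cycle described above can be sketched as a generic control loop. This is only an illustrative sketch under stated assumptions, not the method of any cited work: the names `refine_loop`, `Round`, `generate`, `verify`, `max_rounds`, and `accept_score` are all hypothetical, drafts are represented as plain strings where a real unified model would emit interleaved text and images, and the toy generator and verifier stand in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Round:
    """One round of the loop: the draft produced and the verifier's judgment."""
    draft: str       # placeholder for an interleaved text/image output
    score: float     # verifier score in [0, 1] (assumed scale)
    critique: str    # textual feedback passed to the next round


def refine_loop(
    prompt: str,
    generate: Callable[[str, List[Round]], str],
    verify: Callable[[str, str], Tuple[float, str]],
    max_rounds: int = 4,
    accept_score: float = 0.9,
) -> List[Round]:
    """Run generate -> verify repeatedly until accepted or the budget runs out.

    `max_rounds` is the test-time compute budget: more rounds means more
    inference compute spent on the same prompt.
    """
    history: List[Round] = []
    for _ in range(max_rounds):
        # The generator sees all earlier drafts and critiques (hypothetical API).
        draft = generate(prompt, history)
        score, critique = verify(prompt, draft)
        history.append(Round(draft, score, critique))
        if score >= accept_score:
            break  # verifier accepts; stop spending compute
    return history


# Toy stand-ins for the model calls, used only to exercise the loop.
def toy_generate(prompt: str, history: List[Round]) -> str:
    # Each completed round contributes one "+fix" to the next draft.
    return prompt + " +fix" * len(history)


def toy_verify(prompt: str, draft: str) -> Tuple[float, str]:
    # Score rises by 0.5 per applied fix, capped at 1.0.
    fixes = draft.count("+fix")
    score = min(1.0, fixes / 2)
    critique = "ok" if score >= 1.0 else "needs another fix"
    return score, critique


if __name__ == "__main__":
    for r in refine_loop("draw a red cube", toy_generate, toy_verify):
        print(f"{r.score:.1f}  {r.draft!r}  ({r.critique})")
```

In this sketch a single unified model would play both roles, generating and verifying, which is what distinguishes the setting from pipelines that pair a separate generator with a separate critic. How to obtain such behavior in practice is the open question stated above.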
References
While test-time scaling (TTS) has demonstrated that allocating additional inference compute for iterative reasoning substantially improves LLM performance, extending this paradigm to unified multimodal models remains an open challenge.
— UniT: Unified Multimodal Chain-of-Thought Test-time Scaling
(2602.12279 - Chen et al., 12 Feb 2026) in Abstract, page 1