Reasoning Capabilities of Video Generative Models vs. LLMs
Determine whether contemporary video generative models, including image-to-video generation systems, can exhibit reasoning capabilities similar to those of large language models (LLMs); that is, whether they possess comparable step-by-step reasoning abilities that go beyond visual fidelity and temporal coherence.
References
However, despite recent breakthroughs such as Veo 3's chain-of-frames reasoning, it remains unclear whether these models can exhibit reasoning capabilities similar to LLMs.
— TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
(2511.13704 - Chen et al., 17 Nov 2025) in Abstract