Whether and how MLLMs reason deliberatively in latent space

Establish whether multimodal large language models (MLLMs) actually perform deliberative reasoning within their latent space and, if so, characterize how such latent-space reasoning operates.

Background

LVR posits that models deliberate using latent tokens—the last-layer hidden states—during reasoning. However, it is not evident that these hidden states implement a genuine, step-by-step reasoning process.

By framing the process as a causal chain (input → latent tokens → answer), the paper makes explicit the open question of whether deliberative reasoning occurs in latent space and, if so, in what manner.
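The causal chain above can be sketched as a toy loop. This is a minimal illustration, not the paper's method: the "model" is a random linear map standing in for one forward pass, and the names (`forward`, `latent_reasoning`, `W_step`, `W_out`) are hypothetical. The key structural assumption it shows is that each latent token is the last-layer hidden state, fed back as the next step's input in place of a sampled discrete token, with the answer read off the final latent token.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8  # hidden size of the toy model
W_step = rng.standard_normal((D, D)) / np.sqrt(D)  # abstracted forward pass
W_out = rng.standard_normal((D, 2))                # answer head (2 classes)

def forward(h):
    """One abstracted forward pass: input embedding -> last-layer hidden state."""
    return np.tanh(W_step @ h)

def latent_reasoning(x, n_latent_steps=4):
    """Causal chain: input -> latent tokens -> answer.

    Each latent token is the last-layer hidden state, re-used as the next
    step's input embedding; whether genuine deliberation happens across
    these steps is exactly the open question."""
    h = forward(x)
    latent_tokens = [h]
    for _ in range(n_latent_steps - 1):
        h = forward(h)            # latent "step"; no discrete token is emitted
        latent_tokens.append(h)
    logits = W_out.T @ h          # answer decoded from the final latent token
    return latent_tokens, logits

latent_tokens, logits = latent_reasoning(rng.standard_normal(D))
print(len(latent_tokens), logits.shape)  # number of latent tokens, answer logits
```

Under this framing, probing whether the intermediate `latent_tokens` carry step-by-step structure (rather than being an opaque shortcut from input to answer) is what "characterizing latent-space reasoning" would require.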

References

In particular, it is unclear whether and how MLLMs actually perform deliberative reasoning within the latent space.

Imagination Helps Visual Reasoning, But Not Yet in Latent Space (2602.22766 - Li et al., 26 Feb 2026), Section 1 (Introduction)