Quantitative physical reasoning capability of state-of-the-art vision-language models
Determine whether state-of-the-art vision-language models can reason quantitatively about physical properties from video observations, specifically by inferring object kinematic quantities such as size, velocity, and acceleration in real-world units rather than producing merely qualitative judgments.
References
However, it remains unclear whether state-of-the-art vision perception models (e.g., large VLMs) can reason about physical properties quantitatively.
— QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
(2512.19526 - Puyin et al., 22 Dec 2025) in Abstract (page 1)