Can video generation discover physical laws from visual data?

Determine whether video generation models can discover fundamental physical laws from visual observations alone, without human priors; that is, whether learning from videos suffices to infer the rules of classical mechanics and apply them to predict future frames in unseen scenarios.

Background

The paper investigates whether scaling video generation models (inspired by systems like OpenAI's Sora) enables discovery of fundamental physical laws from raw visual data, without introducing explicit human-designed physics priors. The authors construct controlled 2D physics simulations (uniform motion, elastic collisions, parabolic motion) and evaluate models under in-distribution, out-of-distribution, and combinatorial generalization settings.
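To make the evaluation setup concrete, here is a minimal Python sketch of the three kinds of motion named above, together with a toy in-distribution versus out-of-distribution split on initial velocity. It is an illustration under stated assumptions, not the authors' simulator: the function names, the one-dimensional collision, the gravity constant, and the velocity ranges are all hypothetical.

```python
import random

def uniform_motion(x0, v, t):
    """Position at time t of a body moving at constant velocity v."""
    return x0 + v * t

def parabolic_motion(x0, y0, vx, vy, t, g=9.8):
    """Position at time t of a projectile under constant gravity g (assumed value)."""
    return x0 + vx * t, y0 + vy * t - 0.5 * g * t * t

def elastic_collision_1d(m1, v1, m2, v2):
    """Post-collision velocities of a perfectly elastic 1D collision.

    Follows from conservation of momentum and kinetic energy.
    """
    total = m1 + m2
    u1 = ((m1 - m2) * v1 + 2 * m2 * v2) / total
    u2 = ((m2 - m1) * v2 + 2 * m1 * v1) / total
    return u1, u2

def sample_velocity(split, rng):
    """Draw an initial velocity for a given split.

    The ranges are hypothetical; they only illustrate that the
    out-of-distribution range does not overlap the training range.
    """
    if split == "in_distribution":
        return rng.uniform(1.0, 4.0)
    if split == "out_of_distribution":
        return rng.uniform(4.5, 6.0)
    raise ValueError(f"unknown split: {split}")

if __name__ == "__main__":
    rng = random.Random(0)
    for split in ("in_distribution", "out_of_distribution"):
        v = sample_velocity(split, rng)
        # Ground-truth position after one second of uniform motion.
        print(split, round(v, 3), round(uniform_motion(0.0, v, 1.0), 3))
```

A model that had recovered the underlying law would predict the out-of-distribution trajectories as accurately as the in-distribution ones, since the same equations govern both; the split only changes which parameter values were seen during training.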

Empirical results show near-perfect in-distribution performance and improvements in combinatorial generalization with scaling, but persistent failures in out-of-distribution scenarios. These observations raise the foundational question of whether purely visual learning is sufficient for recovering true physical laws, motivating further study of the mechanisms by which video generative models generalize.

References

However, it remains an open question whether video generation can discover such rules merely by observing videos, as Sora does.

— How Far is Video Generation from World Model: A Physical Law Perspective (2411.02385 - Kang et al., 4 Nov 2024) in Section 1 (Introduction)
