Watching Physics: the Generative Science of Matter and Motion

Published 18 Apr 2026 in cs.CE | (2604.16843v1)

Abstract: Can we learn the physics of matter in motion directly from images and video--and trust it? Answering this question requires integrating experiments, physics-based simulation, and data across traditionally separate disciplines. Much of this knowledge is visual and temporal rather than textual: images and videos encode structure, dynamics, and causality that equations alone cannot fully capture. Recent generative models produce compelling visual content, yet they rely on observational data and often lack physical validity. Here we show that generative video models gain scientific value when they couple visual data with experiments and high-fidelity simulations. Using deformation mechanics as a testbed, we study three systems of increasing complexity--rubber compression, can crushing, and cardiac motion--and identify regimes in which visual learning succeeds, fails, and requires mechanistic supervision. When physics manifests in visible kinematics, generative models recover measurable quantities such as surface strain; when internal state variables dominate, visual plausibility no longer ensures physical admissibility. We propose that this convergence defines a new frontier, the Generative Sciences of Matter and Motion, which unifies Simulogenics, Physiogenics, and Materiogenics. These physics-grounded foundation models can turn visual generation into a scientific instrument for inference, prediction, and design of matter in motion.

Authors (3)

Summary

The paper pioneers a hybrid approach combining generative video AI with experimental observation and finite element simulation to generate physically plausible deformations.
It benchmarks the method through rubber compression, can crushing, and cardiac motion, revealing that visual plausibility does not always ensure mechanical admissibility.
The work emphasizes the need for physics-grounded loss functions and self-supervising models to enhance generative simulations for engineering and biomedical applications.

Introduction

"Watching Physics: the Generative Science of Matter and Motion" (2604.16843) delineates the integration of generative video AI with experimental observation and physics-based simulation for the inference, prediction, and design of matter in motion. The central thesis posits that contemporary generative models, when augmented with experimental and mechanistic data, can transcend visual realism and approach quantitative physical validity. This convergence defines a new paradigm unifying Simulogenics (generative physical simulation), Physiogenics (learning physics from observation), and Materiogenics (generative design of material structure and function).

Figure 1: Conceptual workflow of the proposed generative-video-AI framework, integrating multimodal sources for learning physically plausible deformation and motion.

Foundations: Simulogenics, Physiogenics, and Materiogenics

The paper formalizes three interlinked generative science domains:

Simulogenics synthesizes physically plausible trajectories from data, thus enabling generation of physical solutions without explicit governing equations. This is grounded in classical simulation (finite elements, variational principles) but inverts the paradigm via data-driven, generative surrogates.
Physiogenics operationalizes inferring physical laws from visual observations. Rather than presupposing equations, latent structure is learned from image and video sequences, leveraging statistical consistency and dynamic priors.
Materiogenics advances generative design, capturing structure–property–function relationships to synthesize new materials and configurations by sampling the learned latent space.

The framework integrates these three domains to move from reproducing visible motion to interpretable, mechanistically consistent generation.

Experimental Framework and Methodological Advancements

The study implements a unified workflow that combines laboratory experiments, generative video AI (specifically, the Sora model by OpenAI), and high-fidelity finite element (FE) simulation. The critical methodological underpinning is the side-by-side evaluation of realism and physical admissibility across three testbeds of escalating complexity: smooth hyperelastic deformation, structural instability, and coupled, active living matter.

Exploratory Study I: Rubber Block Compression

The first benchmark addresses uniaxial compression of a hyperelastic rubber block.

The generative AI video accurately recapitulates globally visible kinematic phenomena: contraction, lateral expansion, and hourglass shaping.
Quantitative digital image correlation (DIC) on synthetic frames yields surface strain fields comparable to FE solutions using a third-order Ogden model, particularly in regimes where deformation remains smooth and feature tracking robust.

Figure 2: Side-by-side comparison of generative video, DIC fields, and FE simulation for rubber block compression, highlighting agreement in early-stage kinematics.
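To make the FE reference more concrete, the following is a minimal sketch of the nominal stress predicted by a three-term Ogden model for an idealized, incompressible, homogeneous uniaxial test. It assumes the strain-energy convention W = sum_p (mu_p / alpha_p) (l1^alpha_p + l2^alpha_p + l3^alpha_p - 3); FE packages may use other conventions. The parameter values are illustrative placeholders, not the values calibrated in the paper, and the sketch ignores platen friction, which a full FE model would resolve.

```python
def ogden_uniaxial_nominal_stress(stretch, mu, alpha):
    """Nominal (first Piola-Kirchhoff) stress for incompressible uniaxial loading.

    Assumes W = sum_p (mu_p / alpha_p) * (l1**a_p + l2**a_p + l3**a_p - 3)
    with lateral stretches l2 = l3 = stretch**-0.5 (incompressibility).
    """
    if stretch <= 0:
        raise ValueError("stretch must be positive")
    return sum(
        m * (stretch ** (a - 1.0) - stretch ** (-0.5 * a - 1.0))
        for m, a in zip(mu, alpha)
    )


# Illustrative three-term parameters (mu in MPa, alpha dimensionless).
# These are placeholders and are NOT taken from the paper.
MU = (0.63, 0.0012, -0.01)
ALPHA = (1.3, 5.0, -2.0)

if __name__ == "__main__":
    # Stretch < 1 corresponds to compression, so the stress is negative.
    for lam in (1.0, 0.9, 0.8, 0.7):
        stress = ogden_uniaxial_nominal_stress(lam, MU, ALPHA)
        print(f"stretch={lam:.1f}  nominal stress={stress:.4f} MPa")
```

A curve of this kind is what a measured force–displacement response would be compared against when calibrating the material model.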

A notable limitation emerges as strain increases: loss of texture coherence in generated frames hampers DIC, and time-resolved tracking fails, underscoring a breakdown in quantitative reliability for severe deformations.
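The dependence of strain on reliable tracking can be illustrated with a small sketch. Assuming DIC has already produced displacements on a regular reference grid (the grid layout, spacing `h`, and function names here are hypothetical), the Green-Lagrange surface strain follows from finite differences of those displacements, so any tracking error propagates directly into the strain field.

```python
def green_lagrange_strain(u, v, h):
    """Green-Lagrange strain at the interior nodes of a regular grid.

    u[i][j], v[i][j]: x- and y-displacements at row i (y-direction)
    and column j (x-direction). h: reference grid spacing.
    Returns a dict mapping (i, j) to (E_xx, E_yy, E_xy).
    """
    rows, cols = len(u), len(u[0])
    strain = {}
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Central differences for the displacement gradient.
            du_dx = (u[i][j + 1] - u[i][j - 1]) / (2 * h)
            du_dy = (u[i + 1][j] - u[i - 1][j]) / (2 * h)
            dv_dx = (v[i][j + 1] - v[i][j - 1]) / (2 * h)
            dv_dy = (v[i + 1][j] - v[i - 1][j]) / (2 * h)
            # Deformation gradient F = I + grad(u).
            f11, f12 = 1.0 + du_dx, du_dy
            f21, f22 = dv_dx, 1.0 + dv_dy
            # E = 0.5 * (F^T F - I).
            e_xx = 0.5 * (f11 * f11 + f21 * f21 - 1.0)
            e_yy = 0.5 * (f12 * f12 + f22 * f22 - 1.0)
            e_xy = 0.5 * (f11 * f12 + f21 * f22)
            strain[(i, j)] = (e_xx, e_yy, e_xy)
    return strain


def homogeneous_field(rows, cols, h, a, b):
    """Synthetic test field: u = a * x, v = b * y (hypothetical helper)."""
    u = [[a * j * h for j in range(cols)] for _ in range(rows)]
    v = [[b * i * h for _ in range(cols)] for i in range(rows)]
    return u, v


if __name__ == "__main__":
    # 20% vertical shortening with 10% lateral expansion.
    u, v = homogeneous_field(5, 5, 0.5, 0.1, -0.2)
    print(green_lagrange_strain(u, v, 0.5)[(2, 2)])
```

For the homogeneous test field the exact values are E_xx = a + a^2/2 and E_yy = b + b^2/2, which gives a quick sanity check before applying the routine to tracked frames.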

Exploratory Study II: Crushing a Thin-Walled Can

The second case investigates the complex, instability-driven collapse of an aluminum can.

The experiment records authentic collapse morphologies and force–displacement curves capturing elastic-plastic buckling and fold formation.
Generative video models reproduce global appearance and folding sequences but lack constraint by internal state variables (stress, plastic strain, energy dissipation).
FE simulations resolve the internal mechanics in detail; comparison reveals that generated dynamics achieve visual plausibility but not mechanical admissibility.

Figure 3: Laboratory, AI-generated, and FE simulation outputs for can crushing, demonstrating visually plausible but not physically complete generative sequences.

Force–time analyses show that evaluation based on visual consistency alone is insufficient: between the generated video and the simulation, only the simulation enforces the governing physics and remains internally consistent.
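One quantity that a measured or simulated force–displacement curve provides, and a video alone does not, is the external work done on the can. The sketch below integrates force over displacement with the trapezoidal rule; the sample data are invented for illustration and are not measurements from the paper. The result is total external work, which includes both stored elastic energy and plastic dissipation.

```python
def external_work(displacement, force):
    """Trapezoidal integral of force over displacement.

    With displacement in mm and force in N, the result is in mJ.
    """
    if len(displacement) != len(force):
        raise ValueError("displacement and force must have equal length")
    if len(displacement) < 2:
        raise ValueError("need at least two samples")
    work = 0.0
    for k in range(1, len(displacement)):
        dx = displacement[k] - displacement[k - 1]
        work += 0.5 * (force[k] + force[k - 1]) * dx
    return work


if __name__ == "__main__":
    # Hypothetical curve: a peak at buckling followed by a lower plateau.
    disp_mm = [0.0, 1.0, 2.0, 5.0, 10.0]
    force_n = [0.0, 800.0, 300.0, 250.0, 300.0]
    print(f"external work: {external_work(disp_mm, force_n):.1f} mJ")
```

A generated video could be checked against such a value only if a force signal were supplied from experiment or simulation, which is the gap the comparison highlights.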

Exploratory Study III: Cardiac Motion

The third scenario exemplifies living-system complexity: left ventricular contraction in the human heart.

Generative video accurately captures cyclical contraction/relaxation and general gross motion.
The anatomical fidelity of the generated geometry is inconsistent, and the internal mechanics (fiber architecture, active stress) are not enforced.
FE simulation produces interpretable, physiologically plausible strain fields based on detailed constitutive laws and anatomical geometry.

Figure 4: Generative AI video and multiperspective FE simulation of cardiac motion across a heartbeat, establishing complementary strengths and mechanistic gaps.

From Visual Realism to Physical Validity

A clear distinction is maintained throughout: visual realism does not by itself establish physical validity. The progression of case studies demonstrates:

In low-complexity, surface-kinematics-dominated systems (rubber compression), generative AI can approach experimental utility for certain modalities (e.g., calibration, variability augmentation).
As physical phenomena require internal state inference or are dominated by invisible variables (instabilities, active contraction), appearance-based generative models fail to capture the governing mechanistic dependencies.
Moving toward patient-specific modeling in living systems accentuates the requirement for integrating simulation-native supervision and physics constraints.

The implication is that progression to physically valid generative modeling mandates hybrid architectures that enforce mechanistic priors (e.g., via simulation-based regularization, physics-grounded loss functions, or structured latent spaces incorporating governing equations).
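As a rough illustration of what a physics-grounded loss could look like, the sketch below adds a penalty on volume change to an ordinary data-fit term. The choice of incompressibility (det F = 1, a reasonable idealization for rubber) as the physical constraint, the function name, and the weighting are all assumptions made here for illustration; the paper does not specify a particular loss.

```python
def hybrid_loss(pred, target, jacobians, weight=1.0):
    """Data-fit term plus an incompressibility penalty.

    pred, target: flat sequences of predicted and reference values
                  (for example, pixel intensities or displacements).
    jacobians:    det(F) evaluated at sample points of the predicted
                  deformation; equal to 1 for volume-preserving motion.
    weight:       relative weight of the physics term (hypothetical).
    """
    if len(pred) != len(target) or not pred or not jacobians:
        raise ValueError("inputs must be non-empty and pred/target equal length")
    # Mean squared error against the observed data.
    data_term = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    # Mean squared deviation from volume preservation.
    physics_term = sum((j - 1.0) ** 2 for j in jacobians) / len(jacobians)
    return data_term + weight * physics_term


if __name__ == "__main__":
    # A visually perfect prediction that still violates incompressibility.
    print(hybrid_loss([0.5, 0.2], [0.5, 0.2], [1.1, 0.9], weight=2.0))
```

The example call shows the point of the section in miniature: the data term is zero because the prediction matches the observation, yet the total loss is nonzero because the implied deformation changes volume.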

Implications and Speculative Outlook

The study identifies several immediate and future implications:

Experimentation and calibration: Generative video models enable rapid, variability-rich virtual experimentation and preliminary calibration but must be grounded for use in mechanistically critical tasks.
Hybrid and self-improving pipelines: Integrating real, simulated, and generated data can yield self-supervising models, iteratively refining their physical reliability.
Simulation democratization: As generative models align with physical laws, scientific simulation may become accessible via natural language interfaces, broadening engagement and interdisciplinary utility.

The paper's strongest claim is that generative models, when aligned with physics, can serve both as scientific instruments (enabling inference and prediction) and as platforms for design within high-dimensional, generative material spaces. The authors qualify this claim with an important caveat: plausible appearance in AI video does not guarantee physical admissibility, and failures emerge sharply when the governing dynamics are not visually manifest.

Conclusion

"Watching Physics: the Generative Science of Matter and Motion" substantiates that generative AI, when combined with experimental and mechanistic supervision, can bridge the gap between simulation, observation, and open-ended generation. The integration of these modalities defines the generative sciences (Simulogenics, Physiogenics, Materiogenics), providing the foundations for future advances in data-driven physical inference, predictive simulation, and material and structure design. Realizing the vision of democratized, physically reliable simulation and design will require ongoing methodological integration of mechanistic, experimental, and generative paradigms to ensure both scientific rigor and practical utility.
