
Physics-Based Video Generation & Editing

Updated 7 December 2025
  • Physics-based video generation and editing is a research area that synthesizes video content with dynamics governed by physical laws such as momentum, gravity, and friction.
  • It integrates simulation methods, deep generative models, and advanced vision systems to create temporally coherent and photorealistic video outputs.
  • Recent advancements highlight effective simulation-guided and prior-distillation pipelines that enable user-controllable, physically plausible video retargeting.

Physics-based video generation and editing is a research frontier that aims to synthesize new video content, or edit existing footage, such that the resulting dynamics respect physical laws (conservation of momentum, elasticity, friction, gravity, and complex material behavior) while simultaneously achieving photorealism, temporal coherence, and user-controllable interactivity. The field combines physical simulation (continuum mechanics, rigid-body and deformable/particle-based solvers), deep generative models (especially diffusion-based video priors), and advanced vision systems for semantic and geometric understanding. The resulting systems can animate objects under user-specified actions, change material properties, or manipulate forces in a physically plausible manner. Below, the state of the art is summarized across methodologies, model architectures, controllability paradigms, editing interfaces, benchmarks, and outstanding challenges.

1. Core Methodologies: Simulation-Grounded and Prior-Distillation Pipelines

Physics-based video generation approaches can be categorized by the tightness of coupling between explicit simulation models and generative video priors: simulation-grounded pipelines drive motion with an explicit solver and use the generative model chiefly for appearance, whereas prior-distillation pipelines extract, or enforce, physically plausible dynamics within a pretrained video prior.

2. System Architectures and Mathematical Foundations

State-of-the-art systems instantiate a modular pipeline generally comprising the following:

| Stage | Typical Operation | Core References |
| --- | --- | --- |
| Scene/Geometry | SAM, Grounded-SAM, InstantMesh, 3DGS, LGM | Zhang et al., 19 Apr 2024; Tan et al., 26 Nov 2024; Chen et al., 26 Mar 2025 |
| Physical Parameters | Material segmentation, LLM reasoning, neural fields | Lin et al., 25 Nov 2024; Zhang et al., 25 Nov 2025; Chen et al., 26 Mar 2025 |
| Simulation | MPM, rigid-body ODE, PBD, elastoplasticity | Tan et al., 26 Nov 2024; Zhang et al., 19 Apr 2024; Fu et al., 27 May 2024 |
| Rendering | Differentiable rasterization, diffusion-based enhancement | Zhang et al., 19 Apr 2024; Tan et al., 26 Nov 2024; Wang et al., 24 Sep 2025 |
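
To make the modularity concrete, a minimal sketch of such a pipeline is shown below; the `PhysicsVideoPipeline` class and all stage signatures are hypothetical and purely illustrative, not taken from any cited system.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical stage interfaces mirroring the table above; names and
# signatures are illustrative, not from any cited system.
@dataclass
class PhysicsVideoPipeline:
    reconstruct: Callable[[Any], Any]            # image/video -> scene geometry (e.g., mesh, 3DGS)
    infer_materials: Callable[[Any], Any]        # scene -> per-part physical parameters
    simulate: Callable[[Any, Any, float], Any]   # (scene, params, duration) -> motion trajectory
    render: Callable[[Any, Any], Any]            # (scene, trajectory) -> video frames

    def run(self, observation: Any, duration: float) -> Any:
        scene = self.reconstruct(observation)                 # Stage 1: scene/geometry
        params = self.infer_materials(scene)                  # Stage 2: physical parameters
        trajectory = self.simulate(scene, params, duration)   # Stage 3: simulation
        return self.render(scene, trajectory)                 # Stage 4: rendering
```

A concrete instantiation would plug in, for example, a Grounded-SAM-based reconstructor for `reconstruct` and an MPM solver for `simulate`.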

Key Mathematical Models (selected):

  • Fixed-corotated energy for elasticity:

\psi(F) = \mu \sum_{i=1}^{d} (\sigma_i - 1)^2 + \frac{\lambda}{2} (\det F - 1)^2

with Lamé parameters \mu = E/[2(1+\nu)] and \lambda = E\nu/[(1+\nu)(1-2\nu)], where \sigma_i are the singular values of the deformation gradient F. (A numerical sketch of this energy and of the MPM substeps below follows after this list.)

  • MPM substeps: particle-to-grid, grid update via conservation equations, grid-to-particle transfer.
  • Differentiable rendering and photometric/D-SSIM losses for simulation-to-video alignment.
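
As a concrete illustration of the first two items, the sketch below implements the fixed-corotated energy exactly as stated and a deliberately simplified one-dimensional MPM substep. The energy function follows the formula directly; the MPM step uses nearest-node transfers and omits internal stress forces, so it shows only the particle-to-grid / grid-update / grid-to-particle structure rather than a usable solver. All default parameter values are placeholders.

```python
import numpy as np

def lame_parameters(E: float, nu: float) -> tuple[float, float]:
    """Lamé parameters from Young's modulus E and Poisson's ratio nu."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam

def fixed_corotated_energy(F: np.ndarray, E: float = 1e5, nu: float = 0.3) -> float:
    """psi(F) = mu * sum_i (sigma_i - 1)^2 + (lambda / 2) * (det F - 1)^2,
    with sigma_i the singular values of the deformation gradient F."""
    mu, lam = lame_parameters(E, nu)
    sigma = np.linalg.svd(F, compute_uv=False)
    return mu * float(np.sum((sigma - 1.0) ** 2)) + 0.5 * lam * (np.linalg.det(F) - 1.0) ** 2

def mpm_substep(x, v, m, grid_res: int, dx: float, dt: float, gravity: float = -9.8):
    """One simplified 1D MPM substep (PIC-style, nearest-node weights).
    Production solvers use B-spline weights and stress forces derived from psi."""
    grid_mom = np.zeros(grid_res)   # grid momentum
    grid_m = np.zeros(grid_res)     # grid mass
    idx = np.clip(np.round(x / dx).astype(int), 0, grid_res - 1)
    np.add.at(grid_mom, idx, m * v)  # particle-to-grid: scatter momentum
    np.add.at(grid_m, idx, m)        # particle-to-grid: scatter mass
    grid_v = np.zeros(grid_res)
    occ = grid_m > 0
    grid_v[occ] = grid_mom[occ] / grid_m[occ] + dt * gravity  # grid update: momentum balance + gravity
    v_new = grid_v[idx]              # grid-to-particle: gather velocities
    x_new = x + dt * v_new           # advect particles
    return x_new, v_new
```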

Generative Model Conditioning:

  • Physics parameters (force vectors, material labels, elastic/plastic parameters) embedded as input tokens, spatial maps, or ControlNet branches (see the sketch after this list).
  • Additional losses may enforce velocity consistency, proximity to explicit physical constraints, or spatiotemporal token relational alignment (Zhang et al., 29 May 2025).
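
A minimal sketch of the token-conditioning path, assuming a PyTorch-style denoiser: the `PhysicsConditioner` module, its input layout, and all dimensions are hypothetical, intended only to show how force vectors, material labels, and elastic parameters can be projected into tokens for the backbone to attend to.

```python
import torch
import torch.nn as nn

class PhysicsConditioner(nn.Module):
    """Hypothetical module embedding physics parameters as conditioning tokens
    for a video diffusion backbone; illustrative, not from any cited paper."""
    def __init__(self, n_materials: int = 8, d_model: int = 768):
        super().__init__()
        self.force_proj = nn.Linear(3, d_model)                  # 3D force vector -> token
        self.material_emb = nn.Embedding(n_materials, d_model)   # material label -> token
        self.param_proj = nn.Linear(2, d_model)                  # (E, nu) pair -> token

    def forward(self, force, material_id, elastic_params):
        # force: (B, 3), material_id: (B,), elastic_params: (B, 2)
        tokens = torch.stack([
            self.force_proj(force),
            self.material_emb(material_id),
            self.param_proj(elastic_params),
        ], dim=1)  # (B, 3, d_model), to be cross-attended by the denoiser
        return tokens
```

A ControlNet-style variant would instead rasterize the force field into a per-pixel spatial map and feed it through a parallel encoder branch.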

3. Controllability, Editing, and User Interaction Paradigms

Physics-based frameworks offer editing flexibility that far exceeds that of traditional video or text-to-video generation; the supported edit types and interfaces are detailed in Section 6 below.

4. Evaluation: Benchmarks, Metrics, and Experimental Findings

Assessment of physics-based video generation is multi-faceted, combining physical-plausibility metrics, visual-quality metrics, and human preference studies.

Selected quantitative results:

| Method | Physical Metric | Visual Metric | User Preference | Reference |
| --- | --- | --- | --- | --- |
| PhysDreamer | FVD ↓: 146.0 | FID ↓: 189.4 | 66% win rate (vs. PhysGaussian) | Zhang et al., 19 Apr 2024 |
| WonderPlay | MotionFid: highest | VBench: top 2 | 70–80% (2AFC study) | Li et al., 23 May 2025 |
| PhysGen | Physical: 4.14/5 | FID: 105.7 | not reported | |
| PhysChoreo | PC: 4.67 | VQ: 4.67 | 58% (vs. Veo 3.1) | |
| PhysCtrl | PC: 4.5 | VQ: 4.3 | 81% (physics), 66% (VQ) | Wang et al., 24 Sep 2025 |

5. Material and Interaction Modeling: From Segmentation to Physics Fields

High-fidelity physical response in video editing demands per-object and per-part material recognition, state reconstruction, and parameter assignment; a toy illustration of the assignment step follows.
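
To make the assignment step concrete, here is a toy label-to-parameter lookup. The table values are order-of-magnitude illustrations only, not calibrated measurements; real systems infer such parameters via material segmentation, LLM/VLM reasoning, or neural parameter fields as cited above.

```python
# Hypothetical lookup from semantic part labels to simulation parameters:
# Young's modulus E (Pa), Poisson's ratio nu, density rho (kg/m^3).
# Values are rough, illustrative orders of magnitude.
MATERIALS = {
    "rubber": {"E": 1e7,  "nu": 0.47, "rho": 1100.0},
    "wood":   {"E": 1e10, "nu": 0.35, "rho": 600.0},
    "steel":  {"E": 2e11, "nu": 0.30, "rho": 7850.0},
    "jelly":  {"E": 1e4,  "nu": 0.45, "rho": 1000.0},
}

def assign_materials(part_labels):
    """Map per-part semantic labels (e.g., from SAM plus an LLM/VLM) to
    simulation-ready parameter dicts; unseen labels fall back to a soft default."""
    default = MATERIALS["jelly"]
    return [MATERIALS.get(label, default) for label in part_labels]

# Example: parts segmented from a frame of a bouncing toy
params = assign_materials(["rubber", "steel", "unknown_plastic"])
```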

6. Editing, Control, and Interactive Simulation: Capabilities and Limitations

Physics-based frameworks vastly expand the expressivity of video editing compared to text-prompted or unconditional generation:

  • Supported Edits: arbitrary time-dependent external forces, on-the-fly re-parametrization of the material model (e.g., elastic → viscoplastic), constraint editing (pins, welds, collision toggles), per-part text-to-physics translation (prompt → property map), and user-guided force-field drawing (see the sketch after this list).
  • UI Integration: Editor control panels for object picking, force vector drawing, material sliders, scripting interfaces, and hybrid text/gesture input (Chen et al., 26 Mar 2025, Zhang et al., 25 Nov 2025, Gillman et al., 26 May 2025).
  • Limitations:
    • Manual segmentation and boundary condition setup may be required (Zhang et al., 19 Apr 2024).
    • Simulation cost is non-negligible (e.g., ~1 minute per second of video on a V100 for full MPM) (Zhang et al., 19 Apr 2024, Tan et al., 26 Nov 2024).
    • Handling of complex phenomena (fluids, fracture, adhesive contacts) is still an open area; most models are restricted to elastic or simple granular materials (Tan et al., 26 Nov 2024, Lin et al., 25 Nov 2024).
    • Failure cases: geometry mis-estimation, inpainting errors, hallucinated deformation for under-constrained settings, occasional physics artifacts due to simulation-discriminator mismatch.
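
The sketch below shows the shape of a user-guided force edit: a `force_fn(x, t)` callback (a hypothetical signature) is sampled each substep, which is how drawn force fields, gusts, or time-varying pushes can be injected without modifying the solver itself.

```python
import numpy as np

def apply_external_force(x, v, m, force_fn, t: float, dt: float):
    """Symplectic-Euler application of a user-specified, time-dependent force field.
    x: (N, 3) positions, v: (N, 3) velocities, m: (N,) masses;
    force_fn(x, t) -> (N, 3) forces. The callback signature is illustrative."""
    f = force_fn(x, t)
    v = v + dt * f / m[:, None]   # kick: update velocities from forces
    x = x + dt * v                # drift: advect positions with new velocities
    return x, v

# Example edit: a gust along +x that decays over about one second
gust = lambda x, t: np.tile([50.0 * np.exp(-t), 0.0, 0.0], (len(x), 1))
```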

7. Outlook: Directions, Benchmarks, and Open Challenges

Emerging challenges and next steps include:

  • Multi-Material and Composite Dynamics: Methods like Phys4DGen (Lin et al., 25 Nov 2024) are pioneering automated assignment of heterogeneous interior/surface properties and composition-aware simulation, but robust recognition and simulation of multi-component objects remain open problems.
  • End-to-End Differentiable or RL-Guided Optimization: PhysDreamer (Zhang et al., 19 Apr 2024), PhysMaster (Ji et al., 15 Oct 2025), and VideoREPA (Zhang et al., 29 May 2025) highlight the potential of reinforcement learning, human-in-the-loop feedback, and token relation distillation to improve or directly optimize physical realism in deep generative models.
  • Semantic-to-Physics and Language-Conditioned Control: Integration of robust, open-ended semantic parsing (e.g., with GPT-5) and free-form instruction-to-physics mapping dramatically lowers the barrier to realistic, physically plausible video retargeting (Zhang et al., 25 Nov 2025, Yang et al., 30 Mar 2025, Li et al., 23 May 2025).
  • Benchmarks and Metrics: Community-wide adoption of physically calibrated benchmarks (Physics-IQ, VideoPhy, PBench-Edit, VBench) and adoption of VLM-based realism raters are essential for rigorous assessment (Liu et al., 25 Nov 2025, Tan et al., 26 Nov 2024, Wu et al., 5 Oct 2025).

Physics-based video generation and editing, characterized by modular simulation-reconstruction pipelines, deep integration of material and force fields, controllable generative priors, and semantic-to-action translation, is positioned to transform interactive animation, digital content creation, and virtual world modeling, enabling editability and realism grounded in physical law rather than appearance priors alone. The continued evolution of hybrid architectures and benchmarks will be critical for advancing both physical fidelity and creative expressivity across domains.
