
Physics Alignment Post-Training

Updated 27 February 2026
  • Physics alignment post-training is a framework that adjusts pre-trained deep generative models to honor physical laws without redesigning the architecture.
  • It employs methods such as explicit reward-based fine-tuning, supervised physics fine-tuning, inference-time correction, self-training with physics-aware sample selection, and differentiable physics projection layers.
  • Empirical results show significant improvements, including reduced kinematic errors and enhanced physical plausibility in applications like video generation and fluid simulation.

Physics alignment post-training encompasses methodologies for augmenting pre-trained deep generative and forecasting models so that their outputs respect fundamental physical laws, without redesigning model architectures from scratch. This paradigm targets the persistent gap between visually plausible outputs and physically correct trajectories, dynamics, or constraints, a gap that matters in fields such as video generation, fluid simulation, and soft-body modeling.

1. Motivation and Challenges in Physics Alignment

State-of-the-art generative models, including diffusion-based video generators and data-driven space-time forecasters, excel at reproducing visually realistic outputs but often violate basic physical laws and constraints such as gravity, cloth inextensibility, and conservation of momentum, and can fail to reproduce the diagnostic statistics of fluid dynamics. This disconnect arises because standard training mainly minimizes statistical or perceptual losses over large, uncontrolled datasets in which physical feasibility is never enforced. Typical failure modes include objects that float or behave inconsistently in videos, physically impossible mesh deformations, and non-physical turbulence statistics. Such artifacts limit these models' utility in downstream scientific, engineering, and simulation tasks where physical consistency is essential (Le et al., 29 Nov 2025, Yuan et al., 15 Jan 2026, Wu et al., 2024, Geng et al., 2019).

Physics alignment post-training refers to any procedure that takes a pre-trained model that was not trained for physical consistency and systematically post-processes, fine-tunes, or filters it (or its outputs) to enforce physical constraints, using explicit energy terms, proxy rewards, differentiable solvers, or physics-aware metrics.

2. Post-Training Frameworks and Methodological Principles

Explicit Reward-Based Fine-Tuning

The NewtonRewards framework (Le et al., 29 Nov 2025) is a representative example: it post-trains video diffusion models with differentiable rewards computed from measurable proxies extracted from the generated videos. These proxies are optical flow for velocity estimation and high-level appearance embeddings (e.g., from V-JEPA 2) as mass proxies. Two complementary reward terms are used:

  • Newtonian kinematic constraint: encourages constant acceleration in motion primitives via second-order differences in optical flow.
  • Mass conservation reward: penalizes deviation between generated and reference frame-wise mass proxies, preventing degenerate solutions where the model trivially collapses motion.

Optimization is performed over a joint loss:

$$L_{\text{total}}(\theta) = L_{\text{diffusion}}(\theta) + \lambda_{\text{kinematic}} R_{\text{kinematic}}(\theta) + \lambda_{\text{mass}} R_{\text{mass}}(\theta)$$

where $\theta$ denotes the parameters of the video diffusion model being fine-tuned and $\lambda_{\text{kinematic}}$, $\lambda_{\text{mass}}$ weight the two physics terms. The optical-flow and mass-proxy models are kept frozen throughout.
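The two residuals and the joint objective can be sketched in NumPy as below. This is a minimal illustration, assuming the optical flow is already available as a (T, H, W, 2) array and the mass proxies as per-frame (T, D) embedding arrays. The function names and default weights are hypothetical, the kinematic residual is averaged over all interior frames (the formula in Section 3 is stated for a single t), and NumPy only computes values, so actual fine-tuning would need the same arithmetic in an autodiff framework.

```python
import numpy as np


def kinematic_residual(flow: np.ndarray) -> float:
    """Newtonian kinematic residual with optical flow as a velocity proxy.

    flow: array of shape (T, H, W, 2), one flow field per frame (assumed layout).
    Computes ||phi_{t+1} - 2 phi_t + phi_{t-1}||_2^2 for every interior t and
    averages over t. It is zero when velocity changes linearly in time,
    i.e. when acceleration is constant.
    """
    if flow.shape[0] < 3:
        raise ValueError("need at least 3 flow frames for a second difference")
    second_diff = flow[2:] - 2.0 * flow[1:-1] + flow[:-2]
    return float(np.mean(np.sum(second_diff ** 2, axis=(1, 2, 3))))


def mass_residual(z_gen: np.ndarray, z_ref: np.ndarray) -> float:
    """(1/T) sum_t ||z_gen_t - z_ref_t||_2^2 over per-frame mass-proxy embeddings (T, D)."""
    return float(np.mean(np.sum((z_gen - z_ref) ** 2, axis=1)))


def total_loss(l_diffusion: float, flow, z_gen, z_ref,
               lam_kinematic: float = 1.0, lam_mass: float = 1.0) -> float:
    """L_total = L_diffusion + lambda_kinematic * R_kinematic + lambda_mass * R_mass."""
    return (l_diffusion
            + lam_kinematic * kinematic_residual(flow)
            + lam_mass * mass_residual(z_gen, z_ref))
```

A flow field whose velocity grows linearly in time (constant acceleration) yields a zero kinematic residual, but so does an all-zero flow field, i.e., a static video. That is the degenerate solution the mass-conservation term is meant to rule out.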

Physics Supervised Fine-Tuning and Reward Optimization

The PISA framework (Li et al., 12 Mar 2025) first applies physics supervised fine-tuning (PSFT) on a synthetic, physically valid dataset (e.g., objects in free fall simulated via Kubric/PyBullet/Blender). It then performs object reward optimization (ORO) using differentiable metrics such as segmentation-mask IoU, optical-flow accuracy, or depth consistency; this stage draws on reinforcement learning from human feedback (RLHF) and classifier-guided diffusion.
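A mask-based reward of this kind can be sketched with a soft (relaxed) IoU, which equals ordinary IoU on binary masks but is smooth in predicted mask probabilities. This is an illustrative assumption, not PISA's published formula: the masks would come from a segmenter applied to generated frames and from the simulator's ground truth, and the function names are hypothetical.

```python
import numpy as np


def soft_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Soft IoU between a predicted mask (probabilities in [0, 1]) and a target mask.

    For 0/1 masks this is the usual intersection over union; for soft masks it
    is smooth in `pred`, which is what makes it usable as a training reward.
    """
    intersection = np.sum(pred * target)
    union = np.sum(pred + target - pred * target)
    return float((intersection + eps) / (union + eps))


def mask_iou_reward(pred_masks: np.ndarray, target_masks: np.ndarray) -> float:
    """Average soft IoU over frames; both inputs have shape (T, H, W)."""
    return float(np.mean([soft_iou(p, t) for p, t in zip(pred_masks, target_masks)]))
```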

Inference-Time Physics Alignment

WMReward (Yuan et al., 15 Jan 2026) introduces inference-time physics alignment, in which no model parameters are updated. Instead, generated samples are filtered or guided toward higher physical plausibility by an external latent world model; for example, V-JEPA 2 provides a “surprise” signal based on how predictable future embeddings are. The reward is integrated into sampling, via best-of-N search, gradient-based score modification, or both, to select outputs that jointly maximize base-model likelihood and physics reward.
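The best-of-N variant can be sketched as a simple re-ranking of sampled candidates. The additive score (a base-model log-likelihood estimate plus `beta` times the physics reward), the weight `beta`, and the callable names are assumptions for illustration; the gradient-based guidance variant is not shown.

```python
from typing import Callable, Sequence, TypeVar

Sample = TypeVar("Sample")


def best_of_n(candidates: Sequence[Sample],
              log_likelihood: Callable[[Sample], float],
              physics_reward: Callable[[Sample], float],
              beta: float = 1.0) -> Sample:
    """Return the candidate maximizing log_likelihood(x) + beta * physics_reward(x).

    beta = 0 keeps the base model's preferred sample; larger beta weights the
    physics reward more heavily.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = [log_likelihood(c) + beta * physics_reward(c) for c in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```

Because no weights change, the same base model can be paired with different rewards at deployment time simply by swapping the `physics_reward` callable.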

Self-Training via Physics-Aware Selection

BeamVQ (Wu et al., 2024) leverages a self-training loop with physics-aware beam search. Model predictions are quantized via a discrete codebook, and beam-expanded candidate sequences are scored by physics metrics (e.g., divergence, turbulent kinetic energy, energy spectra). The highest-scoring samples are used for further self-distillation, progressively aligning predictions with physical laws through data augmentation and retraining on filtered outputs.
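The physics-aware selection step can be sketched with one of the metrics named above: the divergence of a 2D velocity field, which is zero for incompressible flow. This is a simplified, assumed version that scores candidates by mean absolute divergence only, computed with central finite differences via `np.gradient`; it omits the codebook quantization, beam expansion, and the other metrics (TKE, energy spectra) that BeamVQ combines. Function names are hypothetical.

```python
import numpy as np


def divergence_error(u: np.ndarray, v: np.ndarray, dx: float = 1.0, dy: float = 1.0) -> float:
    """Mean absolute divergence du/dx + dv/dy of a 2D velocity field on a uniform grid.

    u, v: arrays of shape (H, W); axis 0 is y and axis 1 is x.
    """
    du_dx = np.gradient(u, dx, axis=1)
    dv_dy = np.gradient(v, dy, axis=0)
    return float(np.mean(np.abs(du_dx + dv_dy)))


def select_physical(candidates, k: int):
    """Keep the k candidates, each a (u, v) tuple, with the lowest divergence error.

    In a self-training loop, these survivors would serve as pseudo-targets
    for the next round of fine-tuning.
    """
    ranked = sorted(candidates, key=lambda c: divergence_error(*c))
    return ranked[:k]
```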

Differentiable Physics Projections

Geng et al. (2019) instead build projection or correction layers into the network itself. For tasks such as cloth simulation, energy-based constraints (e.g., inextensibility or buckling) are formulated as convex optimization problems or quasistatic solvers and folded into the network's forward and backward passes, so outputs are physically feasible by construction.
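Geng et al.'s layers use convex optimization or quasistatic solvers. As a much simpler stand-in, and explicitly not their method, the sketch below projects a cloth mesh onto an inextensibility constraint by iteratively restoring each edge to its rest length (a Gauss-Seidel projection in the style of position-based dynamics). Every step is built from differentiable operations, so the same loop written in an autodiff framework could sit inside a network's forward pass; the function name and iteration count are hypothetical.

```python
import numpy as np


def project_inextensible(x: np.ndarray, edges: np.ndarray, rest: np.ndarray,
                         iters: int = 50) -> np.ndarray:
    """Move vertices so every edge approaches its rest length.

    x: (N, 3) vertex positions; edges: (E, 2) vertex index pairs;
    rest: (E,) rest lengths. Each edge's endpoints are moved symmetrically
    along the edge direction, which leaves the mesh centroid unchanged.
    """
    x = x.copy()  # do not modify the caller's array
    for _ in range(iters):
        for (i, j), l0 in zip(edges, rest):
            d = x[j] - x[i]
            length = np.linalg.norm(d)
            if length < 1e-12:  # skip degenerate edges to avoid dividing by zero
                continue
            correction = 0.5 * (length - l0) * d / length
            x[i] += correction
            x[j] -= correction
    return x
```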

3. Proxies, Metrics, and Reward Functions

Physics alignment relies critically on quantifiable proxies or reward functions, which must be:

  • Measurable: derived from readily computed features on generated outputs, such as optical flow (RAFT), segmentation masks (SAM2), depth estimates (Depth-Anything-V2), or encoder embeddings (V-JEPA 2).
  • Differentiable: suitable for backpropagation so the reward can guide model fine-tuning or sampling.

Common proxies and metrics include:

| Proxy / Metric | Physical Quantity | Usage Domain |
|---|---|---|
| Optical Flow (RAFT) | Velocity | Video generation, reward loss |
| Appearance Feature Embeddings (V-JEPA) | Mass proxy | Video generation, rewards |
| Object Mask IoU | Object permanence | Video, segmentation |
| Centroid L2, Chamfer Distance | Position, shape | Trajectory, physics accuracy |
| Divergence, TKE, Energy Spectrum | Fluid statistics | Turbulent flow forecasting |
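Two of the geometric metrics in the table can be sketched as follows. Chamfer distance has several conventions (squared vs. unsquared distances, sums vs. means); this sketch assumes the symmetric form built from mean squared nearest-neighbour distances, and computes mask centroids in (row, column) pixel coordinates. Function names are hypothetical.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, D) and b (M, D).

    Mean squared distance from each point to its nearest neighbour in the
    other set, summed over both directions.
    """
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M) squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def centroid_l2(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Euclidean distance, in pixels, between the centroids of two binary masks (H, W)."""
    ca = np.argwhere(mask_a).mean(axis=0)
    cb = np.argwhere(mask_b).mean(axis=0)
    return float(np.linalg.norm(ca - cb))
```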

Reward function examples:

  • Kinematic residual: $R_{\text{kinematic}} = \|\phi_{t+1} - 2\phi_t + \phi_{t-1}\|_2^2$, where $\phi_t$ is the optical flow at frame $t$ (Le et al., 29 Nov 2025).
  • Mass-matching residual: $R_{\text{mass}} = \frac{1}{T}\sum_t \|z^{\text{gen}}_t - z^{\text{sim}}_t\|_2^2$, comparing the mass-proxy embeddings of generated and simulated reference frames over $T$ frames (Le et al., 29 Nov 2025).
  • V-JEPA 2 “surprise” reward: $r(x) = \frac{1}{|K|} \sum_{k\in K} \left[1 - \cos(\hat{z}_k^{\text{fut}}, z_k^{\text{fut}})\right]$, where $\hat{z}_k^{\text{fut}}$ and $z_k^{\text{fut}}$ are the predicted and observed future embeddings indexed by $k \in K$ (Yuan et al., 15 Jan 2026).
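The surprise formula can be computed directly once the predicted and observed future embeddings are available; here they are assumed to be given as (|K|, D) arrays rather than produced by V-JEPA 2. The value is 0 when every prediction points in the same direction as its observation and grows as they diverge. How WMReward signs or scales it when combining it with the base model's likelihood is not specified here, so the sketch returns the formula's value unchanged. The function name is hypothetical.

```python
import numpy as np


def surprise_score(z_pred: np.ndarray, z_obs: np.ndarray) -> float:
    """(1/|K|) * sum_k [1 - cos(z_pred_k, z_obs_k)] for embeddings of shape (|K|, D)."""
    dot = np.sum(z_pred * z_obs, axis=1)
    norms = np.linalg.norm(z_pred, axis=1) * np.linalg.norm(z_obs, axis=1)
    cos = dot / np.maximum(norms, 1e-12)  # guard against zero-norm embeddings
    return float(np.mean(1.0 - cos))
```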

4. Empirical Results and Benchmarks

Substantial improvements over base and visually supervised models are observed across tasks:

  • NewtonRewards achieves an average +9.75% improvement across five physics/visual metrics (in-distribution) and +8.60% out-of-distribution, with RMSE of acceleration reduced by up to 16.1% OOD, and IoU improving by 14.8% (Le et al., 29 Nov 2025).
  • PISA (PSFT+ORO) reduces L2 trajectory error by 55%, triples IoU, and delivers further 10–20% improvements with reward-guided fine-tuning (Li et al., 12 Mar 2025).
  • WMReward (inference-time alignment) raises PhysicsIQ benchmark scores by up to 7.42% over the prior state of the art, and human preference studies show ≈11% higher preference for its physically plausible generations (Yuan et al., 15 Jan 2026).
  • BeamVQ yields relative error reductions: MSE improved by 19–39%, divergence errors down by up to 40%, and long-term L2 errors down by 10–50% across diverse datasets and backbones (Wu et al., 2024).

| Approach, Domain | Core Metrics | Representative Improvement | Source |
|---|---|---|---|
| NewtonRewards, video | Acceleration RMSE (RMSE_a), IoU, etc. | RMSE_a ↓16.1% (OOD), IoU ↑14.8% (ID) | (Le et al., 29 Nov 2025) |
| PISA, video | L2 trajectory error, IoU, Chamfer distance (CD) | L2 ↓55%, IoU ×3 (PSFT); further +10–20% from ORO | (Li et al., 12 Mar 2025) |
| WMReward, video | PhysicsIQ, VideoPhy | PhysicsIQ ↑7.42%, VideoPhy physical commonsense (PC) ↑7–8% | (Yuan et al., 15 Jan 2026) |
| BeamVQ, fluid forecasting | Divergence, MSE, etc. | Divergence ↓40%, TKE error ↓50% | (Wu et al., 2024) |
| Geng et al., cloth simulation | SqrtMSE, strain | SqrtMSE ↓15–25%, strain → 0 | (Geng et al., 2019) |

Residual and heatmap analyses indicate that aligned models exhibit fewer kinematic violations and maintain physical coherence across frames and test conditions.

5. Limitations, Failure Modes, and Generalization

Known limitations include:

  • Proxy fidelity: many methods assume a static camera, weak perspective, or that mass/velocity proxies are sufficiently accurate for the physics at hand. Dynamic scenes, severe occlusions, or complex depth geometry may invalidate these assumptions (Le et al., 29 Nov 2025).
  • Regime restriction: NewtonRewards' kinematic reward enforces constant acceleration, so it does not cover regimes with varying forces (e.g., collisions, friction, fluid boundary effects).
  • Distributional mismatch: even physically aligned models may represent uncertainty poorly and miss the full diversity of valid trajectories, especially when the initial frame(s) leave the outcome ambiguous (Li et al., 12 Mar 2025).
  • Degeneracy: without complementary rewards (e.g., mass conservation), models may collapse motion, for example by driving object velocity toward zero so that the kinematic constraint is trivially satisfied (Le et al., 29 Nov 2025).
  • Computational cost: methods with differentiable physics projections or solver layers (e.g., Geng et al.) make training and inference 10–50× slower (Geng et al., 2019).

Generalization properties:

  • Most frameworks report reduced performance on out-of-distribution (OOD) conditions (e.g., heights outside the trained range, novel objects, new initial velocities), with physics errors increasing under such shifts.
  • The principles extend beyond constant-acceleration motion under gravity: for example, to conservation of momentum using mass-velocity product proxies, energy conservation using estimated squared speed, or per-voxel fluid field constraints (Le et al., 29 Nov 2025, Wu et al., 2024).
  • “Good-looking” outputs remain an insufficient proxy for correct physics, necessitating explicit metric-based evaluation.
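The momentum and energy extensions mentioned above can be sketched as residuals on per-object proxies. The sketch assumes per-object mass proxies of shape (N,), velocity estimates of shape (T, N, 2), and an isolated system: under external forces such as gravity total momentum is not conserved, and kinetic energy is conserved only for elastic interactions. Function names are hypothetical.

```python
import numpy as np


def momentum_residual(masses: np.ndarray, velocities: np.ndarray) -> float:
    """Mean squared frame-to-frame change in total momentum p_t = sum_i m_i v_{t,i}.

    masses: (N,) mass proxies; velocities: (T, N, 2) per-object velocity estimates.
    Zero when total momentum is constant across frames.
    """
    p = np.einsum("n,tnd->td", masses, velocities)  # total momentum per frame, (T, 2)
    dp = p[1:] - p[:-1]
    return float(np.mean(np.sum(dp ** 2, axis=1)))


def kinetic_energy_residual(masses: np.ndarray, velocities: np.ndarray) -> float:
    """Mean squared frame-to-frame change in total kinetic energy 0.5 * sum_i m_i |v_{t,i}|^2."""
    speed_sq = np.sum(velocities ** 2, axis=-1)  # (T, N)
    energy = 0.5 * np.einsum("n,tn->t", masses, speed_sq)  # (T,)
    return float(np.mean((energy[1:] - energy[:-1]) ** 2))
```

For instance, an elastic head-on exchange of velocities between two equal masses leaves both residuals at zero, while a single object that speeds up without any partner does not.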

6. Extensions and Future Research Directions

Active research seeks to address richer and more complex physical domains:

  • Enforcing additional or more general physical laws (energy, momentum, multi-body, fluid, or deformable mechanics) via differentiable physics engines, neural surrogate models, or hybrid supervisory signals (Le et al., 29 Nov 2025, Li et al., 12 Mar 2025, Wu et al., 2024).
  • Improving proxies (e.g., learned or geometric mass regressors, depth cues) and broadening evaluation to multi-modal or ambiguous tasks (Le et al., 29 Nov 2025, Li et al., 12 Mar 2025).
  • Incorporating uncertainty quantification: training models to sample calibrated distributions over possible physical evolutions, which is critical for planning, safety-critical, and risk-averse applications (Li et al., 12 Mar 2025).
  • Efficient inference-time architectures and reward plug-ins (e.g., WMReward, BeamVQ) for decoupling base model weights from physics-specific adaptation, enabling rapid tuning during deployment (Yuan et al., 15 Jan 2026, Wu et al., 2024).
  • General plug-in layers or backward-differentiable constraints to enforce general PDE or energy restrictions on arbitrary ML outputs (Geng et al., 2019).

Further benchmark development (e.g., extending PisaBench toward broader sets of physical primitives and real-world annotated data) is recognized as essential for systematic tracking and robust generalization assessment (Li et al., 12 Mar 2025).

7. Historical Context and Relation to Broader Research

Physics alignment post-training synthesizes and extends approaches from multiple areas: differentiable physics (energy-based modeling, projection layers), reinforcement learning (reward optimization, RLHF), self-supervised representation learning (latent world models), and classical numerical simulation (constraint satisfaction, PDE conformity). Its emergence is a direct response to the proliferation of large, pre-trained generative models in scientific domains and the demonstrated insufficiency of pure data-driven training for reliable physical accuracy. Cross-pollination with advances in neural PDE solvers, differentiable rendering, and simulation learning is expected to further expand the capabilities and efficiency of post-training alignment frameworks (Le et al., 29 Nov 2025, Geng et al., 2019, Wu et al., 2024).

