Newtonian Motion Primitives Explained
- Newtonian Motion Primitives are canonical motion patterns defined by constant-acceleration dynamics, modeling fundamental motions like free fall, throws, and ramp sliding.
- They serve as benchmarks and enforcement targets in post-training generative video models using physics-grounded reward functions that leverage optical flow and mass proxies.
- Evaluations on the NewtonBench-60K suite demonstrate improved visual fidelity and reduced trajectory errors, with consistent RMSE reduction across in-distribution and OOD settings.
Newtonian Motion Primitives (NMPs) define a set of canonical, physically plausible motion patterns governed by constant-acceleration dynamics, which serve as benchmarks and enforcement targets for evaluating and improving the physical realism of generative video models. Characterized by their adherence to Newton’s laws, NMPs formalize five essential classes of single-object motion: free fall, horizontal throw, parabolic throw, and sliding on a ramp (downward and upward), spanning a range of gravitational and frictional regimes. These primitives are central to methods that use verifiable, physics-grounded reward functions to post-train generative video models, closing the gap between visual plausibility and physical correctness (Le et al., 29 Nov 2025).
1. Definition and Formalization of Newtonian Motion Primitives
NMPs are defined as trajectories parameterized by constant-acceleration equations, modeling archetypal object motions under Newtonian mechanics. The five primitives, as instantiated in the NewtonBench-60K benchmark, are:
| Primitive | Defining Dynamics | Constant Acceleration Components |
|---|---|---|
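| Free Fall (NMP-F) | Released from rest under uniform gravity with no initial velocity. | $a_x = 0$, $a_y = -g$ |
| Horizontal Throw (NMP-TH) | Launched with horizontal velocity $v_{x,0}$, zero vertical speed, under gravity. | $a_x = 0$, $a_y = -g$ |
| Parabolic Throw (NMP-TP) | Arbitrary initial velocity under gravity. | $a_x = 0$, $a_y = -g$ |
| Ramp Sliding Down (NMP-RD) | Sliding down an incline of angle $\theta$ with kinetic friction coefficient $\mu_k$. | $a = g(\sin\theta - \mu_k\cos\theta)$ along the incline |
| Ramp Sliding Up (NMP-RU) | Initial uphill speed on the same ramp, subject to gravity and friction, resulting in negative acceleration. | $a = -g(\sin\theta + \mu_k\cos\theta)$ along the incline |

Ramp sliding accelerations are projected into image-plane axes using the unit tangent vector $\hat{\mathbf{t}}$ of the incline: $(a_x, a_y) = a\,\hat{\mathbf{t}}$.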
NMPs are chosen to exemplify motions for which Newton’s second law implies constant acceleration, encompassing both gravity-induced and contact-dynamics scenarios.
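To make the constant-acceleration definition concrete, the sketch below samples closed-form trajectories for three of the primitives. It is an illustration under stated assumptions, not the benchmark's simulator: the value of `G`, the upward-pointing $y$ axis, the initial conditions, the ramp angle, and the friction coefficient are all chosen for the example, and the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def ramp_acceleration(theta, mu, moving_up=False, g=G):
    """Signed acceleration along the direction of motion on an incline."""
    if moving_up:
        # Gravity and kinetic friction both oppose uphill motion.
        return -g * (math.sin(theta) + mu * math.cos(theta))
    # Gravity drives the slide while kinetic friction opposes it.
    return g * (math.sin(theta) - mu * math.cos(theta))


def trajectory(p0, v0, a, dt, n_frames):
    """Sample p(t) = p0 + v0*t + 0.5*a*t^2 at n_frames time steps."""
    points = []
    for k in range(n_frames):
        t = k * dt
        points.append(tuple(p + v * t + 0.5 * acc * t * t
                            for p, v, acc in zip(p0, v0, a)))
    return points


DT, FRAMES = 1 / 16, 32  # 32 frames at 16 fps

# Free fall and horizontal throw share the same acceleration (0, -G);
# they differ only in the initial velocity.
free_fall = trajectory((0.0, 10.0), (0.0, 0.0), (0.0, -G), DT, FRAMES)
horizontal_throw = trajectory((0.0, 10.0), (3.0, 0.0), (0.0, -G), DT, FRAMES)

# Ramp sliding: scalar acceleration projected onto the downhill unit tangent
# (y axis pointing up, ramp descending to the right; both are assumptions).
theta, mu = math.radians(30.0), 0.2
a_down = ramp_acceleration(theta, mu)
tangent = (math.cos(theta), -math.sin(theta))
ramp_down = trajectory((0.0, 5.0), (0.0, 0.0),
                       (a_down * tangent[0], a_down * tangent[1]), DT, FRAMES)
```

All five primitives reduce to the same `trajectory` call; only the initial velocity and the acceleration vector change, which is what makes them convenient verification targets.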
2. Verifiable, Physics-Grounded Reward Functions
Two verifiable rewards complement each other to enforce adherence to NMP dynamics within video diffusion models:
a) Newtonian Kinematic Constraint:
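Derived from discrete kinematics, the constant-acceleration residual for velocity proxies $v_t$ is:
$$r_t = v_{t+1} - 2v_t + v_{t-1},$$
which vanishes exactly when the acceleration $v_{t+1} - v_t$ is the same at every step.
Operationalized via optical-flow proxies (flow fields $f_t$ approximating $v_t$), the loss is:
$$\mathcal{L}_{\text{kin}} = \frac{1}{T-2}\sum_{t=2}^{T-1} \lVert f_{t+1} - 2f_t + f_{t-1} \rVert_2^2,$$
enforcing that estimated accelerations remain strictly constant along the image plane.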
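One way to read the kinematic constraint is as a penalty on the second finite difference of the velocity proxies, which is zero whenever acceleration is constant. The sketch below assumes that reading and a mean-squared reduction; the function name and the two test sequences are hypothetical, and real inputs would be optical-flow fields rather than hand-written 2D velocities.

```python
def second_difference_loss(velocities):
    """Mean squared second difference of a sequence of 2D velocity proxies."""
    squared = []
    for t in range(1, len(velocities) - 1):
        prev, cur, nxt = velocities[t - 1], velocities[t], velocities[t + 1]
        # Residual v[t+1] - 2*v[t] + v[t-1], computed per axis.
        residual = [n - 2 * c + p for p, c, n in zip(prev, cur, nxt)]
        squared.append(sum(x * x for x in residual))
    return sum(squared) / len(squared)


dt, g = 1 / 16, 9.81

# Horizontal throw: vx constant, vy linear in time (constant acceleration).
constant_acc = [(2.0, -g * k * dt) for k in range(32)]

# Non-Newtonian motion: vy quadratic in time, so acceleration keeps growing.
growing_acc = [(2.0, -g * (k * dt) ** 2) for k in range(32)]

loss_ok = second_difference_loss(constant_acc)
loss_bad = second_difference_loss(growing_acc)
```

Here `loss_ok` is zero up to floating-point error while `loss_bad` is clearly positive. A sequence of all-zero velocities would also score zero, which is the degenerate solution the mass-conservation term is meant to rule out.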
b) Mass-Conservation Reward:
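Prevents degenerate solutions (e.g., objects slowing to near-zero speed) by anchoring motion to a consistent visual identity/mass:
$$\mathcal{L}_{\text{mass}} = \frac{1}{T}\sum_{t=1}^{T} \lVert m_t^{\text{gen}} - m_t^{\text{ref}} \rVert_2^2,$$
where $m_t^{\text{gen}}$ and $m_t^{\text{ref}}$ are high-level encoder features for generated and simulated reference frames, serving as mass proxies. Minimizing $\mathcal{L}_{\text{mass}}$ maintains temporal invariance of object attributes.
The combined loss for post-training aggregates the kinematic term $\mathcal{L}_{\text{kin}}$ and the mass term $\mathcal{L}_{\text{mass}}$.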
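The mass-conservation term can be pictured as a distance between per-frame feature vectors of the generated clip and of the simulated reference. The sketch below assumes a mean squared L2 distance and a simple weighted sum for the combination; the `weight` parameter, the function names, and the three-dimensional toy features are hypothetical stand-ins for real encoder outputs.

```python
def mass_loss(gen_feats, ref_feats):
    """Mean squared L2 distance between per-frame feature vectors."""
    if len(gen_feats) != len(ref_feats):
        raise ValueError("clips must have the same number of frames")
    total = 0.0
    for gen, ref in zip(gen_feats, ref_feats):
        total += sum((a - b) ** 2 for a, b in zip(gen, ref))
    return total / len(gen_feats)


def combined_loss(kinematic, mass, weight=1.0):
    """Weighted sum of the two terms; the weighting scheme is an assumption."""
    return kinematic + weight * mass


# Reference clip: the object's appearance features stay fixed over time.
reference = [(1.0, 0.5, 0.0)] * 4

# A faithful generation keeps the same features; a degenerate one lets the
# object fade, which shows up as a drift in the first feature dimension.
faithful = [(1.0, 0.5, 0.0)] * 4
fading = [(1.0 - 0.2 * k, 0.5, 0.0) for k in range(4)]

loss_faithful = mass_loss(faithful, reference)
loss_fading = mass_loss(fading, reference)
```

The fading clip is penalized even if its motion is perfectly smooth, so a generator cannot lower the total loss by changing or dissolving the object instead of moving it correctly.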
3. Extraction of Physical Proxies
Physical quantities required for reward evaluation are extracted with frozen, pretrained utility models:
- Velocity Proxy: Optical flow is estimated for each consecutive frame pair via frozen RAFT, yielding a flow field $f_t$, from which the velocity proxy is taken as $\hat{v}_t \approx f_t$.
- Mass Proxy: High-level frame encodings are obtained from a frozen V-JEPA2 video encoder. These features capture object identity and material, providing a differentiable stand-in for mass in the reward computation.
No depth, force, or direct trajectory supervision is used; all supervision arises from these measurable proxies.
4. Post-Training with NewtonRewards
The algorithm applies verifiable reward functions to post-train video diffusion models:
- Initialize from a pretrained generator $G_\theta$ (OpenSora v1.2), text- and frame-conditioned.
- For each iteration:
  - Sample latent noise $z$ and conditioning $c$; generate a 32-frame clip $x = G_\theta(z, c)$.
  - Compute optical flow ($f_t$) and mass proxy embeddings ($m_t^{\text{gen}}$) with frozen RAFT and V-JEPA2 on $x$.
  - Pair features with reference simulated features ($m_t^{\text{ref}}$); calculate the kinematic loss $\mathcal{L}_{\text{kin}}$ and the mass loss $\mathcal{L}_{\text{mass}}$.
  - Aggregate into the total loss $\mathcal{L}$; update $\theta$ via AdamW optimizer ( LR, batch size 1).
- Utility models remain fixed during all updates.

Backpropagation is restricted to the generator parameters $\theta$, enabling scalable, reward-driven adaptation to physics priors.
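The structure of the loop above (generate, extract proxies with frozen utilities, score, update only the generator) can be shown with a deliberately tiny toy. Everything here is a stand-in: the "generator" is a five-parameter 1D motion model, frame differencing replaces RAFT, a scalar appearance value replaces V-JEPA2 features, and plain gradient descent with numerical gradients replaces AdamW and backpropagation. None of it reflects the actual OpenSora training code.

```python
def generate(theta, n_frames=8):
    """Toy generator: 1D positions plus a per-frame appearance value."""
    x0, v0, acc, jerk, size = theta
    positions = [x0 + v0 * t + 0.5 * acc * t * t + jerk * t ** 3
                 for t in range(n_frames)]
    return positions, [size] * n_frames


def flow(positions):
    """Frozen velocity proxy: frame-to-frame displacement."""
    return [b - a for a, b in zip(positions, positions[1:])]


def kinematic_loss(v):
    r = [v[t + 1] - 2 * v[t] + v[t - 1] for t in range(1, len(v) - 1)]
    return sum(x * x for x in r) / len(r)


def mass_loss(appearance, ref_appearance):
    return sum((a - b) ** 2
               for a, b in zip(appearance, ref_appearance)) / len(appearance)


def total_loss(theta, ref_appearance):
    positions, appearance = generate(theta)
    return kinematic_loss(flow(positions)) + mass_loss(appearance, ref_appearance)


def numerical_grad(theta, ref_appearance, eps=1e-5):
    """Central differences with respect to the generator parameters only."""
    grads = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grads.append((total_loss(up, ref_appearance)
                      - total_loss(down, ref_appearance)) / (2 * eps))
    return grads


theta = [0.0, 1.0, -0.5, 0.3, 0.6]   # nonzero jerk violates constant acceleration
ref_appearance = [1.0] * 8            # appearance of the simulated reference
initial = total_loss(theta, ref_appearance)

for _ in range(500):
    grad = numerical_grad(theta, ref_appearance)
    theta = [p - 0.01 * g for p, g in zip(theta, grad)]

final = total_loss(theta, ref_appearance)
```

In this toy the update drives the jerk term toward zero and the appearance value toward the reference, while the constant-acceleration parameters are left essentially untouched because they do not contribute to either loss.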
5. NewtonBench-60K Evaluation Suite
The NewtonBench-60K dataset provides ground-truth trajectories for the five NMPs across a wide regime of physical parameters:
- Composition: 50K training clips (10K per NMP), 10K benchmark clips (2K per NMP), evenly split into in-distribution (ID) and out-of-distribution (OOD) settings (parameter shifts in height, velocity, friction).
- Rendering pipeline: Kubric + PyBullet + Blender, 32 frames/clip at 16 fps and resolution.
- Metrics:
- Physics: Velocity RMSE, Acceleration RMSE computed from centroids extracted via SAM2 segmentation.
- Visual: Trajectory L2 error, Chamfer distance, Binary mask Intersection over Union (IoU).
These metrics dissect both physical and visual fidelity of generated video sequences.
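The physics and mask metrics can be sketched from per-frame centroids and binary masks. The code below is a minimal illustration assuming finite-difference velocities and accelerations and a Euclidean RMSE over frames; the exact normalization used by the benchmark is not given here, and the function names and toy inputs are hypothetical.

```python
import math


def finite_diff(seq, dt):
    """Forward difference of a sequence of 2D points."""
    return [tuple((b - a) / dt for a, b in zip(p, q))
            for p, q in zip(seq, seq[1:])]


def rmse(pred, ref):
    """Root mean squared Euclidean error between two sequences of 2D vectors."""
    sq = [sum((a - b) ** 2 for a, b in zip(p, r)) for p, r in zip(pred, ref)]
    return math.sqrt(sum(sq) / len(sq))


def velocity_acceleration_rmse(pred_centroids, ref_centroids, dt):
    v_pred, v_ref = finite_diff(pred_centroids, dt), finite_diff(ref_centroids, dt)
    a_pred, a_ref = finite_diff(v_pred, dt), finite_diff(v_ref, dt)
    return rmse(v_pred, v_ref), rmse(a_pred, a_ref)


def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as nested lists."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 1.0


dt, g = 1 / 16, 9.81
# Reference: free fall. Prediction: the object drifts down at constant speed.
ref = [(0.0, -0.5 * g * (k * dt) ** 2) for k in range(32)]
pred = [(0.0, -2.0 * k * dt) for k in range(32)]

v_rmse, a_rmse = velocity_acceleration_rmse(pred, ref, dt)
iou = mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```

Because the predicted object never accelerates, its acceleration error equals the full gravitational acceleration, which is the kind of failure a purely visual metric can miss.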
6. Quantitative Impact and Ablative Analyses
Application of NewtonRewards confers consistent improvements across NMPs and metric suites:
| Test Regime | Model | L2 | CD | IoU | vRMSE | aRMSE | Avg. Gain (%) |
|---|---|---|---|---|---|---|---|
| ID | OpenSora (SFT) | 0.1098 | 0.3159 | 0.1103 | 0.2792 | 3.3244 | — |
| ID | +NewtonRewards | 0.0962 | 0.2930 | 0.1266 | 0.2628 | 3.0432 | +9.75 |
| OOD | OpenSora (SFT) | 0.1297 | 0.4082 | 0.0998 | 0.4230 | 6.1451 | — |
| OOD | +NewtonRewards | 0.1207 | 0.3780 | 0.1025 | 0.3816 | 5.1575 | +8.60 |
- All five NMPs show reduced trajectory and contour error, improved IoU, and lower RMSE in velocity/acceleration under both ID and OOD.
- Ablations confirm that visual-feature alignment alone yields minor spatial gains but may substantially worsen velocity/acceleration errors.
- Removing the kinematic loss nearly nullifies motion regularization; eliminating the mass term results in degenerate “reward hacking” with >66% speed collapse.
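The Avg. Gain column appears consistent with the mean relative improvement over the five metrics, with IoU counted as higher-is-better and the rest as lower-is-better. That reading is an inference from the table values rather than something stated in the text, and the sketch below simply recomputes it; the dictionary layout and function name are hypothetical.

```python
HIGHER_IS_BETTER = {"IoU"}

RESULTS = {
    "ID": {
        "base": {"L2": 0.1098, "CD": 0.3159, "IoU": 0.1103,
                 "vRMSE": 0.2792, "aRMSE": 3.3244},
        "ours": {"L2": 0.0962, "CD": 0.2930, "IoU": 0.1266,
                 "vRMSE": 0.2628, "aRMSE": 3.0432},
    },
    "OOD": {
        "base": {"L2": 0.1297, "CD": 0.4082, "IoU": 0.0998,
                 "vRMSE": 0.4230, "aRMSE": 6.1451},
        "ours": {"L2": 0.1207, "CD": 0.3780, "IoU": 0.1025,
                 "vRMSE": 0.3816, "aRMSE": 5.1575},
    },
}


def average_gain(base, ours):
    """Mean relative improvement in percent across all metrics."""
    gains = []
    for name, before in base.items():
        after = ours[name]
        if name in HIGHER_IS_BETTER:
            gains.append(100.0 * (after - before) / before)
        else:
            gains.append(100.0 * (before - after) / before)
    return sum(gains) / len(gains)


gain_id = average_gain(RESULTS["ID"]["base"], RESULTS["ID"]["ours"])
gain_ood = average_gain(RESULTS["OOD"]["base"], RESULTS["OOD"]["ours"])
```

Under this assumption the recomputed ID gain matches the reported value after rounding, and the OOD gain lands within a few hundredths of a percentage point of it, which is compatible with the table entries themselves being rounded.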
7. Significance and Broader Context
NMPs, as structured benchmarks and enforcement targets, enable discriminative assessment of physical plausibility in video generation. Their integration within NewtonRewards demonstrates that post-training with verifiable, physics-grounded rewards is feasible using only measurable proxies, without reliance on explicit trajectory or force data. On NewtonBench-60K, this approach improves adherence to constant-acceleration dynamics in both in-distribution and out-of-distribution regimes, including frictional contacts and parameter shifts. A plausible implication is that such post-training could serve as a foundation for scaling physics-aware video generation to broader, more complex settings, provided that the proxy models and motion primitive classes are extended further (Le et al., 29 Nov 2025).