
Neural Robot Dynamics (2508.15755v1)

Published 21 Aug 2025 in cs.RO, cs.AI, cs.GR, and cs.LG

Abstract: Accurate and efficient simulation of modern robots remains challenging due to their high degrees of freedom and intricate mechanisms. Neural simulators have emerged as a promising alternative to traditional analytical simulators, capable of efficiently predicting complex dynamics and adapting to real-world data; however, existing neural simulators typically require application-specific training and fail to generalize to novel tasks and/or environments, primarily due to inadequate representations of the global state. In this work, we address the problem of learning generalizable neural simulators for robots that are structured as articulated rigid bodies. We propose NeRD (Neural Robot Dynamics), learned robot-specific dynamics models for predicting future states for articulated rigid bodies under contact constraints. NeRD uniquely replaces the low-level dynamics and contact solvers in an analytical simulator and employs a robot-centric and spatially-invariant simulation state representation. We integrate the learned NeRD models as an interchangeable backend solver within a state-of-the-art robotics simulator. We conduct extensive experiments to show that the NeRD simulators are stable and accurate over a thousand simulation steps; generalize across tasks and environment configurations; enable policy learning exclusively in a neural engine; and, unlike most classical simulators, can be fine-tuned from real-world data to bridge the gap between simulation and reality.


Summary

  • The paper introduces NeRD, a neural dynamics model that replaces traditional physics solvers, achieving robust simulation across tasks and environments.
  • It employs a causal Transformer with a robot-centric, hybrid prediction framework to ensure long-horizon stability and spatial invariance.
  • Experimental results demonstrate efficient sim-to-real transfer and superior error metrics, validating NeRD's effectiveness over conventional simulators.

Neural Robot Dynamics: A Technical Analysis

Motivation and Context

The simulation of articulated rigid-body robots is foundational for robotics research, enabling policy learning, control evaluation, and design optimization. Traditional analytical simulators, while physically grounded, struggle with modeling complex contact dynamics and adapting to real-world discrepancies. Neural simulators have emerged as a promising alternative, but prior approaches often lack generalizability due to inadequate state representations and overfitting to specific controllers or environments. "Neural Robot Dynamics" introduces NeRD, a neural dynamics model designed to replace the low-level physics solvers in classical simulators, leveraging a hybrid, robot-centric state representation to achieve generalization across tasks, environments, and controllers.

NeRD Architecture and Hybrid Prediction Framework

NeRD is architected as a modular neural backend for articulated rigid-body simulation. Rather than adopting an end-to-end approach, NeRD replaces only the application-agnostic components—forward dynamics and contact solvers—within a classical simulator. The model ingests a history window of robot-centric states, contact information, and joint torques, all expressed in the robot's base frame, enforcing spatial invariance under translation and rotation about the gravity axis.
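As a rough illustration of this spatial invariance, the sketch below expresses a world-frame point in a frame attached to the robot's base, canceling translation and rotation about the gravity (z) axis. The function name and state layout are hypothetical, not the paper's API.

```python
import numpy as np

def world_to_base_frame(base_pos, base_yaw, point_world):
    """Express a world-frame point in a robot-centric frame that is
    invariant to base translation and to rotation about the gravity
    (z) axis. Hypothetical helper; NeRD's exact layout may differ."""
    c, s = np.cos(-base_yaw), np.sin(-base_yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])  # rotation by -yaw about z
    return R @ (point_world - base_pos)

# The representation is unchanged when the whole scene is translated:
p1 = world_to_base_frame(np.zeros(3), 0.0, np.array([1.0, 0.0, 0.5]))
p2 = world_to_base_frame(np.array([5.0, 5.0, 0.0]), 0.0,
                         np.array([6.0, 5.0, 0.5]))
```

Invariance of this kind means the model never sees absolute world coordinates, which is what allows it to generalize to spatial regions outside the training distribution.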

Formally, NeRD is a parametric function:

$$\text{NeRD}_\theta\left(\{s_k, C_k, T_k, g_k\}_{k=t-h+1}^{t}\right) \rightarrow \Delta s_{t+1}$$

where $s_k$ is the robot state, $C_k$ the contact quantities, $T_k$ the joint torques, and $g_k$ the gravity vector, all expressed in the robot base frame. The model predicts the state difference $\Delta s_{t+1}$, which is then transformed back to the world frame for integration.

The architecture is implemented as a causal Transformer (a lightweight GPT-2 variant), with input and output normalization to stabilize training, and a history window ($h = 10$) to leverage temporal context for improved velocity estimation and long-horizon stability.
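A minimal sketch of stepping such a learned backend autoregressively follows, assuming a stand-in model and an identity base-to-world mapping; the function and variable names are illustrative, not the paper's interface.

```python
from collections import deque
import numpy as np

def nerd_step(model, history, s_t, base_to_world):
    """One simulation step with a NeRD-style backend (sketch).
    `model` maps a history window of base-frame inputs to a predicted
    state difference, which is mapped back to the world frame and
    accumulated onto the current state."""
    delta_base = model(list(history))          # predicted Δs in base frame
    delta_world = base_to_world(delta_base)    # back to world frame
    return s_t + delta_world                   # integrate by accumulation

# Toy rollout with h = 3, a constant stand-in model, identity mapping:
h = 3
history = deque(maxlen=h)
s = np.zeros(2)
for t in range(5):
    history.append((s.copy(),))  # real inputs also include C, T, g
    if len(history) == h:
        s = nerd_step(lambda hist: np.full(2, 0.1), history, s, lambda d: d)
```

Because each step feeds the model's own output back in, small per-step errors can compound; the history window and base-frame representation are what keep this stable over long horizons.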

Implementation Details

NeRD is integrated into NVIDIA's Warp simulator as an interchangeable backend, utilizing GPU-parallelized collision detection. Training datasets are generated in a task-agnostic manner: 100K random trajectories per robot, each with 100 timesteps, randomized initial states, joint torques, and environment configurations. The model is trained via teacher-forcing, minimizing MSE between predicted and ground-truth state differences, with normalization to mitigate velocity term dominance.

Key hyperparameters include:

  • Transformer block size and depth tuned per robot DoF
  • Input/output normalization
  • History window size $h = 10$
  • Batch size and learning rate with linear decay
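The teacher-forcing objective described above can be sketched with a toy linear model standing in for the Transformer: inputs and targets are normalized so large velocity terms do not dominate the MSE, and supervision always uses ground-truth histories rather than the model's own rollout. The data and model below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in dataset: X plays the role of flattened history
# windows, Y the ground-truth state differences.
X = rng.normal(size=(256, 8))
true_W = rng.normal(size=(8, 4))
Y = X @ true_W

# Normalize inputs and outputs so high-variance terms (velocities)
# do not dominate the loss.
x_mu, x_sigma = X.mean(0), X.std(0) + 1e-8
y_mu, y_sigma = Y.mean(0), Y.std(0) + 1e-8
Xn, Yn = (X - x_mu) / x_sigma, (Y - y_mu) / y_sigma

# Teacher forcing: gradients never flow through an autoregressive
# rollout; each step is supervised against the ground truth.
W = np.zeros((8, 4))
lr = 0.1
for epoch in range(200):
    pred = Xn @ W
    grad = 2 * Xn.T @ (pred - Yn) / len(Xn)  # d(MSE)/dW
    W -= lr * grad

mse = np.mean((Xn @ W - Yn) ** 2)
```

At deployment time the model runs autoregressively on its own predictions, so the normalization statistics fitted here must be stored and reused.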

Experimental Evaluation

Long-Horizon Stability and Accuracy

NeRD demonstrates high-fidelity long-horizon prediction on both contact-free (Cartpole) and contact-rich (Ant) systems. For Cartpole, after 1000 steps, the prismatic joint error is $0.033$ m and revolute joint error $0.075$ rad. For Ant (14-DoF, floating base), after 500 steps, base position error is $0.057$ m and orientation error $0.095$ rad. These results indicate minimal drift and robust stability over extended simulation horizons.
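A generic way to measure drift of this kind is an autoregressive rollout compared against a reference trajectory at every step; the sketch below is a stand-in for the paper's evaluation, not its actual harness.

```python
import numpy as np

def rollout_error(step_fn, ref_traj, s0):
    """Roll a simulator step function forward autoregressively and
    record per-step absolute error against a reference trajectory,
    as one would to report drift after N steps (illustrative)."""
    s, errs = s0, []
    for s_ref in ref_traj:
        s = step_fn(s)              # model consumes its own output
        errs.append(np.abs(s - s_ref))
    return np.stack(errs)

# Sanity check: a step function that exactly matches the reference
# dynamics accumulates no error over 1000 steps.
ref = np.cumsum(np.full((1000, 2), 0.01), axis=0)
errs = rollout_error(lambda s: s + 0.01, ref, np.zeros(2))
```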

Generalization Across Contact Configurations

On the Double Pendulum with seven distinct ground configurations (contact-free, sliding, collision-induced stop), a single NeRD model achieves joint errors typically below $1^\circ$ and a maximum mean joint error of $0.056$ rad after 100 steps, demonstrating effective generalization to unseen contact scenarios.

Task, Controller, and Spatial Generalizability

NeRD supports policy learning via PPO for diverse tasks (Cartpole swing-up, Franka reach, Ant running/spinning, ANYmal velocity tracking) and controllers (joint-torque, joint-position). Policies trained exclusively in NeRD simulators achieve near-identical rewards when deployed in both NeRD and analytical simulators, with reward errors typically $<5\%$ and up to $17\%$ in the most challenging tasks. Notably, NeRD-trained policies generalize to spatial regions far outside the training distribution.

Sim-to-Real Transfer

Zero-shot sim-to-real transfer is validated on the Franka reach task. Policies trained in NeRD achieve a mean steady-state error of $1.927 \pm 0.699$ mm, outperforming those trained in the analytical simulator ($4.647 \pm 2.667$ mm). This demonstrates NeRD's capacity to bridge the sim-to-real gap without explicit adaptation.

Fine-Tuning on Real-World Data

On the cube-tossing task, NeRD pretrained on simulation data and fine-tuned on real-world trajectories achieves a position error of $0.018$ m and an orientation error of $0.266$ rad, outperforming the analytical simulator and matching specialized models (ContactNets, GNN-Rigid). Fine-tuning converges in $<5$ epochs, $10\times$ faster than training from scratch, and requires $<10$ minutes, compared to $12$ hours for ContactNets.
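The warm-start effect behind this fast convergence can be illustrated with a toy linear model: weights pretrained on "simulation" dynamics start close to the "real" dynamics, so a handful of gradient epochs shrinks the residual. Everything below is synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in pretrained weights and a small real-world dataset whose
# dynamics differ slightly (a synthetic sim-to-real gap).
W_sim = rng.normal(size=(6, 3))
W_real = W_sim + 0.05 * rng.normal(size=(6, 3))
X_real = rng.normal(size=(64, 6))
Y_real = X_real @ W_real

mse_before = np.mean((X_real @ W_sim - Y_real) ** 2)

# Warm-started fine-tuning: a few epochs of gradient descent from the
# pretrained weights, rather than training from scratch.
W = W_sim.copy()
for epoch in range(5):
    grad = 2 * X_real.T @ (X_real @ W - Y_real) / len(X_real)
    W -= 0.05 * grad

mse_after = np.mean((X_real @ W - Y_real) ** 2)
```

The same logic explains why the paper's fine-tuning assumes full state observability: the supervised target is the next-state difference itself.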

Ablation Studies

Critical design choices are validated:

  • Transformer sequence modeling outperforms MLP and RNNs, especially for high-variance velocity prediction.
  • Hybrid prediction framework (using intermediate simulation quantities) is essential for generalization; end-to-end baselines fail on unseen contacts and spatial regions.
  • Relative state prediction stabilizes training; absolute state prediction increases error $26\times$.
  • Robot-centric representation is vital for spatial generalizability; world-frame models fail when robots move outside training distribution.
  • Input/output normalization is necessary for balanced accuracy.
  • History window size $h = 10$ provides optimal stability.

Computational Performance

NeRD achieves $46$K FPS with $512$ parallel Ant environments, exceeding Warp's $28$K FPS (with $16$ substeps). As a neural model, NeRD benefits from ongoing hardware and ML software acceleration.

Limitations and Future Directions

NeRD has not yet been evaluated on the most complex robots (humanoids, $20$-$50$ DoF), where analytical simulation is particularly challenging. Dataset construction via random sampling may become inefficient for high-dimensional systems; more effective, task-agnostic sampling strategies are needed. Fine-tuning currently assumes full state observability; future work should address adaptation from partially observable real-world data.

Conclusion

NeRD presents a robust, generalizable neural dynamics model for articulated rigid-body simulation, achieving stable long-horizon prediction, generalization across tasks, environments, and controllers, and efficient sim-to-real transfer. Its hybrid, robot-centric state representation and modular integration into classical simulators enable practical deployment and fine-tuning. The results suggest that neural simulation backends like NeRD can supplant traditional physics solvers for a wide range of robotics applications, with implications for scalable policy learning, sim-to-real transfer, and lifelong adaptation. Future research should extend NeRD to more complex systems, optimize dataset generation, and address partial observability in real-world adaptation.
