IsaacGym Physics Simulator

Updated 17 November 2025

IsaacGym is a GPU-native physics simulation environment that enables the parallel execution of thousands of environments using CUDA-accelerated computation.
It integrates NVIDIA’s PhysX engine with deep learning frameworks like PyTorch via zero-copy CUDA tensors, dramatically speeding up robotics and RL experiments.
The simulator supports large-scale applications in system identification, model-based control, and planning, achieving orders-of-magnitude speedups over traditional CPU frameworks.

IsaacGym is a GPU-native physics simulation environment developed by NVIDIA to enable large-scale, high-throughput reinforcement learning (RL) and robotics research. By executing both physics simulation and policy learning entirely on the GPU, IsaacGym achieves massive acceleration over traditional CPU-based simulators, supporting thousands of concurrent environments and facilitating rapid development and evaluation of robot learning algorithms. This system has influenced subsequent frameworks such as Isaac Lab and has been central to advances in scalable system identification, model-based control, and domain randomization in robotics.

1. GPU-Native Architecture and Simulation Pipeline

IsaacGym is architected around the tight integration of a high-performance physics engine and deep learning libraries, all resident on the GPU. At its core, IsaacGym extends NVIDIA PhysX with GPU-only data structures, enabling batched simulation of thousands of environments in parallel without intermediate CPU transfers or bottlenecks (Makoviychuk et al., 2021).

Simulation data—including positions, velocities, joint forces, contact Jacobians, and mass matrices—are maintained as flat, contiguous CUDA tensors with direct interop to PyTorch for zero-copy access. The primary simulation pipeline comprises fused CUDA kernel launches per timestep, advancing all rigid-body environments (each represented as an independent “env”) in lock-step. CUDA kernel orchestration covers the following sequence per step:

Broadphase and narrowphase collision detection
Contact generation
Constraint resolution via single-stage Temporal Gauss-Seidel (TGS)
Semi-implicit Euler time integration

This architecture sustains roughly 10,000–20,000 parallel environments at 1,000–2,000 steps/sec on a single high-end GPU, with minimal CPU overhead (Antonova et al., 2021, Makoviychuk et al., 2021).

2. Physics Engine: Dynamics, Contact, and Integration

IsaacGym simulates rigid bodies in either maximal or reduced (generalized) coordinates. In generalized coordinates, each environment is governed by the equation:

$M(q)\;\dot{v} + C(q,v) + G(q) = \tau + J^{\top}\lambda$

where $q$ and $v$ are the generalized positions and velocities, $M$ is the generalized mass/inertia matrix, $C(q,v)$ collects Coriolis and centrifugal effects, $G$ is gravity, $\tau$ the actuation torques, $J$ the contact Jacobian, and $\lambda$ the contact forces, which the discrete-time solver treats as impulses (Makoviychuk et al., 2021, Pezzato et al., 2023).

Constraints are solved with TGS: per-constraint updates are applied iteratively, and friction cones and normal impulses are treated as a mixed complementarity problem:

Residual $r = J_c \cdot v + b_c$ for each constraint $c$
Per-body velocity deltas accumulated
Scaling of updates to mimic multi-step integration at the cost of a single loop

Contact dynamics employ rigid-body, impulse-based solvers, with Projected Gauss-Seidel for resolving hard friction cones. The solver supports rich robotic systems, including floating-base agents (e.g., ANYmal, ShadowHand) and articulated manipulators.
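The projection step in a Gauss-Seidel contact solver can be illustrated with a minimal sketch in plain Python (not CUDA, and not PhysX's actual implementation). It assumes a frictionless problem: find $\lambda \ge 0$ such that $w = A\lambda + b \ge 0$ and $\lambda_i w_i = 0$, where $A = J M^{-1} J^{\top}$ and $b$ is the constraint-space velocity before impulses are applied. The function name and the toy matrices are illustrative assumptions, and friction cones are omitted.

```python
def projected_gauss_seidel(A, b, iterations=50):
    """Solve the LCP  w = A*lam + b >= 0,  lam >= 0,  lam_i * w_i = 0."""
    n = len(b)
    lam = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Residual of row i using the latest estimate of every impulse.
            r = b[i] + sum(A[i][j] * lam[j] for j in range(n))
            # Gauss-Seidel update, then projection onto lam_i >= 0.
            lam[i] = max(0.0, lam[i] - r / A[i][i])
    return lam


# Toy problem: contact 0 is approaching (b < 0), contact 1 is separating (b > 0).
A = [[2.0, 0.5],
     [0.5, 1.0]]
b = [-1.0, 0.3]
lam = projected_gauss_seidel(A, b)
# Post-solve constraint velocities; each must be non-negative.
w = [b[i] + sum(A[i][j] * lam[j] for j in range(2)) for i in range(2)]
```

In this toy case only the approaching contact receives an impulse; the separating contact is clamped to zero by the projection, which is the complementarity condition in action.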

Time integration uses semi-implicit Euler, advancing generalized positions and velocities according to:

$v_{k+1} = v_k + M(q_k)^{-1}\left(\tau + J^\top\lambda - C(q_k, v_k) - G(q_k)\right)\Delta t$

$q_{k+1} = q_k + v_{k+1}\Delta t$
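As a hedged illustration of this update, the sketch below applies semi-implicit Euler to a single-DOF pendulum in plain Python. The pendulum, its parameters, and the function names are assumptions chosen for brevity: $M = m l^2$, $G(q) = m g l \sin q$, $C = 0$, and the contact term $J^\top\lambda$ is omitted. The key detail is that the position update uses the already-updated velocity.

```python
import math


def semi_implicit_euler_step(q, v, tau, dt, mass=1.0, length=1.0, gravity=9.81):
    """One semi-implicit Euler step for a frictionless pendulum (no contacts)."""
    M = mass * length ** 2                       # generalized inertia M(q)
    G = mass * gravity * length * math.sin(q)    # gravity term G(q)
    C = 0.0                                      # no Coriolis term for one DOF
    v_next = v + (tau - C - G) / M * dt          # velocity is updated first
    q_next = q + v_next * dt                     # position uses the new velocity
    return q_next, v_next


def energy(q, v, mass=1.0, length=1.0, gravity=9.81):
    """Total mechanical energy (kinetic plus potential) of the pendulum."""
    return 0.5 * mass * (length * v) ** 2 - mass * gravity * length * math.cos(q)


q, v = 0.5, 0.0
e0 = energy(q, v)
for _ in range(10_000):                          # 10 s of simulated time
    q, v = semi_implicit_euler_step(q, v, tau=0.0, dt=1e-3)
drift = abs(energy(q, v) - e0)
```

Because the scheme is symplectic for this system, the energy error stays small and bounded rather than growing steadily, which is one reason it is a common choice for long rollouts at a fixed step size.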

All computation is batched across environments in CUDA, achieving orders-of-magnitude speedup over traditional CPU-based solvers (Makoviychuk et al., 2021).

3. Data Pipeline and Workflow Integration

IsaacGym exposes a transparent data pipeline via a Tensor API, which facilitates direct access to simulation buffers as PyTorch tensors on the device. The workflow unfolds as follows:

Scene setup (on CPU) via URDF/MJCF asset loading, randomization, and environment instantiation
Device pointers (descriptors) for DOF states, root states, and contact forces obtained with API calls (e.g., gym.acquire_dof_state_tensor)
These pointers are wrapped as high-level PyTorch tensors via CUDA interop, enabling both observation construction and reward computation directly in PyTorch (Makoviychuk et al., 2021)
Each timestep involves:
- Stepping the simulation (gym.simulate(sim), followed by gym.fetch_results(sim, True))
- Batch forward policy inference (e.g., PPO in PyTorch) on observations
- Writing action outputs into control tensors, which are consumed by the simulator in the next step
- Optional resets or environment-specific interventions

The entire RL loop, including reward calculation, observation construction, and policy updates, remains GPU-resident. No data crosses PCIe except on explicit user request, which virtually eliminates CPU–GPU transfer overheads and enables tight integration with modern RL libraries.
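The structure of this loop can be sketched without the simulator itself. The example below is a toy stand-in in plain Python: `ToyBatchedSim`, its fields, and the fixed PD `policy` are hypothetical names invented for illustration and are not part of the IsaacGym API. In IsaacGym the state and control buffers would be PyTorch tensors that wrap simulator memory on the GPU, and the policy would be a learned network.

```python
import random


class ToyBatchedSim:
    """Stand-in for a batched simulator: N independent 1-D unit point masses."""

    def __init__(self, num_envs, dt=0.02, seed=0):
        self.rng = random.Random(seed)
        self.dt = dt
        self.pos = [self.rng.uniform(-1.0, 1.0) for _ in range(num_envs)]
        self.vel = [0.0] * num_envs
        self.force = [0.0] * num_envs          # control buffer the policy writes into

    def step(self):
        # Advance every environment in lock-step (semi-implicit Euler).
        for i in range(len(self.pos)):
            self.vel[i] += self.force[i] * self.dt
            self.pos[i] += self.vel[i] * self.dt

    def reset(self, env_ids):
        for i in env_ids:
            self.pos[i] = self.rng.uniform(-1.0, 1.0)
            self.vel[i] = 0.0


def policy(obs):
    # Hypothetical fixed PD controller standing in for a learned policy.
    return [-4.0 * p - 2.0 * v for p, v in obs]


sim = ToyBatchedSim(num_envs=8)
initial_error = sum(abs(p) for p in sim.pos)
for _ in range(500):
    sim.step()                                          # 1. step all environments
    obs = list(zip(sim.pos, sim.vel))                   # 2. observations from state buffers
    rewards = [-(p * p) for p, _ in obs]                # 3. rewards from the same buffers
    sim.force = policy(obs)                             # 4. actions into the control buffer
    done = [i for i, (p, _) in enumerate(obs) if abs(p) > 5.0]
    sim.reset(done)                                     # 5. optional per-env resets
final_error = sum(abs(p) for p in sim.pos)
```

The point of the sketch is the data flow: observations, rewards, and actions are all computed from, and written back to, the same batched buffers, so nothing needs to leave the device between steps.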

4. Large-Scale Bayesian Inference and Domain Randomization

IsaacGym’s design supports specialized workflows for likelihood-free Bayesian inference and adaptive domain randomization at unprecedented scale. In BayesSimIG (Antonova et al., 2021), the GPU engine is combined with mixture density neural networks (MDNNs) to infer high-dimensional simulation parameter posteriors from observation trajectories without explicit likelihoods.

The procedure is as follows:

A prior $p(\theta)$ is specified over $D$ simulator parameters (e.g., joint frictions, link masses)
$M$ samples $\theta_i$ are drawn, and each is dispatched to $K$ parallel envs
All rollouts are collected in one “step_all” call per timestep; trajectory summaries $s_i$ are computed via on-GPU trajectory summarization (e.g., path signatures)
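An MDNN (or a mixture density model on random Fourier features, MDRFF) $q_\phi(\theta|s)$ is trained via conditional density estimation, maximizing $\sum_i \log q_\phi(\theta_i | s_i)$
New real or surrogate rollouts yield $s^r$; with $\tilde p(\theta)$ denoting the proposal distribution from which the $\theta_i$ were drawn, the posterior is approximated as: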

$\hat p(\theta|s^r) \propto \frac{p(\theta)}{\tilde p(\theta)} q_\phi(\theta|s^r)$

Samples from $\hat p(\theta|s^r)$ re-initialize the simulator for robust RL training
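The procedure above can be made concrete with a deliberately small sketch in plain Python. Several pieces are assumptions made for illustration and are not taken from BayesSimIG: the "simulator" is a one-line model of a sliding block whose summary statistic is its measured deceleration, the parameter is a single friction coefficient, and the mixture density network is swapped for a single linear-Gaussian conditional density fitted by maximum likelihood. The proposal equals the prior, so the ratio $p(\theta)/\tilde p(\theta)$ is 1.

```python
import random


def simulate_summary(mu, rng, g=9.81, noise=0.05):
    """Toy rollout summary: measured deceleration of a sliding block."""
    return mu * g + rng.gauss(0.0, noise)


def fit_linear_gaussian(thetas, summaries):
    """Maximum-likelihood fit of q(theta | s) = Normal(a*s + b, sigma2)."""
    n = len(thetas)
    mean_s = sum(summaries) / n
    mean_t = sum(thetas) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(summaries, thetas))
    var = sum((s - mean_s) ** 2 for s in summaries)
    a = cov / var
    b = mean_t - a * mean_s
    sigma2 = sum((t - (a * s + b)) ** 2 for s, t in zip(summaries, thetas)) / n
    return a, b, sigma2


rng = random.Random(0)
# Draw parameters from a uniform prior (used here as the proposal as well).
thetas = [rng.uniform(0.1, 1.0) for _ in range(2000)]
# One simulated rollout summary per parameter sample.
summaries = [simulate_summary(mu, rng) for mu in thetas]
# Conditional density estimation: maximize sum_i log q(theta_i | s_i).
a, b, sigma2 = fit_linear_gaussian(thetas, summaries)
# Condition on a "real" observation generated with a hidden friction value.
true_mu = 0.4
s_real = simulate_summary(true_mu, rng)
posterior_mean = a * s_real + b
posterior_std = sigma2 ** 0.5
# Posterior samples that would re-initialize the simulator parameters.
posterior_samples = [rng.gauss(posterior_mean, posterior_std) for _ in range(5)]
```

A mixture density network plays the same role as `fit_linear_gaussian` here, but can represent multimodal and nonlinear posteriors over many parameters at once.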

A case with $D=107$ (ShadowHand) and $M=100{,}000$ demonstrates the framework's scalability: full-scale posterior estimation and RL training iterations complete in 5–10 minutes on a single A100 GPU, compared with many hours on CPU, an approximate 50–100× speedup (Antonova et al., 2021).

5. Performance Benchmarks and Empirical Capabilities

IsaacGym achieves dramatic improvements in simulation and training throughput relative to legacy CPU-based frameworks (Makoviychuk et al., 2021). Key empirical results include:

Ant (8 DoF): $\sim$ 540K environment-steps/second with 4K environments; full task convergence in under 2 minutes
Humanoid (21 DoF): $>200$K env-steps/s; full convergence in $<17$ minutes
ShadowHand (20 DoF): standard symmetric policy—20 task successes in 35 minutes; LSTM—20 task successes in 1 hour (compared to 30 hours on a 6144-core CPU cluster with 8 V100s using OpenAI's own pipeline)
Up to $\sim$ 150K env-steps/s on 16K envs; 2–3 orders of magnitude speedup versus CPU simulators

Similar performance is observed for wide-ranging robotics tasks including legged locomotion, dexterous hand manipulation, and character animation (Makoviychuk et al., 2021).

6. Applications in Model-Based Planning and Control

IsaacGym serves as a generic, high-fidelity forward model for sampling-based control algorithms. Pezzato et al. (2023) employ IsaacGym as the backbone of a sampling-based Model Predictive Path Integral (MPPI) controller, in which the simulator acts as the black-box dynamical model $f(x,u)$ for forward-propagating sampled trajectories.

For each MPPI timestep, $K=300$ to $500$ parallel rollouts are executed in independent environments
Each trajectory is perturbed, integrated via IsaacGym’s simulation API, and the terminal/post-trajectory states are directly accessed as GPU tensors
Rollout costs and importance weights are accumulated to determine the optimal control sequence

This pipeline achieves control frequencies of $\sim$ 25 Hz for whole-body tasks with $K=500$ on an NVIDIA RTX 3070 Ti, demonstrating feasibility for real-time control in high-dimensional, contact-rich scenarios without explicit analytic robot models (Pezzato et al., 2023).
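The MPPI update described above can be sketched in a few lines of plain Python. This is an illustrative toy and not the implementation of Pezzato et al.: the black-box model `dynamics` is a unit point mass standing in for a simulator step, and the cost terms, horizon, noise scale `sigma`, and `temperature` are all assumed values. In the real pipeline the rollouts would run in parallel simulator environments rather than in a Python loop.

```python
import math
import random


def dynamics(state, u, dt=0.05):
    """Black-box model f(x, u): a unit point mass (stand-in for a simulator step)."""
    pos, vel = state
    vel = vel + u * dt
    pos = pos + vel * dt
    return (pos, vel)


def rollout_cost(state, controls, target):
    """Accumulated cost of applying a control sequence from the given state."""
    cost = 0.0
    for u in controls:
        state = dynamics(state, u)
        cost += (state[0] - target) ** 2 + 0.1 * state[1] ** 2 + 1e-3 * u ** 2
    return cost


def mppi_update(state, nominal, target, rng, num_rollouts=300, sigma=1.0,
                temperature=1.0):
    # Sample K perturbed control sequences around the nominal plan.
    noise = [[rng.gauss(0.0, sigma) for _ in nominal] for _ in range(num_rollouts)]
    costs = [rollout_cost(state, [u + e for u, e in zip(nominal, eps)], target)
             for eps in noise]
    # Importance weights: exponentiated negative costs, shifted for stability.
    best = min(costs)
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    # The weighted average of the perturbations updates the nominal sequence.
    return [u + sum(w * eps[t] for w, eps in zip(weights, noise)) / total
            for t, u in enumerate(nominal)]


rng = random.Random(0)
state, target = (0.0, 0.0), 1.0
nominal = [0.0] * 15                          # 15-step planning horizon
for _ in range(100):
    nominal = mppi_update(state, nominal, target, rng)
    state = dynamics(state, nominal[0])       # apply only the first control
    nominal = nominal[1:] + [0.0]             # shift the plan (receding horizon)
```

Only `dynamics` would change when swapping in a physics simulator, which is why a batched simulator is a natural fit: the controller never needs an analytic model, just the ability to roll many perturbed control sequences forward and read back their states.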

7. Limitations and Evolutionary Path

IsaacGym’s architecture, while enabling unprecedented parallelism, presents several limitations:

Physics modeling is restricted to rigid bodies and reduced-coordinate articulations (via PhysX); soft bodies, fluids, and cloth are not natively supported (Makoviychuk et al., 2021)
Single-GPU constraint; no out-of-the-box support for distributed multi-GPU (multi-node) simulation
Fine-tuning of low-level PhysX parameters (integration time step, solver iterations, friction parameters) is the user’s responsibility
Micro-scale deformation and certain contact phenomena (e.g., stick–slip) may not be captured with high fidelity; model mismatch with hardware necessitates domain randomization (Pezzato et al., 2023)

Subsequent frameworks such as Isaac Lab extend IsaacGym’s architecture with multi-GPU and data-center scale capabilities, photorealistic rendering, richer sensor simulation, modular actuator models, and a roadmap for differentiable physics integration with open-source engines like Newton (NVIDIA et al., 6 Nov 2025).

References:

"Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning" (Makoviychuk et al., 2021)
"BayesSimIG: Scalable Parameter Inference for Adaptive Domain Randomization with IsaacGym" (Antonova et al., 2021)
"Sampling-based Model Predictive Control Leveraging Parallelizable Physics Simulations" (Pezzato et al., 2023)
"Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning" (NVIDIA et al., 6 Nov 2025)