cuRobo: GPU-Accelerated Robotics Library
- cuRobo is a GPU-accelerated robotics library designed for real-time, collision-free motion planning with applications in industrial and research settings.
- It employs a modular architecture with a PyTorch front end and optimized CUDA kernels for fast inverse kinematics, trajectory optimization, and collision checking.
- The library integrates digital twin workflows and supports high-DOF robots using advanced techniques like particle-based seeding and gradient-based refinement.
cuRobo is a high-performance, GPU-accelerated robotics library tailored for real-time, collision-free motion generation, with a particular focus on trajectory optimization for robotic manipulators. By leveraging massively parallel GPU computation, cuRobo achieves high-speed inverse kinematics, minimum-jerk trajectory generation, and advanced collision checking within complex environments. Its modular design, optimized CUDA kernels, and integration with simulation and digital twin workflows make it distinctive within the motion planning ecosystem and suitable for both research and industrial settings.
1. Core Architecture and Computational Paradigm
cuRobo is architected with a PyTorch front end for differentiable computation and a low-level backend of custom, high-throughput CUDA kernels. Robot kinematics are represented with 4×4 homogeneous transformation matrices, and the robot's geometry is approximated by sets of overlapping spheres for computationally efficient collision checking. This "spherical approximation" reduces self-collision and environment-collision calculations to simple sphere–sphere and sphere–world distance checks, which map naturally onto vectorized GPU execution.
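The sketch below illustrates both ideas for a hypothetical planar three-link arm: forward kinematics composes 4×4 homogeneous transforms, link-local sphere centers are mapped into the world frame, and self-collision reduces to pairwise sphere distances. It is plain Python for readability; the function names, link lengths, sphere placement, and the rule of skipping adjacent links are illustrative assumptions, not cuRobo's API or its CUDA kernels.

```python
import math

Mat4 = list[list[float]]
Vec3 = tuple[float, float, float]
Sphere = tuple[Vec3, float]  # (center, radius)


def matmul(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rot_z(q: float) -> Mat4:
    """Homogeneous transform for a revolute joint rotating about z."""
    c, s = math.cos(q), math.sin(q)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def trans_x(d: float) -> Mat4:
    """Homogeneous transform translating d along x (the link offset)."""
    return [[1.0, 0.0, 0.0, d], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def transform_point(t: Mat4, p: Vec3) -> Vec3:
    x, y, z = p
    return (t[0][0] * x + t[0][1] * y + t[0][2] * z + t[0][3],
            t[1][0] * x + t[1][1] * y + t[1][2] * z + t[1][3],
            t[2][0] * x + t[2][1] * y + t[2][2] * z + t[2][3])


def link_spheres(q: list[float], lengths: list[float], radius: float,
                 per_link: int = 3) -> list[list[Sphere]]:
    """Forward kinematics of a planar revolute chain; returns world-frame spheres per link."""
    t = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # base frame
    spheres = []
    for qi, li in zip(q, lengths):
        frame = matmul(t, rot_z(qi))  # frame at joint i, after its rotation
        local = [(li * k / (per_link - 1), 0.0, 0.0) for k in range(per_link)]
        spheres.append([(transform_point(frame, c), radius) for c in local])
        t = matmul(frame, trans_x(li))  # advance to the next joint
    return spheres


def sphere_distance(a: Sphere, b: Sphere) -> float:
    """Signed distance between two spheres; negative means they overlap."""
    return math.dist(a[0], b[0]) - a[1] - b[1]


def self_collision_depth(spheres: list[list[Sphere]]) -> float:
    """Largest penetration depth between non-adjacent links (0.0 when collision-free)."""
    worst = 0.0
    for i in range(len(spheres)):
        for j in range(i + 2, len(spheres)):  # adjacent links always touch at their joint
            for a in spheres[i]:
                for b in spheres[j]:
                    worst = max(worst, -sphere_distance(a, b))
    return worst
```

With unit links and `q = [0.0, 0.0, 0.0]` the arm is straight and `self_collision_depth` returns 0.0; folding it back on itself with `q = [0.0, math.pi, math.pi]` places the third link on top of the first, giving a depth of about 0.2 for 0.1-radius spheres.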
The core computational components include:
- Forward kinematics kernels and their backward (gradient) passes, run with multiple threads per query (e.g., 4 threads for the forward pass and 16 for the backward pass) and optimized with shared memory and warp-level primitives.
- Custom collision checking kernels for both discrete and novel continuous ("swept") collision evaluation, supporting a variety of world representations such as cuboids, triangular meshes (via NVIDIA Warp’s BVH routines), and ESDF maps from measured environments (via nvblox).
- Trajectory optimization and inverse kinematics kernels leveraging high-bandwidth parallelism to run particle-based sampling, L-BFGS optimization, and massive-scale inverse kinematics queries.
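To see why a swept check matters, consider a single robot sphere moving in a straight line between two trajectory time steps. A discrete check tests only the two endpoints and can miss an obstacle the sphere passes through, while treating the motion as a capsule (the sphere swept along the segment) catches it. This is a minimal geometric sketch that assumes linear motion between steps and a spherical obstacle; cuRobo's actual swept-collision kernels work on its own world representations and differ in detail.

```python
import math

Vec3 = tuple[float, float, float]


def closest_point_on_segment(p: Vec3, a: Vec3, b: Vec3) -> Vec3:
    """Point on segment [a, b] nearest to p."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:  # the sphere did not move
        return a
    t = sum((pc - ai) * c for pc, ai, c in zip(p, a, ab)) / denom
    t = min(1.0, max(0.0, t))  # clamp to the segment
    return tuple(ai + t * c for ai, c in zip(a, ab))


def discrete_distance(c: Vec3, r: float, obs_c: Vec3, obs_r: float) -> float:
    """Signed distance at a single time step (negative means collision)."""
    return math.dist(c, obs_c) - r - obs_r


def swept_distance(c0: Vec3, c1: Vec3, r: float, obs_c: Vec3, obs_r: float) -> float:
    """Signed distance between the capsule swept from c0 to c1 and a spherical obstacle."""
    q = closest_point_on_segment(obs_c, c0, c1)
    return math.dist(q, obs_c) - r - obs_r


c0, c1, obstacle = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)
endpoint_ok = min(discrete_distance(c, 0.1, obstacle, 0.2) for c in (c0, c1)) > 0.0
swept_ok = swept_distance(c0, c1, 0.1, obstacle, 0.2) > 0.0
print(endpoint_ok, swept_ok)  # True False: only the swept check sees the pass-through
```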
cuRobo’s architecture uses CUDA Graphs to minimize kernel launch overhead, which is beneficial when executing complex optimization pipelines at low latency.
2. Motion Generation Formulation and Optimization Workflow
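Motion planning within cuRobo is formulated as a global trajectory optimization problem. The optimization seeks a collision-free, smooth, time-parameterized joint trajectory $\Theta = [\theta_0, \dots, \theta_T]$ from an initial configuration $\theta_0$ to a user-defined goal (e.g., an end-effector pose $X_g$), while minimizing a composite cost:
$$\min_{\Theta}\; C(\Theta) = C_{\text{goal}}(\theta_T, X_g) + C_{\text{smooth}}(\Theta) + C_{\text{coll}}(\Theta) + C_{\text{lim}}(\Theta)$$
where:
- $C_{\text{goal}}$ is a log-cosh error on end-effector position and orientation with respect to the goal $X_g$,
- $C_{\text{smooth}}$ penalizes joint acceleration and jerk, which are finite-differenced using a five-point stencil for accurate higher-order derivatives,
- $C_{\text{coll}}$ penalizes self- and world-collisions computed from the sphere model, and
- $C_{\text{lim}}$ is a set of soft constraints enforcing joint position, velocity, acceleration, and jerk limits.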
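The goal and smoothness terms described above can be written out for a single joint sampled at a fixed time step `dt`: a numerically stable log-cosh goal error and the standard five-point central-difference stencils for acceleration and jerk. The weights, limit values, and quadratic soft-limit penalty are illustrative assumptions, not cuRobo's actual cost implementation, which operates on batched PyTorch tensors over all joints.

```python
import math


def log_cosh(x: float) -> float:
    """log(cosh(x)) computed stably as |x| + log1p(exp(-2|x|)) - log(2)."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def acc_5pt(q: list[float], dt: float, t: int) -> float:
    """Second derivative at sample t via the five-point central stencil."""
    return (-q[t + 2] + 16 * q[t + 1] - 30 * q[t] + 16 * q[t - 1] - q[t - 2]) / (12 * dt**2)


def jerk_5pt(q: list[float], dt: float, t: int) -> float:
    """Third derivative at sample t via the five-point central stencil."""
    return (q[t + 2] - 2 * q[t + 1] + 2 * q[t - 1] - q[t - 2]) / (2 * dt**3)


def soft_limit(x: float, limit: float) -> float:
    """Quadratic penalty that is zero while |x| stays within the limit."""
    excess = max(0.0, abs(x) - limit)
    return excess * excess


def trajectory_cost(q, dt, goal, w_goal=1.0, w_acc=1e-3, w_jerk=1e-4, a_max=10.0, j_max=100.0):
    """Goal + smoothness + soft-limit cost for one joint trajectory (hypothetical weights)."""
    cost = w_goal * log_cosh(q[-1] - goal)
    for t in range(2, len(q) - 2):  # interior samples where the stencils fit
        a, j = acc_5pt(q, dt, t), jerk_5pt(q, dt, t)
        cost += w_acc * a * a + w_jerk * j * j
        cost += soft_limit(a, a_max) + soft_limit(j, j_max)
    return cost
```

Both stencils are exact for cubic motion, so sampling `q = [(i * 0.1) ** 3 for i in range(10)]` gives `acc_5pt(q, 0.1, 5)` close to 3.0 (that is, 6t at t = 0.5) and `jerk_5pt(q, 0.1, 5)` close to 6.0. Unlike `math.log(math.cosh(x))`, which overflows for large |x|, `log_cosh` stays finite: it behaves like a quadratic near the goal and like an absolute error far from it.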
Trajectory optimization proceeds in two stages:
- Particle-based seeding: Hundreds to thousands of trajectories are sampled in parallel and scored, and the best are passed as warm starts to the next stage.
- Gradient-based refinement: The warm starts are refined with batched, parallel L-BFGS optimization that uses a novel parallel noisy line search. A set of candidate step sizes is evaluated concurrently against Armijo and Wolfe-like conditions; if none of them passes, a small ‘noisy’ step is taken instead, which keeps the optimization robust.
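The batched line-search idea can be sketched as follows: every candidate step size along the search direction is evaluated (on a GPU, concurrently), the largest one satisfying both the Armijo sufficient-decrease condition and the weak Wolfe curvature condition is accepted, and a small fallback step is taken when none qualifies. The candidate step sizes, the constants `c1`/`c2`, the fallback value, and the "pick the largest" rule are assumptions for illustration; this is a generic Wolfe-condition sketch, not cuRobo's exact line-search code.

```python
from typing import Callable, Sequence

Vector = list[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def parallel_line_search(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x: Vector,
    d: Vector,
    alphas: Sequence[float] = (1.0, 0.5, 0.25, 0.1, 0.05),
    c1: float = 1e-4,
    c2: float = 0.9,
    fallback: float = 0.01,
) -> tuple[float, Vector]:
    """Pick the largest step satisfying Armijo + weak Wolfe; otherwise take a small step."""
    fx = f(x)
    g0 = dot(grad(x), d)  # directional derivative, negative for a descent direction
    accepted = []
    for a in alphas:  # independent evaluations: one batch on a GPU
        xa = [xi + a * di for xi, di in zip(x, d)]
        armijo = f(xa) <= fx + c1 * a * g0
        curvature = dot(grad(xa), d) >= c2 * g0
        if armijo and curvature:
            accepted.append((a, xa))
    if accepted:
        return max(accepted, key=lambda item: item[0])
    # No candidate passed: take a small step rather than stalling the optimizer.
    return fallback, [xi + fallback * di for xi, di in zip(x, d)]
```

For f(x) = x·x starting at x = [1, 1] with the steepest-descent direction d = [-2, -2], the full step overshoots to [-1, -1] and fails the Armijo test, so the search returns a step of 0.5, which lands exactly on the minimum [0, 0].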
Time discretization is handled by a two-step refinement: a coarse solution is computed first, an optimal time scaling is then inferred (so that the trajectory approaches the physical actuation limits without exceeding them), and a final fine-resolution optimization is run.
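The time-scaling step has a simple closed form if the trajectory's shape is kept fixed and only the time step is rescaled: stretching time by a factor s divides velocities by s, accelerations by s², and jerks by s³. The sketch below assumes simple forward differences, scalar limits, and a non-stationary trajectory, and picks the s that puts the most constrained quantity exactly at its limit. It illustrates the idea and is not cuRobo's exact retiming rule.

```python
import math


def diff(q: list[float], dt: float) -> list[float]:
    """Forward finite difference of a sampled signal."""
    return [(b - a) / dt for a, b in zip(q, q[1:])]


def rescaled_dt(q: list[float], dt: float, v_max: float, a_max: float, j_max: float) -> float:
    """Time step that brings the peak velocity, acceleration, or jerk exactly to its limit.

    Requires at least 4 samples so that a jerk estimate exists.
    """
    v = diff(q, dt)
    a = diff(v, dt)
    j = diff(a, dt)
    # Scaling dt by s multiplies velocity by 1/s, acceleration by 1/s**2 and jerk by 1/s**3,
    # so each limit requires s >= ratio ** (1 / order); the binding limit sets s.
    s = max(
        max(abs(x) for x in v) / v_max,
        math.sqrt(max(abs(x) for x in a) / a_max),
        (max(abs(x) for x in j) / j_max) ** (1.0 / 3.0),
    )
    return dt * s
```

A constant-velocity ramp `[0.0, 0.5, 1.0, 1.5, 2.0]` sampled at `dt = 0.5` moves at 1 unit/s, so with `v_max = 2.0` the function returns 0.25 and the motion takes half as long. For `[0, 1, 4, 9, 16]` at `dt = 1.0` the acceleration is 2; with `a_max = 0.5` the time step doubles to 2.0, and the rescaled acceleration, 2 / 2², sits exactly at the limit.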
3. Inverse Kinematics and Collision-Free IK Acceleration
cuRobo provides a GPU-parallelized inverse kinematics (IK) solver capable of executing thousands of queries per second. This acceleration is made possible through:
- Massively parallel batch execution, utilizing thread blocks for simultaneous IK queries.
- Collision-free IK: Each candidate IK solution is checked for joint-limit compliance, self-collisions, and collisions with the environment, using the same spherical collision models as the main optimization core. Published benchmarks report speedups of 23× for standard IK and up to 80× for collision-aware IK compared with conventional solvers such as TracIK.
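A toy version of batched, collision-aware IK is sketched below for a hypothetical planar two-link arm: many random seeds are refined independently (the loop a GPU would parallelize), and only converged solutions that respect joint limits and clear an obstacle are kept. For brevity it swaps in damped-least-squares updates for the L-BFGS optimization mentioned earlier and checks collisions by sampling points along each link; the link lengths, obstacle, damping, and tolerances are all assumptions.

```python
import math
import random

LINK1, LINK2 = 1.0, 1.0  # hypothetical link lengths


def fk(q: tuple[float, float]) -> tuple[float, float]:
    """End-effector position of a planar two-link arm."""
    q1, q2 = q
    return (LINK1 * math.cos(q1) + LINK2 * math.cos(q1 + q2),
            LINK1 * math.sin(q1) + LINK2 * math.sin(q1 + q2))


def dls_step(q, target, damping=0.05):
    """One damped-least-squares update: dq = J^T (J J^T + damping^2 I)^-1 e."""
    q1, q2 = q
    x, y = fk(q)
    ex, ey = target[0] - x, target[1] - y
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    j11, j12 = -LINK1 * s1 - LINK2 * s12, -LINK2 * s12
    j21, j22 = LINK1 * c1 + LINK2 * c12, LINK2 * c12
    a11 = j11 * j11 + j12 * j12 + damping**2  # A = J J^T + damping^2 I (symmetric 2x2)
    a12 = j11 * j21 + j12 * j22
    a22 = j21 * j21 + j22 * j22 + damping**2
    det = a11 * a22 - a12 * a12
    wx, wy = (a22 * ex - a12 * ey) / det, (a11 * ey - a12 * ex) / det  # w = A^-1 e
    return (q1 + j11 * wx + j21 * wy, q2 + j12 * wx + j22 * wy)  # q + J^T w


def in_collision(q, obstacle, link_radius=0.05, samples=5):
    """Check points sampled along both links against a circular obstacle."""
    (ox, oy), r = obstacle
    elbow = (LINK1 * math.cos(q[0]), LINK1 * math.sin(q[0]))
    for a, b in (((0.0, 0.0), elbow), (elbow, fk(q))):
        for k in range(samples):
            t = k / (samples - 1)
            p = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if math.dist(p, (ox, oy)) < r + link_radius:
                return True
    return False


def batched_ik(target, obstacle, n_seeds=64, iters=100, tol=1e-6, seed=0):
    lo, hi = -math.pi, math.pi  # joint limits for both joints
    rng = random.Random(seed)
    solutions = []
    for _ in range(n_seeds):  # seeds are independent; on a GPU they run in parallel
        q = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        for _ in range(iters):
            q = dls_step(q, target)
        converged = math.dist(fk(q), target) < tol
        within_limits = all(lo <= qi <= hi for qi in q)
        if converged and within_limits and not in_collision(q, obstacle):
            solutions.append(q)
    return solutions
```

With an obstacle of radius 0.3 centered at (0, 1.2), the elbow-up solution for target (1, 1) puts the elbow at (0, 1), inside the obstacle, so only elbow-down solutions (elbow at (1, 0)) are returned.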
4. Parallel Geometric Planning and Real-Time Application
A key feature of cuRobo is its integration of a parallel geometric planner, which provides graph-based global path solutions as seeds or backups for trajectory optimization. This planner:
- Samples feasible joint-space configurations, connects nodes via a batch "steering" kernel (which performs parallelized collision-checked interpolation), and solves for shortest paths using parallel graph methods.
- Executes in real time (≤20 ms for typical workcells on modern GPUs), making it suitable for dynamic replanning scenarios and as a first-stage seed generator for the optimization pipeline.
This real-time capability enables cuRobo to support model-predictive control (MPC) workflows, dynamic obstacle avoidance, and multi-arm coordination in cluttered environments with low latency (e.g., a planning latency of ~45 ms in large industrial scenes).
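The sample, steer, and search pipeline of the geometric planner can be illustrated with a small roadmap for a point robot in the unit square with circular obstacles: free samples become nodes, each node is connected to its nearest neighbors through collision-checked straight-line interpolation (the "steering" step), and Dijkstra's algorithm extracts the shortest path. This is a PRM-style sketch under those assumptions; the sequential loops stand in for cuRobo's batched GPU kernels, and the sample count, neighbor count, and interpolation step are hypothetical.

```python
import heapq
import math
import random

Point = tuple[float, float]
Obstacles = list[tuple[Point, float]]  # (center, radius) circles


def point_free(p: Point, obstacles: Obstacles) -> bool:
    return all(math.dist(p, c) > r for c, r in obstacles)


def steer_free(a: Point, b: Point, obstacles: Obstacles, step: float = 0.01) -> bool:
    """Collision-checked straight-line interpolation between two configurations."""
    n = max(1, math.ceil(math.dist(a, b) / step))
    return all(
        point_free((a[0] + (b[0] - a[0]) * k / n, a[1] + (b[1] - a[1]) * k / n), obstacles)
        for k in range(n + 1)
    )


def build_roadmap(start: Point, goal: Point, obstacles: Obstacles, n_samples=200, k=10, seed=0):
    rng = random.Random(seed)
    nodes = [start, goal]
    while len(nodes) < n_samples + 2:  # rejection-sample collision-free nodes
        p = (rng.random(), rng.random())
        if point_free(p, obstacles):
            nodes.append(p)
    edges: dict[int, list[tuple[int, float]]] = {i: [] for i in range(len(nodes))}
    for i, p in enumerate(nodes):
        nearest = sorted(range(len(nodes)), key=lambda j: math.dist(p, nodes[j]))[1 : k + 1]
        for j in nearest:  # on a GPU, all steering checks would run as one batch
            if steer_free(p, nodes[j], obstacles):
                w = math.dist(p, nodes[j])
                edges[i].append((j, w))
                edges[j].append((i, w))
    return nodes, edges


def shortest_path(nodes, edges, src=0, dst=1):
    """Dijkstra's algorithm from node src (start) to node dst (goal)."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in edges[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return [nodes[i] for i in reversed(path)]
```

In a planner of this kind, the resulting waypoint path would serve as a seed (or a fallback) for the trajectory optimizer rather than as the final motion.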
5. Integration of Digital Twins and Extended DOF Support
cuRobo natively interfaces with digital twin infrastructure by accepting CAD-based input models for both robots and environments:
- Collision models are generated automatically via mesh processing, producing the sphere-approximated geometry needed for fast collision checking.
- Extended DOF support: cuRobo handles systems with additional axes (e.g., 7th-axis gantries) either by integrating the extra axis into the kinematic chain (via a combined URDF model) or by representing it through world obstacle models. The integrated approach optimizes coordination and achieves ~15% faster planning times, while the alternative saves memory at the expense of reduced coordination.
These features facilitate robust, collision-aware planning for high-DOF robots in expansive workspaces.
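As a simplified illustration of how a link can be turned into overlapping spheres, the sketch below covers a cylindrical link (a common stand-in for an arm segment) with equal spheres placed along its axis. It assumes an analytic cylinder rather than the CAD meshes cuRobo's tooling actually processes, and the function name and default spacing are hypothetical. The sphere radius is derived so that every surface point is guaranteed to be covered.

```python
import math

Vec3 = tuple[float, float, float]


def cylinder_spheres(length: float, radius: float, spacing=None) -> list[tuple[Vec3, float]]:
    """Cover a cylinder along +x (from x = 0 to x = length) with equal spheres on its axis."""
    spacing = radius if spacing is None else spacing
    n = max(2, math.ceil(length / spacing) + 1)  # include a center at each end cap
    step = length / (n - 1)
    # A surface point lies at most step / 2 (axially) from its nearest center and `radius`
    # from the axis, so this is the smallest common radius that still covers the surface.
    r_cover = math.hypot(radius, step / 2)
    return [((k * step, 0.0, 0.0), r_cover) for k in range(n)]
```

For a 1 m link of radius 0.1 m and the default spacing, this yields 11 spheres of radius √0.0125 ≈ 0.112 m. Tighter spacing reduces the over-approximation at the cost of more spheres, which is the accuracy-versus-throughput trade-off the sphere model makes.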
6. Quantitative Performance and Industrial Benchmarks
Published benchmarks demonstrate order-of-magnitude improvements over leading CPU-based planners:
- Planning time: Mean ~45 ms (cuRobo) vs. ~1200 ms (MoveIt) across representative pick-and-place and dynamic environments.
- Cycle time: 3.1 s vs. 9.9 s per pick-and-place iteration.
- Trajectory quality: cuRobo produces lower maximum jerk (2.1 rad/s³ vs. 5.8 rad/s³) and more efficient (~12% shorter) paths.
- Robustness: 98.5% task completion rate in industrial test scenarios, despite workspace complexity and moving obstacles.
Even on embedded platforms (e.g., NVIDIA Jetson Orin NX), cuRobo achieves ~100 ms latency for complex, high-DOF planning tasks, and it can evaluate 512 parallel candidate trajectories of 64 time steps each with re-optimization at up to 500 Hz.
7. Research Applications, Extensibility, and Integration
cuRobo exposes a set of modular components:
- Rollout and optimization APIs support custom cost functions, learning-based augmentations, and integration with PyTorch frameworks for end-to-end differentiability. This enables the library to serve as a backend in imitation learning, reinforcement learning, or model-based RL.
- Geometry module allows for plug-in environment representations and custom collision layers.
- Wrapper API and differentiable layers facilitate drop-in use in simulation, digital twin, or higher-level motion planning stacks.
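One way to picture a plug-in geometry interface is a small protocol that every world representation implements, so that the collision cost never needs to know whether it is querying a cuboid, a sphere, or (in a real system) a mesh or ESDF. The names `CollisionLayer`, `signed_distance`, and `world_collision_cost` below are hypothetical and do not correspond to cuRobo's actual classes; the hinge penalty on an activation distance is likewise an illustrative assumption.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Sequence

Vec3 = tuple[float, float, float]


class CollisionLayer(Protocol):
    """Any world representation that can report a signed distance (negative inside)."""

    def signed_distance(self, p: Vec3) -> float: ...


@dataclass
class Cuboid:
    center: Vec3
    half_extents: Vec3

    def signed_distance(self, p: Vec3) -> float:
        """Exact signed distance to an axis-aligned box."""
        q = [abs(pc - c) - h for pc, c, h in zip(p, self.center, self.half_extents)]
        outside = math.sqrt(sum(max(qi, 0.0) ** 2 for qi in q))
        inside = min(max(q), 0.0)
        return outside + inside


@dataclass
class SphereObstacle:
    center: Vec3
    radius: float

    def signed_distance(self, p: Vec3) -> float:
        return math.dist(p, self.center) - self.radius


def world_collision_cost(spheres, layers: Sequence[CollisionLayer], activation: float = 0.0) -> float:
    """Sum of hinge penalties for robot spheres closer than `activation` to any layer."""
    cost = 0.0
    for center, r in spheres:
        d = min(layer.signed_distance(center) for layer in layers) - r
        cost += max(0.0, activation - d)
    return cost
```

Adding a new environment representation then only requires another class with a `signed_distance` method; the cost function stays unchanged.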
Applications include production-level, real-time industrial robotics (pick-and-place, dynamic collision avoidance, multi-arm systems), research platforms for large-scale trajectory evaluation, and backend use for reachability analysis, task and motion planning, and trajectory smoothing.
Table: cuRobo Capabilities and Benchmarks
Capability | Performance Metric | Context/Notes |
---|---|---|
Trajectory planning time | ~45 ms (GPU) | vs. ~1200 ms (MoveIt) |
Collision-free IK | >7000 queries/s | Up to 80× faster than TracIK |
Parallel trajectory eval | 512 samples × 64 steps @500 Hz | Jetson Orin NX |
Task completion rate | 98.5% | Industrial workflows |
Peak jerk (planned trajectory) | 2.1 rad/s³ (cuRobo) | vs. 5.8 rad/s³ (MoveIt) |
These metrics confirm cuRobo’s suitability for deployment in time-critical, high-volume environments, as well as for scalable research experimentation.
cuRobo’s approach demonstrates that reformulating global motion optimization for modern GPU hardware, with parallelism-oriented algorithms and data models, delivers substantial gains in industrial, research, and dynamic planning applications (Sundaralingam et al., 2023; Abuelsamen et al., 6 Aug 2025).