mjlab: A Lightweight Framework for GPU-Accelerated Robot Learning
Abstract: We present mjlab, a lightweight, open-source framework for robot learning that combines GPU-accelerated simulation with composable environments and minimal setup friction. mjlab adopts the manager-based API introduced by Isaac Lab, where users compose modular building blocks for observations, rewards, and events, and pairs it with MuJoCo Warp for GPU-accelerated physics. The result is a framework installable with a single command, requiring minimal dependencies, and providing direct access to native MuJoCo data structures. mjlab ships with reference implementations of velocity tracking, motion imitation, and manipulation tasks.
Explain it Like I'm 14
What is this paper about?
This paper introduces mjlab, a simple, fast, and open-source software toolkit that helps researchers teach robots new skills inside a computer simulation. It runs many robot simulations at the same time on a graphics card (GPU), is easy to install, and is built on top of a trusted physics engine called MuJoCo. The goal is to let people focus on the “learning” part (rewards, goals, and training) instead of wrestling with complicated setup.
What questions were the authors trying to answer?
- How can we make robot learning in simulation both fast and easy to use?
- Can we get the best of both worlds: the clean, reusable “building blocks” of a big framework, without heavy downloads or slow startup?
- Can we keep the physics transparent (easy to inspect and debug) so researchers can trust and tweak the details that matter when moving from simulation to real robots?
How did they build and test it?
The authors designed mjlab around a few simple ideas and tools that work well together.
Fast physics on the GPU
- Think of a GPU like a supermarket with thousands of cashiers instead of one. It can handle many small jobs at once. mjlab uses MuJoCo Warp (a GPU-powered version of MuJoCo) to run thousands of robot “worlds” in parallel. Each world is a separate copy of the robot and its environment taking its own steps.
- mjlab also “records” the sequence of physics steps once (like making a macro or a playlist) and then replays it. This cuts down on overhead and makes each step faster.
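The "many parallel worlds" idea can be sketched in plain Python. This is a toy illustration of the leading world dimension, not MuJoCo Warp's actual API: every world keeps its own state, and one step function advances all of them at once (on a GPU, the per-world loop runs in parallel).

```python
# Toy sketch of batched simulation with a leading "world" dimension.
# Illustrative only -- not MuJoCo Warp's real data structures or API.

NUM_WORLDS = 4
DT = 0.01

# Per-world state: position and velocity of a single 1-D "robot".
pos = [0.0] * NUM_WORLDS
vel = [1.0, 2.0, 3.0, 4.0]  # each world can differ

def step_all_worlds():
    """Advance every world by one physics step."""
    for w in range(NUM_WORLDS):  # on a GPU, this loop runs in parallel
        pos[w] += vel[w] * DT

for _ in range(100):  # 100 steps = 1.0 simulated second
    step_all_worlds()

print(pos)  # world w has moved about vel[w] * 1.0 units
```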
Building environments like LEGO
- Instead of writing one giant, messy script for each task, mjlab lets you snap together small pieces called “managers.” Each manager handles one part of training—like computing rewards or checking when to reset the robot.
- This “manager-based” design acts like a set of LEGO blocks. You can reuse the same blocks across different robots and tasks, which saves time and reduces bugs.
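The snap-together idea can be sketched as a tiny reward manager. The class and term names here are illustrative, not mjlab's actual API: small term functions are registered with a weight, and the manager combines them each step.

```python
# Minimal sketch of the "manager" idea: small, reusable term functions
# registered with a manager that runs them and combines the results.
# Names are illustrative, not mjlab's actual classes.

class RewardManager:
    def __init__(self):
        self.terms = []  # list of (name, weight, function)

    def register(self, name, weight, fn):
        self.terms.append((name, weight, fn))

    def compute(self, state):
        """Weighted sum of all reward terms for one step."""
        return sum(w * fn(state) for _, w, fn in self.terms)

# Two reusable "LEGO blocks":
def forward_velocity(state):
    return state["vx"]            # reward moving forward

def action_penalty(state):
    return -abs(state["action"])  # penalize large actions

rewards = RewardManager()
rewards.register("track_velocity", 1.0, forward_velocity)
rewards.register("action_cost", 0.1, action_penalty)

r = rewards.compute({"vx": 0.5, "action": 2.0})
print(r)  # 1.0*0.5 + 0.1*(-2.0) = 0.3
```

The same blocks can be re-registered for a different robot or task, which is the reuse the text describes.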
Core pieces the system uses
To keep things practical, mjlab includes a few key building blocks you can mix and match:
- Robots and objects (“entities”): Anything physical in the scene—like a humanoid, a cube, or the ground.
- Sensors: Ways for the robot to “feel” the world, such as contact forces, rays that scan the terrain, or simple cameras.
- Actuators: The robot’s “muscles” (motors) that create movement. mjlab supports basic MuJoCo motors and custom ones like PD controllers (which nudge joints toward target positions smoothly).
- Terrain: Ready-made ground types from flat floors to stairs and wavy surfaces, with easy difficulty settings.
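The PD controller mentioned above can be sketched in a few lines (gain values are illustrative, not mjlab defaults): the torque pulls the joint toward a target position while damping its velocity.

```python
# Sketch of an ideal PD actuator: torque nudges a joint toward a target
# position while damping velocity. The gains kp/kd are illustrative.

def pd_torque(q, qd, q_target, kp=50.0, kd=2.0):
    """Proportional-derivative control: position error minus damped velocity."""
    return kp * (q_target - q) - kd * qd

# Joint at 0.0 rad, moving at 1.0 rad/s, target 0.5 rad:
tau = pd_torque(q=0.0, qd=1.0, q_target=0.5)
print(tau)  # 50*0.5 - 2*1.0 = 23.0
```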
The training loop and its managers
During each training step, mjlab runs a pipeline where different managers do their job in order. Here are the managers and what they do:
- Action manager: Takes the policy’s action (what the robot wants to do) and sends it to the motors.
- Simulation: Advances the physics through several small sub-steps so motion stays accurate and stable.
- Termination manager: Checks if an episode should end (for example, the robot falls or time runs out).
- Reward manager: Adds up points and penalties to guide learning (like scoring in a game).
- Reset and curriculum: Resets failed robots and, over time, makes tasks harder—like moving up levels in a video game.
- Event manager: Adds variety (domain randomization), for example by changing friction or weight so the robot doesn’t overfit to one exact setup.
- Command manager: Sets goals (like “walk forward at 1 m/s”).
- Observation manager: Builds what the robot “sees” (all the sensor readings and goal info) for the next decision.
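The pipeline above can be sketched end to end. The managers here are plain functions over a state dict with toy dynamics; mjlab's real managers are classes, but the control flow (action, simulate, terminate, reward) is the same idea.

```python
# Sketch of the per-step pipeline: each "manager" does one job, in order.
# Toy dynamics and illustrative names, not mjlab's actual implementation.

def apply_action(state, action):
    state["ctrl"] = action                       # action manager

def simulate(state, substeps=4, dt=0.005):
    for _ in range(substeps):                    # decimation sub-steps
        state["q"] += state["ctrl"] * dt

def check_termination(state):
    return abs(state["q"]) > 1.0 or state["t"] >= 100  # fell, or timed out

def compute_reward(state, target=0.5):
    return -abs(state["q"] - target)             # closer to target = better

def env_step(state, action):
    apply_action(state, action)
    simulate(state)
    state["t"] += 1
    done = check_termination(state)
    reward = compute_reward(state)
    return reward, done

state = {"q": 0.0, "ctrl": 0.0, "t": 0}
reward, done = env_step(state, action=1.0)
print(state["q"], done)  # 0.02 False
```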
Ease of use and code design
- PyTorch-friendly: mjlab shares memory between the simulator and PyTorch without copies, so you can write rewards and observations in regular PyTorch code.
- Simple configs: Settings are plain, typed Python configs you can tweak from the command line. No complicated class inheritance required.
- Minimal dependencies: Install and run training with a single command using a fast Python tool. This reduces setup headaches.
- Testing and typing: The codebase is well-tested and uses static typing, making it easier to trust and extend.
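The "plain, typed Python configs" idea can be sketched with a standard-library dataclass. mjlab exposes fields like these on the command line via tyro; the field names below are illustrative, not mjlab's actual config schema.

```python
# Sketch of plain, typed configs: a frozen dataclass whose fields can be
# overridden per run. Field names are illustrative, not mjlab's schema.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TrainConfig:
    num_envs: int = 4096
    learning_rate: float = 3e-4
    terrain: str = "flat"

base = TrainConfig()
# Equivalent of a CLI override like --num-envs 8192 --terrain rough:
run = replace(base, num_envs=8192, terrain="rough")

print(run.num_envs, run.terrain)  # 8192 rough
print(base.num_envs)              # base config is unchanged: 4096
```

Because the config is just typed data, no class inheritance is needed to customize a run.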
What did they find?
- Speed and scale: mjlab can simulate thousands of robots at once on a single GPU. That means faster experiments and training.
- Modularity that works: The “manager” design makes it easy to build and reuse tasks without repeating code.
- Transparent physics: Because it’s built on MuJoCo and exposes its data directly, researchers can debug low-level details (like contact forces) when needed.
- Real tasks out of the box: mjlab ships with three example tasks:
- Walking and running while following speed and turn commands, on both flat and rough ground.
- Humanoid motion imitation (like copying a dance or spin kick).
- A robot arm lifting a cube to a target.
- Early adoption: It’s already been used in a university robotics class and by open-source projects, and it has demo videos showing natural-looking motions and real robot transfers.
Why does this matter?
- Faster progress in robot learning: Running many simulations at once shortens training time, so ideas can be tested and improved quickly.
- Better sim-to-real transfer: Trustworthy, inspectable physics and smart variety (domain randomization) help policies learned in simulation work better on real robots.
- Lower barrier to entry: Easy installation and clear building blocks make it feasible for students, researchers, and hobbyists to start experimenting without weeks of setup.
- Reusability and collaboration: Clean, modular pieces encourage sharing and extending tasks and robots across labs and projects.
In short, mjlab is like a fast, clean workshop for robot learning: it gives you the right tools—speedy simulations, snap-together components, and clear physics—so you can focus on teaching robots useful skills and moving those skills from the computer to the real world.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a focused list of what the paper leaves missing, uncertain, or unexplored, framed as concrete, actionable items for future work:
- Absent quantitative performance benchmarks:
- No throughput (steps/sec), latency, or scaling curves vs CPU MuJoCo, Isaac Lab/PhysX, or other GPU simulators.
- No analysis of manager-layer overhead vs monolithic step loops, nor the gains from CUDA graph capture.
- No GPU memory profiling or characterization of maximum parallel environment counts across different robots/tasks.
- Single-GPU simulation focus:
- Multi-GPU simulation support (partitioning worlds across devices, inter-GPU synchronization, and data movement) is not described or evaluated.
- Interaction between multi-GPU training (via torchrunx) and single-GPU simulation is not characterized (e.g., contention, throughput bottlenecks).
- Determinism and reproducibility:
- No statement on bitwise determinism across runs/GPUs/driver versions, or handling nondeterministic GPU reductions.
- Seeding strategy and reproducibility guarantees for per-world randomization and curriculum transitions are unspecified.
- Physics fidelity and parity with CPU MuJoCo:
- No validation that MuJoCo Warp reproduces CPU MuJoCo dynamics for contact-rich scenarios (e.g., penetration depths, impulse distributions, constraint stabilization).
- Unclear support and testing for closed-chain mechanisms, equality constraints, and joint limit behaviors in GPU mode.
- No assessment of numerical precision choices (FP32/TF32/FP16) and their impact on stability/accuracy.
- Domain randomization mechanics and cost:
- Rebuilding CUDA graphs when expanding model fields to per-world arrays is described but not benchmarked; amortized cost and recommended randomization frequency are unknown.
- Memory overhead and fragmentation risks from per-world expansions are not analyzed.
- Actuator model validation and identification:
- PD, DC-motor, and learned MLP actuators are provided, but there is no methodology or quantitative validation against real hardware dynamics.
- Training pipeline, data requirements, and regularization for the learned MLP actuator (to avoid instability and ensure generalization) are not specified.
- Delay and latency modeling:
- Only fixed, timestep-quantized actuation delays are supported; variable/jittery network delays and time-synchronization issues are not addressed.
- End-to-end latency budgets (sensor→policy→actuator) and their impact on control quality are not analyzed.
- Sim-to-real transfer:
- Beyond anecdotal videos, there is no systematic evaluation of transfer success rates, failure modes, or ablations on domain randomization recipes.
- No guidance for system identification, parameter calibration, or automated friction/contact tuning for specific hardware.
- Sensing limitations:
- High-fidelity RGB rendering is out of scope; the experimental tiled camera lacks evaluation (latency, resolution, depth-of-field, anti-aliasing, noise models).
- Missing support/validation for common robotics sensors (e.g., multi-beam LiDAR, event cameras, multi-view cameras) and realistic noise/latency/sync models.
- Vision policy pipeline:
- The privileged-to-vision distillation workflow is mentioned but not implemented or evaluated (e.g., datasets, renderers, augmentation, and training scripts).
- Task diversity and standardization:
- Only three reference tasks/robots are shipped; no standard benchmark suite, task templates, or leaderboards for reproducible comparison across methods.
- Absent tasks for dexterous hands, bimanual manipulation, mobile manipulation, multi-agent interactions, or contact-rich assembly.
- Curriculum learning design:
- No ablation or policy on curriculum progression/regression criteria, stability under non-stationary objectives, or sample efficiency trade-offs.
- Algorithmic scope:
- RSL-RL on-policy focus; off-policy algorithms (SAC/TD3) with large replay buffers on GPU are neither integrated nor benchmarked.
- No support/evaluation for multi-task, meta-RL, or hierarchical RL under the manager-based API.
- Robustness and failure analysis:
- While NaN/Inf detection and replay buffers are provided, there is no automated triage, root-cause analysis, or systematic cataloging of failure modes (e.g., task, manager term, or kernel-level issues).
- No stress tests for extreme contacts, high-frequency actuation, or stiff joints that commonly cause instability.
- Viewer and visualization scalability:
- Performance limits of the Viser-based web viewer (frame rate, bandwidth, simultaneous clients) with thousands of environments are not reported.
- No profiling of visualization overhead on simulation throughput.
- Real-time deployment and middleware integration:
- No ROS2/LCM integration, real-time scheduling guarantees, or hardware IO bridges are provided for closing the loop on physical robots.
- Safety interlocks, constraint enforcement, and emergency-stop integration during deployment are not discussed.
- Extensibility and plugin ecosystem:
- Plugin patterns, versioning/ABI stability for custom managers/sensors/actuators, and backward-compatible configuration migration are not specified.
- Serialization of environment configurations and seeds for artifact reproducibility is not described.
- Terrain and environment modeling:
- Terrain module is limited to static rigid terrains; no support for compliant or dynamic surfaces, moving obstacles, or environment agents.
- Friction anisotropy, rolling resistance, or more complex contact/friction models are not exposed or validated.
- Operating system and hardware support matrix:
- Installation and CI test matrix across OSes (Linux/Windows/macOS), GPU architectures, and CUDA/driver versions is not documented.
- Absence of performance regression tests and long-run stability tests across hardware configurations.
- Interoperability:
- No path for cross-simulator validation (e.g., CPU MuJoCo for cross-checks) or adapters to/from other ecosystems (Isaac Lab, Gazebo/Ignition) despite adopting a similar manager paradigm.
- Data logging and evaluation tooling:
- No standardized logging schema, dataset export, or evaluation harness (success metrics, policy checkpoints, diagnostics) for fair comparisons.
- Precision-performance trade-offs:
- Unclear whether mixed-precision or kernel-level optimizations are leveraged, and their effect on stability vs throughput.
- Security and quality controls for AI-assisted contributions:
- While AI-generated PRs are mentioned, policies for code verification (beyond unit tests), safety checks, and long-term maintainability are not detailed.
Practical Applications
Immediate Applications
Below are concrete, deployable use cases that leverage the paper’s released framework, shipped tasks, and existing integrations. Each item notes the primary sector(s), the potential tool/product/workflow, and assumptions/dependencies that affect feasibility.
- GPU-accelerated training of locomotion controllers for commercial legged robots
- Sectors: robotics, logistics, public safety
- Tool/Product/Workflow: Use mjlab’s velocity-tracking task with curriculum-based terrain to train Unitree Go1/G1 policies; deploy training via uv single-command install, scale with torchrunx, and debug with the Viser web viewer
- Assumptions/Dependencies: Access to a modern NVIDIA GPU; MuJoCo Warp and PyTorch available; accurate actuator modeling and domain randomization configured for the target hardware; sim-to-real transfer validated per robot
- Rapid prototyping of manipulation policies (pick-and-lift) for industrial and lab arms
- Sectors: manufacturing, research labs
- Tool/Product/Workflow: Start from the shipped YAM cube-lifting task; build custom observation and reward terms via the manager-based API; iterate rewards and events in pure PyTorch using TorchArray
- Assumptions/Dependencies: MJCF model quality for the target arm; contact sensing configuration; sim-to-real requires calibration of friction, delays, and actuator dynamics
- Modular RL environment development for custom robots without code duplication
- Sectors: software, robotics startups
- Tool/Product/Workflow: Adopt the manager-based API (observation, reward, termination, curriculum, events) to author reusable environment terms; ship internal “environment templates” per robot family
- Assumptions/Dependencies: Team familiarity with MJCF/MuJoCo conventions; single physics backend (no cross-simulator portability)
- Course-ready robot learning labs for universities and bootcamps
- Sectors: education, academia
- Tool/Product/Workflow: Deliver hands-on assignments using mjlab’s CLI-first configs and Viser for headless visualization; replicate the UC Berkeley deployment for ME 292b/193b
- Assumptions/Dependencies: GPU availability in labs or cloud; teaching staff comfortable with tyro/CLI overrides; basic Python/RL background for students
- Large-scale policy benchmarking on a single GPU
- Sectors: academia, software tooling
- Tool/Product/Workflow: Run thousands of parallel environments via MuJoCo Warp with CUDA graph capture; compare on-policy learners using RSL-RL
- Assumptions/Dependencies: Sufficient GPU memory; stable reward scaling (time-invariant reward magnitudes); logging of per-term diagnostics to avoid training pathologies
- Web-based remote debugging and monitoring of simulation runs
- Sectors: software, DevOps for robotics
- Tool/Product/Workflow: Use the Viser viewer to pause/resume, visualize contacts, and inspect recent state buffers when NaN/Inf is detected by the termination manager
- Assumptions/Dependencies: Server/network access; headless rendering acceptable (RGB fidelity out of scope)
- Hardware-specific actuator characterization using learned MLP actuators
- Sectors: robotics R&D, manufacturing QA
- Tool/Product/Workflow: Fit the MLP actuator term to logged hardware data; compare against the provided ideal PD and DC motor models to improve sim fidelity
- Assumptions/Dependencies: Representative actuator datasets; consistent control latency modeling via the provided delay wrapper; careful validation to prevent overfitting
- Domain randomization pipelines for robustness testing
- Sectors: robotics, QA/compliance
- Tool/Product/Workflow: Author event terms to randomize friction, masses, and terrain; rely on mjlab’s per-world model expansion and transparent CUDA graph rebuilds
- Assumptions/Dependencies: Well-chosen randomization ranges; monitoring of per-term stability; acceptance that randomized sim may diverge from specific hardware edge cases
- Hobbyist and creator-friendly robot learning demos
- Sectors: daily life, creator economy
- Tool/Product/Workflow: Reproduce humanoid motion imitation (e.g., dance/tricks) with shipped tasks; follow community tutorials and the single-command install to share results
- Assumptions/Dependencies: Consumer-grade GPU; external rendering if RGB is needed; availability of motion clips for imitation
Long-Term Applications
The following use cases are plausible extensions or scale-ups that require further research, engineering, or productization before broad deployment.
- End-to-end sim-to-real pipelines for heterogeneous fleets (locomotion + manipulation)
- Sectors: logistics, manufacturing, service robotics
- Tool/Product/Workflow: Unified manager-based environments across multiple robot morphologies; shared curriculum and event libraries; fleet-wide policy training and evaluation
- Assumptions/Dependencies: Integration with ROS 2 and operations tooling; robust sim-to-real procedures per platform; safety certification and fail-safes
- Vision-based controllers via privileged-to-vision policy distillation
- Sectors: robotics, software
- Tool/Product/Workflow: Train privileged policies in mjlab (full-state), then distill to camera-based policies using external high-fidelity rendering and datasets
- Assumptions/Dependencies: External rendering pipeline (RGB out of scope in mjlab); quality camera models and datasets; careful domain randomization of visual conditions
- Generalist humanoid skills for entertainment and service tasks
- Sectors: entertainment, hospitality, retail
- Tool/Product/Workflow: Scale motion imitation with BeyondMimic-style guided diffusion and expanded motion libraries; build “skill packs” for common tasks (dance, greet, carry)
- Assumptions/Dependencies: Reliable hardware (humanoid balance, contact-rich skills); curated motion datasets; safety policies for public interaction
- Cloud-hosted “robot learning lab” as a managed service
- Sectors: software, education, enterprise R&D
- Tool/Product/Workflow: Offer hosted mjlab clusters with GPU pools, Viser dashboards, and per-tenant environment libraries for coursework and prototyping
- Assumptions/Dependencies: Cost-effective GPU provisioning; multi-tenant isolation; usage-based billing and quota management
- Regulatory and audit frameworks built on transparent, open simulation
- Sectors: policy, public sector procurement
- Tool/Product/Workflow: Use mjlab’s inspectable MuJoCo-native data structures and typed configs to create auditable training artifacts and safety test batteries
- Assumptions/Dependencies: Agreed-upon benchmarks and reporting standards; independent validation bodies; bridging from sim tests to field trials
- Energy-aware controller optimization
- Sectors: energy, operations
- Tool/Product/Workflow: Add reward terms for torque/energy budgets; run large parallel sweeps to identify energy-efficient policies for robot fleets
- Assumptions/Dependencies: Accurate actuator energy models; field telemetry for validation; potential trade-offs with task performance and safety
- Multi-agent and multi-robot coordination in shared environments
- Sectors: warehousing, agriculture, construction
- Tool/Product/Workflow: Extend the entity/manager abstractions to multi-agent RL tasks (collision avoidance, cooperative transport) with curriculum across difficulty tiers
- Assumptions/Dependencies: New coordination reward/event terms; scalable observation pipelines; rigorous safety constraints and scenario generation
- On-robot fine-tuning and adaptive control
- Sectors: field robotics, defense, disaster response
- Tool/Product/Workflow: Use lightweight policies trained in mjlab as priors; adapt online with limited on-board compute for changing terrains or payloads
- Assumptions/Dependencies: Embedded GPU/accelerators; safe online learning methods; reliable fallbacks and supervisors
- Workforce upskilling and K–12 STEM expansions
- Sectors: education, public policy
- Tool/Product/Workflow: Preconfigured kits and curricula using mjlab for foundational RL concepts in robotics; remote visualization to reduce lab hardware needs
- Assumptions/Dependencies: Budget for GPUs or cloud credits; teacher training; age-appropriate content and assessment frameworks
- Financial planning and ROI modeling for robot learning infrastructure
- Sectors: finance (corporate), operations
- Tool/Product/Workflow: Cost models comparing GPU-accelerated parallel training vs. physical trials; portfolio of pre-trained policies to shorten deployment timelines
- Assumptions/Dependencies: Accurate accounting of compute costs and failure rates; validated transfer rates from sim to real; organizational readiness to adopt RL-driven workflows
Glossary
- Actuation delay: Latency between issuing a control command and its effect on the actuator/system. "Actuation delay---common in real robots---is modeled by a wrapper class that buffers control signals and replays them with a latency quantized to the physics timestep."
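The buffering described in the quote can be sketched with a fixed-length queue. This is a toy version; mjlab's wrapper class may differ in detail.

```python
from collections import deque

# Toy delay wrapper: commands are buffered and replayed N physics steps
# later. With delay_steps=2, the command issued at step t takes effect
# at step t+2.

class DelayedActuator:
    def __init__(self, delay_steps, initial_ctrl=0.0):
        self.buffer = deque([initial_ctrl] * delay_steps)

    def step(self, command):
        """Push the new command; return the one whose latency has elapsed."""
        self.buffer.append(command)
        return self.buffer.popleft()

act = DelayedActuator(delay_steps=2)
applied = [act.step(c) for c in [1.0, 2.0, 3.0, 4.0]]
print(applied)  # [0.0, 0.0, 1.0, 2.0]
```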
- Anchor pose: A reference pose from a motion trajectory used for tracking in imitation tasks. "The policy observes an anchor pose from the reference trajectory, base velocities, joint states, and the current action."
- Articulation: The presence of joints in a body enabling internal motion. "base type (fixed or floating) and articulation (with or without joints)."
- Asymmetric actor-critic architectures: RL setups where the actor and critic use different observation pipelines. "Multiple observation groups (e.g., policy and critic) can coexist, each with its own processing pipeline, enabling asymmetric actor-critic architectures."
- CUDA graph: A captured sequence of GPU kernel launches that can be replayed to reduce CPU overhead. "mjlab further captures the simulation step as a CUDA graph: the kernel execution sequence is recorded once and replayed on subsequent calls, eliminating CPU-side dispatch overhead."
- Curriculum manager: Component that adjusts task difficulty or training conditions based on performance. "The curriculum manager adjusts training conditions based on policy performance."
- DC motor model: A physics-based model of a DC motor’s torque–speed behavior. "a DC motor model with velocity-dependent torque saturation"
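Velocity-dependent torque saturation can be sketched as a linear torque-speed curve (the constants below are illustrative, not from the paper): full torque is available at rest and the limit falls off as the joint spins faster.

```python
# Sketch of a DC motor model with velocity-dependent torque saturation.
# tau_max and qd_max are illustrative constants, not mjlab defaults.

def dc_motor_torque(tau_cmd, qd, tau_max=30.0, qd_max=20.0):
    """Clamp the commanded torque to the motor's speed-dependent limit."""
    # Linear torque-speed curve: full torque at rest, zero at max speed.
    limit = tau_max * max(0.0, 1.0 - abs(qd) / qd_max)
    return max(-limit, min(limit, tau_cmd))

print(dc_motor_torque(25.0, qd=0.0))   # 25.0 (within the 30 N*m limit)
print(dc_motor_torque(25.0, qd=10.0))  # 15.0 (limit halved at half speed)
```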
- DeepMimic: A framework for example-guided deep reinforcement learning of physics-based character skills. "implementing the DeepMimic framework with extensions from BeyondMimic."
- Decimation: Performing multiple physics sub-steps per control step for stability and accuracy. "For each of decimation sub-steps: apply actuator commands, write controls to the simulation, advance physics, and update entity state."
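Decimation can be sketched as a nested loop (the rates and toy dynamics below are illustrative): one control command is held across several smaller physics sub-steps.

```python
# Sketch of decimation: one control step spans several smaller physics
# sub-steps, so control can run at e.g. 50 Hz while physics runs at 200 Hz.

PHYSICS_DT = 0.005  # 200 Hz physics
DECIMATION = 4      # 4 sub-steps per control step -> 50 Hz control

def control_step(q, qd, ctrl):
    """Apply one control command across DECIMATION physics sub-steps."""
    for _ in range(DECIMATION):
        qd += ctrl * PHYSICS_DT  # toy dynamics: ctrl acts as acceleration
        q += qd * PHYSICS_DT
    return q, qd

q, qd = control_step(0.0, 0.0, ctrl=1.0)
print(round(qd, 6))  # 0.02 = 4 sub-steps * 0.005 s * 1.0
```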
- Domain randomization: Randomly varying simulation parameters to improve robustness and sim-to-real transfer. "The most common use case is domain randomization."
- End-effector: The terminal link of a robot arm (e.g., gripper) that interacts with objects. "the vector from end-effector to cube"
- Gym interface: The standard environment API exposing reset and step for MDPs. "mjlab environments implement the Gym interface, a standard API for defining Markov decision processes (MDPs)."
- Heightfield terrains: Terrain represented by a height map defining continuous surface profiles. "heightfield terrains for smoother, continuous profiles (sloped pyramids, uniform noise, sinusoidal waves)."
- IMU: An inertial measurement unit providing accelerations and angular velocities. "The policy observes IMU readings, projected gravity, joint positions and velocities, the previous action, and the commanded twist."
- Interpenetration: Undesired overlap of bodies indicating collision issues in simulation. "A self-collision cost discourages interpenetration."
- Isaac Lab: NVIDIA’s GPU-accelerated simulation platform with a manager-based API. "mjlab adopts the manager-based API introduced by Isaac Lab"
- Manager-based API: Environment design pattern where modular terms are registered under managers that handle their lifecycle. "mjlab adopts the manager-based API introduced by Isaac Lab."
- Markov decision processes (MDPs): The mathematical formalism for sequential decision-making with states, actions, and rewards. "mjlab environments implement the Gym interface, a standard API for defining Markov decision processes (MDPs)."
- MJCF: MuJoCo’s XML-based format for defining models, robots, and scenes. "constructs scenes by composing entity descriptions defined via MJCF into a single MjSpec (https://mujoco.readthedocs.io/en/stable/programming/modeledit.html)."
- MjData: MuJoCo structure holding time-varying simulation state. "while MjData carries the time-varying simulation state."
- MjModel: MuJoCo structure holding the static kinematic and dynamic model description. "MjModel holds the static kinematic and dynamic description of the scene"
- MjSpec: A specification used to compose and compile models before creating an MjModel. "into a single MjSpec."
- MLP actuator: A learned actuator modeled by a multilayer perceptron to capture hardware-specific dynamics. "a learned MLP actuator for capturing hardware-specific dynamics from data."
- MuJoCo Warp: A GPU-accelerated backend for MuJoCo built on NVIDIA Warp. "MuJoCo Warp (docs: https://mujoco.readthedocs.io/en/stable/mjwarp/index.html) is a GPU-accelerated backend for MuJoCo built on NVIDIA Warp (https://nvidia.github.io/warp/)."
- NVIDIA Warp: A high-performance framework for GPU simulation and graphics used by MuJoCo Warp. "built on NVIDIA Warp."
- On-policy algorithms: RL methods that learn from data generated by the current policy. "Training uses RSL-RL for on-policy algorithms,"
- PD controller: A proportional-derivative feedback controller that computes torques from error and its rate. "an ideal PD controller,"
- PhysX: NVIDIA’s physics engine used in many simulators and games. "Its physics engine, PhysX, was closed-source until recently, making low-level debugging and introspection difficult."
- Ray-cast sensor: Sensor that casts rays to measure geometry or distances, e.g., terrain height scanning. "a ray-cast sensor for terrain height scanning,"
- RSL-RL: A robotics-focused reinforcement learning library used for training policies. "Training uses RSL-RL for on-policy algorithms,"
- Self-collision cost: A penalty discouraging collisions between a robot’s own links. "A self-collision cost discourages interpenetration."
- Sim-to-real: Transferring policies learned in simulation to real hardware. "The fidelity of this sim-to-real pipeline hinges on getting simulation details right:"
- Tiled-rendering camera: A camera that renders via tiles to manage performance or resolution. "an experimental tiled-rendering camera."
- Torque saturation: Limits on actuator torque output, often dependent on velocity. "velocity-dependent torque saturation,"
- TorchArray: A zero-copy wrapper exposing Warp arrays as PyTorch tensors. "mjlab bridges this gap with a TorchArray abstraction: a zero-copy wrapper that exposes Warp arrays as PyTorch tensors."
- Warp arrays: GPU-resident arrays used by MuJoCo Warp to store simulation state. "MuJoCo Warp stores simulation state in Warp arrays,"
- World dimension: A leading dimension indexing multiple parallel simulation instances. "The key addition is a leading world dimension:"
- Zero-copy wrapper: A mechanism to share memory across frameworks without duplicating data. "a zero-copy wrapper that exposes Warp arrays as PyTorch tensors."
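The zero-copy idea can be illustrated in miniature with standard-library Python: a memoryview exposes the same bytes as an underlying buffer, so writes through either side are visible to the other without copying. TorchArray applies the analogous principle to share GPU memory between Warp and PyTorch.

```python
# Zero-copy sharing in miniature: a memoryview aliases the same bytes as
# the underlying buffer, so no data is duplicated. This is an analogy for
# how TorchArray shares GPU memory between Warp and PyTorch.

import array

sim_state = array.array("d", [0.0, 0.0, 0.0])  # "simulator-owned" buffer
view = memoryview(sim_state)                   # zero-copy view of it

view[1] = 3.14       # a write through the view...
print(sim_state[1])  # ...is visible in the original buffer: 3.14

sim_state[2] = 2.71  # and a write to the buffer...
print(view[2])       # ...is visible through the view: 2.71
```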