- The paper introduces SRB, a simulation platform that uses procedural generation and domain randomization to enhance robot learning in space.
- It demonstrates high-throughput GPU-accelerated simulations and successful zero-shot transfer of policies to physical robots.
- SRB supports diverse scenarios, including orbital, lunar, and terrestrial domains, to train adaptive and robust reinforcement learning agents.
Space Robotics Bench: Robot Learning Beyond Earth
Introduction
"Space Robotics Bench: Robot Learning Beyond Earth" introduces the Space Robotics Bench (SRB), an open-source simulation framework for robot learning in space environments. The paper argues that space missions need robust autonomous systems that can operate in extraterrestrial conditions, and it proposes SRB as a tool for learning in such domains. The framework uses extensive procedural content generation (PCG) and domain randomization (DR) to produce a diverse set of training environments, which improves the generalization of learning agents.
Framework Architecture
SRB is built on NVIDIA Isaac Sim and Isaac Lab, combining GPU-accelerated simulation with modular components that can be customized extensively. The framework is engineered for massive parallelism: it can simulate thousands of environments concurrently, which is crucial for the data-intensive demands of reinforcement learning (RL).
Procedural Generation and Training
SRB relies heavily on PCG and DR to generate diverse training data, which is critical for bridging the sim-to-real gap. By exposing agents to a wide range of scenarios, SRB helps train policies that are robust to variations in real-world conditions. This diversity covers not only the environments but also the robots themselves, promoting research into adaptive and generalist robotic systems.
Figure 1: Space Robotics Bench brings robot learning to space by leveraging procedural diversity across a wide range of cross-domain scenarios. The entire workflow is validated through a successful zero-shot sim-to-real transfer to a physical robot.
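To make the idea of domain randomization concrete, the following is a minimal sketch of how per-environment physical parameters might be sampled before each episode. It does not use or reproduce SRB's actual API: the `EnvParams` fields, the parameter ranges, and the function names are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvParams:
    """Hypothetical per-environment parameters (not SRB's actual schema)."""
    gravity: float       # gravitational acceleration magnitude, m/s^2
    friction: float      # dimensionless surface friction coefficient
    payload_mass: float  # kg


# Illustrative ranges only; gravity spans roughly lunar (1.62) to Earth (9.81).
RANGES = {
    "gravity": (1.62, 9.81),
    "friction": (0.3, 1.0),
    "payload_mass": (0.5, 5.0),
}


def sample_env_params(rng: random.Random) -> EnvParams:
    """Draw one parameter set, sampling each value uniformly from its range."""
    return EnvParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def sample_batch(num_envs: int, seed: int = 0) -> list[EnvParams]:
    """Sample independent parameters for each parallel environment."""
    rng = random.Random(seed)
    return [sample_env_params(rng) for _ in range(num_envs)]
```

Calling `sample_batch(1024)` would give every one of 1024 parallel environments its own physics, so a single policy update sees many variations at once. Seeding the generator keeps the sampled batch reproducible across runs.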
Scenarios and Domains
SRB supports various domains, including orbital, lunar, and terrestrial environments, each with unique challenges and characteristics. Tasks such as dynamic waypoint tracking and manipulation in microgravity are part of the benchmark suite, designed to test and improve algorithm capabilities across different robotic morphologies and mission scenarios.
Figure 2: SRB supports diverse scenarios that encompass a wide range of challenges across multiple domains of space robotics.
The framework’s efficiency enables the high-throughput simulation that RL requires. Baseline evaluations with standard RL algorithms, such as PPO, TD3, and DreamerV3, establish the difficulty and realism of the tasks. The DreamerV3 agent, in particular, demonstrates superior sample efficiency, albeit at a higher computational cost.
Figure 3: Learning curves of RL baselines, averaged over three random seeds. Shaded regions represent the standard deviation. The dashed lines in plots with a defined success condition qualitatively indicate the threshold for consistent task success.
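The aggregation described in the caption can be sketched as follows. This is an assumption-laden illustration rather than the paper's plotting code: the function name is hypothetical, and it uses the sample standard deviation, whereas the paper does not state which estimator it uses.

```python
from statistics import mean, stdev


def aggregate_curves(curves: list[list[float]]) -> tuple[list[float], list[float]]:
    """Return the per-step mean and sample standard deviation across seeds.

    `curves` holds one list of episodic returns per seed; all lists must
    have the same length.
    """
    if len(curves) < 2:
        raise ValueError("need at least two seeds to compute a standard deviation")
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all curves must have the same number of steps")
    steps = list(zip(*curves))  # regroup values by training step
    return [mean(s) for s in steps], [stdev(s) for s in steps]
```

The mean would be drawn as the solid line, and the band between mean minus and mean plus one standard deviation as the shaded region.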
Sim-to-Real Transfer
A key contribution of SRB is the successful zero-shot transfer of policies from simulation to real-world hardware, that is, deployment without any fine-tuning on the physical system. The framework’s procedural diversity plays a critical role in minimizing the sim-to-real gap. The paper demonstrates this on dynamic waypoint tracking tasks, where policies trained in simulation remain robust and generalize when deployed on physical robots.

Figure 4: Sim-to-real trajectory tracking of different RL agents.
Advanced Features and Customizability
SRB extends beyond standard policy training by providing customizable actuation models, including operational space control (OSC) for learning adaptive compliance, and support for complex visuomotor tasks. These features enable research into advanced robotic capabilities, such as compliant manipulation and end-to-end sensorimotor control, offering a rich environment for developing novel RL algorithms.
Figure 5: The action space of the OSC agent combines motion commands with stiffness and damping gains to achieve learned adaptive compliance.
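As a rough illustration of such an action space, the sketch below splits a policy output into a motion command plus stiffness and damping gains, then applies a simplified per-axis spring-damper law. The 9-dimensional layout, the gain ranges, and all names are assumptions made for this example. A full OSC controller would also handle orientation and map the task-space force to joint torques through the manipulator Jacobian, which is omitted here.

```python
from dataclasses import dataclass

# Hypothetical gain limits; real values depend on the robot and the task.
STIFFNESS_RANGE = (10.0, 500.0)  # N/m
DAMPING_RANGE = (1.0, 50.0)      # N*s/m


def rescale(u: float, lo: float, hi: float) -> float:
    """Map a normalized action u in [-1, 1] to [lo, hi], clipping out-of-range input."""
    u = max(-1.0, min(1.0, u))
    return lo + 0.5 * (u + 1.0) * (hi - lo)


@dataclass
class OscCommand:
    delta_pos: list[float]  # desired end-effector displacement, m
    stiffness: list[float]  # per-axis stiffness, N/m
    damping: list[float]    # per-axis damping, N*s/m


def decode_action(action: list[float]) -> OscCommand:
    """Split a 9-D action into motion, stiffness, and damping parts (assumed layout)."""
    if len(action) != 9:
        raise ValueError("expected a 9-dimensional action")
    return OscCommand(
        delta_pos=list(action[0:3]),
        stiffness=[rescale(u, *STIFFNESS_RANGE) for u in action[3:6]],
        damping=[rescale(u, *DAMPING_RANGE) for u in action[6:9]],
    )


def task_space_force(cmd: OscCommand, velocity: list[float]) -> list[float]:
    """Spring-damper law per axis: F = Kp * delta_pos - Kd * velocity."""
    return [
        kp * dx - kd * v
        for kp, kd, dx, v in zip(cmd.stiffness, cmd.damping, cmd.delta_pos, velocity)
    ]
```

Because the gains are part of the action, the policy can choose to be stiff for precise free-space motion and soft during contact, which is the sense in which compliance is learned rather than hand-tuned.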
Discussion
The Space Robotics Bench addresses significant challenges in developing autonomous systems for space applications through procedural diversity and robust simulation capabilities. Limitations remain, most notably the perceptual sim-to-real gap, but the framework offers a valuable platform for tackling them and for advancing research in space robotics.

Figure 6: The sim-to-real gap in depth perception, illustrated by the contrast between a clean simulated map and noisy data from a physical camera.
Conclusion
SRB is a comprehensive tool for advancing space robotics research, providing a procedural framework that enhances the robustness and adaptability of learning agents. Its open-source nature and its integration of diverse robots and environments position it as a valuable resource for developing the next generation of autonomous space systems.