- The paper introduces a simulation-pretrained latent action space that mitigates unsafe exploration and sample inefficiency in real-world RL.
- It leverages temporal abstraction and disentanglement to simplify complex whole-body manipulation tasks with minimal physical interactions.
- Empirical results demonstrate SLAC’s superior performance and scalability versus traditional sim-to-real transfer methods in mobile manipulation tasks.
An Analysis of SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World Reinforcement Learning
The paper introduces SLAC, a framework that addresses the challenges of real-world reinforcement learning (RL) for complex, high-degree-of-freedom (DoF) robotic systems, particularly mobile manipulators performing whole-body manipulation tasks. The authors propose pretraining a task-agnostic latent action space in simulation, which then serves as the interface for learning downstream tasks under real-world conditions.
Overview of SLAC Framework
SLAC follows a two-step process: it pretrains a latent action space in a low-fidelity simulator, then deploys that space with a novel off-policy RL algorithm for real-world learning. This design is effective because it sidesteps two significant obstacles of direct real-world RL: unsafe exploration and sample inefficiency.
The simulation environment used by SLAC need not be visually accurate, but it must preserve the essential geometric affordances of the real physical space. Notably, this reduces the need for the domain randomization or high-fidelity digital twins typically required for sim-to-real transfer, thus alleviating the "reality gap."
The latent action space is crafted using a specialized unsupervised skill discovery method, designed to achieve:
- Temporal abstraction, reducing the decision frequency so the downstream policy reasons over longer sequences of low-level actions.
- Disentanglement, ensuring that each latent action impacts a distinct aspect of the environment, which assists in multi-objective optimization without interdependencies.
- Safety, enforcing behaviors that avoid potentially damaging interactions or configurations.
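The disentanglement property above can be illustrated with a toy intrinsic reward: each latent dimension is rewarded for changing "its own" environment factor and penalized for affecting others. This is a minimal sketch with a hypothetical diagonal pairing between latent dimensions and state factors; the paper's actual skill discovery objective differs in its details.

```python
import numpy as np


def disentanglement_reward(z, delta_state):
    """Toy disentanglement bonus (illustrative, not the paper's objective).

    z           -- latent action vector, one entry per latent dimension
    delta_state -- change in the tracked environment factors over one
                   latent-action horizon, same length as z

    Rewards latent dimension i for moving environment factor i (the
    diagonal of the outer product) and penalizes cross-effects (the
    off-diagonal mass), pushing each latent toward a distinct factor.
    """
    effects = np.abs(np.outer(z, delta_state))  # |z_i| * |delta_j| for all pairs
    aligned = np.trace(effects)                 # on-diagonal: dim i moves factor i
    off_diag = effects.sum() - aligned          # cross-talk between dimensions
    return aligned - off_diag
```

With this bonus, a latent action that cleanly moves a single matching factor scores positively, while one whose effect leaks into other factors is penalized.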
Once developed, this latent action space serves as the interface for learning complex tasks directly in the physical world, without prior demonstrations. The authors report that contact-rich tasks can be learned in under an hour of real-world robot interaction, a substantial reduction in learning time compared to traditional methods.
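The interface described above can be sketched as an environment wrapper: the downstream policy emits one latent action, and a pretrained decoder expands it into a fixed horizon of low-level commands (the temporal abstraction). This is a minimal sketch; the class name, the `decoder(z, t)` signature, and the fixed horizon are assumptions for illustration, not SLAC's actual API.

```python
class LatentActionWrapper:
    """Exposes a pretrained latent action space as the downstream action space.

    The decoder is a stand-in for SLAC's simulation-pretrained module:
    it maps a latent action z (plus the step index within the horizon)
    to a low-level robot command.
    """

    def __init__(self, env, decoder, horizon=10):
        self.env = env          # underlying environment with a step() method
        self.decoder = decoder  # callable: (z, t) -> low-level action
        self.horizon = horizon  # low-level steps covered by one latent action

    def step(self, z):
        """Execute one latent action as `horizon` low-level steps."""
        total_reward, obs, done = 0.0, None, False
        for t in range(self.horizon):
            low_level = self.decoder(z, t)
            obs, reward, done = self.env.step(low_level)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done
```

From the downstream RL agent's perspective, each `step(z)` is a single decision, which is what lowers the effective decision frequency and shrinks the exploration problem in the real world.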
Empirical Evaluation and Results
The authors evaluated SLAC on a suite of mobile manipulation tasks against state-of-the-art baselines, including popular zero-shot sim-to-real transfer techniques and adaptive RL approaches. Results indicate that SLAC achieves superior learning efficacy and safety, handling multi-term reward structures through a factorized Q-function: a separate value estimate is learned for each reward component, which improves credit assignment and learning efficiency.
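The factorization idea can be sketched with per-term linear Q-heads whose outputs sum to the total value, each head trained by a TD update against its own reward term. This is a minimal sketch under assumed linear function approximation; the paper's parameterization and update rule differ.

```python
import numpy as np


class FactorizedQ:
    """One Q-head per reward term; the total Q-value is their sum.

    Illustrative linear sketch: head k holds a weight vector w[k], and
    Q_k(s, a) = w[k] . feats(s, a). Each head runs an independent TD
    update on its own reward component r_k.
    """

    def __init__(self, n_terms, feat_dim, lr=0.1):
        self.w = np.zeros((n_terms, feat_dim))  # one weight row per reward term
        self.lr = lr

    def q_total(self, feats):
        """Total value: sum of the per-term head outputs."""
        return float((self.w @ feats).sum())

    def td_update(self, feats, rewards, next_feats, gamma=0.99):
        """One TD(0) step per head, each against its own reward term."""
        for k, r_k in enumerate(rewards):
            target = r_k + gamma * self.w[k] @ next_feats
            td_error = target - self.w[k] @ feats
            self.w[k] += self.lr * td_error * feats
```

Keeping the terms separate means a noisy or sparse reward component only perturbs its own head, rather than the single value estimate a monolithic Q-function would share across all terms.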
Theoretical and Practical Implications
Theoretically, SLAC introduces a paradigm shift in leveraging low-fidelity simulations for effective action space pretraining, with implications for reducing computational and operational costs in robotics. Practically, this approach mitigates reliance on comprehensive simulation environments and extensive human-annotated demonstrations, pushing the envelope toward more autonomous and adaptable robotic systems for varied real-world applications.
Future Directions
Future research could explore tuning SLAC's latent action space for specific tasks, balancing latent granularity against the temporal span of each decision for different applications. Further, pairing the latent space with adaptive real-world model learning, drawing on recent advances in model-based RL, could enhance task performance even for fine-grained applications.
In conclusion, SLAC represents a forward-thinking strategy in the pursuit of autonomous systems capable of mastering complex real-world tasks with minimal human oversight and intervention. Its ability to address safety and efficiency issues inherent in traditional RL systems marks it as a significant contribution to both the academic community and potential real-world robotic applications.