- The paper presents a novel taxonomy that categorizes DRL methods in robotics by competencies, problem formulations, solution approaches, and real-world success levels.
- It analyzes diverse domains such as locomotion, navigation, and manipulation, documenting advances in sim-to-real transfer and hierarchical control.
- It highlights open challenges including sample efficiency, safe exploration, and long-horizon skill composition, guiding future DRL research.
Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes
This survey rigorously categorizes and analyzes the body of work in deep reinforcement learning (DRL) for robotics with a distinct focus on real-world deployments. The paper establishes a novel taxonomy that segments the literature along four dimensions: the robotic competencies learned via DRL, problem formulations, solution approaches, and the level of real-world success observed. This systematic framework provides both a unified perspective over diverse robotic tasks and a structured basis for identifying open challenges in the field.
Figure 1: Taxonomy of DRL in robotics delineating robot competencies, problem formulations, solution approaches, and real-world success levels.
A central contribution of the survey is its detailed taxonomy. The authors classify robotic applications into domains such as locomotion, navigation, manipulation (including subcategories like pick-and-place, contact-rich, in-hand, and non-prehensile), mobile manipulation, human–robot interaction (HRI), and multi-robot interaction. For each domain, the paper breaks down the problem formulation into three axes:
- Action Space: Policies may output low-level joint or motor commands, mid-level task-space commands (often interfaced with classical controllers), or high-level, temporally extended skills. The survey discusses the trade-offs inherent in each choice, noting that while low-level control provides maximum flexibility, its high dimensionality makes exploration harder.
- Observation Space: Solutions utilize either high-dimensional sensor modalities (e.g., raw images, LiDAR scans) or low-dimensional state representations derived via estimation. The choice affects both sample efficiency and the complexity of transferring policies to the real world.
- Reward Function: The distinction between sparse and dense rewards is highlighted. Dense rewards, often shaped using domain knowledge, can improve sample efficiency but may embed bias that harms generalization, particularly in long-horizon tasks; a minimal sketch of both styles follows this list.
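To make the sparse-versus-dense distinction concrete, here is a minimal sketch of both reward styles for a hypothetical reaching task. The task, the success tolerance, and the shaping weights are illustrative assumptions, not details taken from the survey.

```python
import numpy as np

def sparse_reward(ee_pos, goal_pos, tol=0.02):
    """Sparse reward: a success signal and nothing else.

    Easy to specify and bias-free, but gives the agent no gradient
    to follow until it stumbles into the goal region.
    """
    return 1.0 if np.linalg.norm(ee_pos - goal_pos) < tol else 0.0

def dense_reward(ee_pos, goal_pos, action, tol=0.02,
                 dist_weight=1.0, ctrl_weight=0.01):
    """Dense (shaped) reward: distance and control-effort terms.

    Improves sample efficiency, but the shaping encodes the designer's
    assumption that shrinking the distance is always desirable, which
    can bias the policy in longer-horizon tasks.
    """
    dist = np.linalg.norm(ee_pos - goal_pos)
    success = 1.0 if dist < tol else 0.0
    return success - dist_weight * dist - ctrl_weight * float(np.sum(action ** 2))
```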
Solution Approaches
The paper further breaks down the proposed solution approaches along several key dimensions:
- Simulator Usage: The survey differentiates between zero-shot sim-to-real transfer, where policies are trained in simulation and deployed directly, and few-shot methods, where limited real-world finetuning is required; a minimal sketch of the zero-shot idea appears after Figure 2. In certain cases, systems forgo simulators entirely and learn directly from real-world data.
- Model Learning: An important trend is the increasing integration of model learning—whether full dynamics models or residual models—and its combination with model-free reinforcement learning for improved sample efficiency.
- Expert Usage: The use of expert data, including human demonstrations or oracle policies, is analyzed as a means to overcome exploration challenges and to shape reward functions that are otherwise sparse.
- Policy Optimization and Representation: The survey provides a comprehensive summary of policy optimization methods ranging from on-policy algorithms (e.g., PPO, TRPO) to off-policy alternatives (e.g., SAC) and planning-based methods combined with learned world models. It also discusses the network architectures employed (MLP, CNN, RNN, Transformer) for representing policies and models, a key factor influencing performance in highly complex tasks.
Figure 2: Overview of the key dimensions in solution approaches, illustrating simulator usage, model learning, expert usage, policy optimization, and representation strategies.
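As a concrete illustration of the zero-shot recipe referenced in the list above, below is a minimal sketch of dynamics randomization written as a Gymnasium-style wrapper. The simulator hooks (`set_friction`, `set_mass_scale`) and the sampling ranges are assumptions for illustration; real systems typically randomize many more quantities, such as latency, sensor noise, and terrain.

```python
import numpy as np
import gymnasium as gym

class DynamicsRandomizationWrapper(gym.Wrapper):
    """Resample physical parameters at every episode so the policy must
    become robust to a distribution of dynamics rather than a single
    simulator instance, the core idea behind zero-shot sim-to-real.
    """

    def __init__(self, env, friction_range=(0.5, 1.5), mass_range=(0.8, 1.2)):
        super().__init__(env)
        self.friction_range = friction_range
        self.mass_range = mass_range

    def reset(self, **kwargs):
        # Draw a fresh dynamics configuration for this episode.
        friction = np.random.uniform(*self.friction_range)
        mass_scale = np.random.uniform(*self.mass_range)
        # `set_friction` / `set_mass_scale` stand in for whatever
        # parameter interface the underlying simulator exposes.
        self.env.unwrapped.set_friction(friction)
        self.env.unwrapped.set_mass_scale(mass_scale)
        return self.env.reset(**kwargs)
```

A policy trained under this wrapper sees a slightly different robot every episode, which is what allows direct deployment without real-world finetuning.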
Domain-Specific Reviews
The survey presents detailed reviews of DRL approaches in multiple robotic domains:
- Locomotion: The review covers quadrupedal, bipedal, and even aerial locomotion, with many works demonstrating zero-shot sim-to-real transfer. In quadrupedal locomotion, robust performance has been achieved on diverse terrains with strategies that incorporate dynamics randomization and hierarchical control. The analysis also discusses recent research addressing challenges in bipedal locomotion, where more complex and underactuated dynamics necessitate additional stabilization techniques.
- Navigation: DRL has been applied to both wheeled and legged navigation as well as to aerial navigation. The taxonomy distinguishes between approaches that directly learn end-to-end visuomotor mappings and those that integrate traditional mapping and planning modules. Significant challenges remain in achieving generalization, safety, and explainability; strong performance is typically reported in structured environments, with modular strategies proving more robust for real-world deployment.
- Manipulation: The survey reviews DRL for manipulation tasks, including pick-and-place, contact-rich manipulation (such as assembly and handling deformable objects), in-hand manipulation, and non-prehensile control. While some tasks, especially those with fixed, predefined object sets and dense reward functions, have reached mature real-world performance, complex open-world tasks remain challenging due to the need for tight contact control and safe exploration.
- Mobile Manipulation (MoMa): Combining locomotion and manipulation, MoMa presents unique challenges such as coordinating many degrees of freedom and handling long-horizon tasks. The survey examines works employing hierarchical architectures that decouple high-level decision making from low-level whole-body control (see the hierarchical-control sketch after Figure 3), emphasizing the importance of choosing an action space appropriate to the robot’s morphology.
- Human–Robot Interaction (HRI) and Multi-Robot Interaction: In HRI, studies address both collaborative and non-collaborative tasks, while approaches to multi-robot interaction often rely on multi-agent reinforcement learning frameworks. The survey highlights that data collection and simulation for human interactions are particularly challenging, and that integrating safe, sample-efficient learning strategies remains an open research direction.
Figure 3: Summary of the domain-specific reviews, highlighting locomotion, navigation, manipulation, mobile manipulation, HRI, and multi-robot interaction.
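The hierarchical pattern that recurs across the locomotion and mobile-manipulation reviews, a high-level policy choosing temporally extended skills that low-level controllers execute, can be sketched as follows. This is a toy illustration under assumed interfaces (`high_level_policy` maps an observation to a skill index; each skill maps an observation to a joint command); it does not reconstruct any specific system from the survey.

```python
class HierarchicalController:
    """High-level decisions at a slow rate, low-level control at every step."""

    def __init__(self, high_level_policy, skills, horizon=50):
        self.high_level_policy = high_level_policy  # obs -> skill index
        self.skills = skills                        # list of obs -> joint-command callables
        self.horizon = horizon                      # steps before the skill is re-selected
        self._active_skill = None
        self._steps_left = 0

    def act(self, obs):
        # Re-select a skill only every `horizon` steps, decoupling slow
        # decision making from fast whole-body control.
        if self._steps_left == 0:
            self._active_skill = self.skills[self.high_level_policy(obs)]
            self._steps_left = self.horizon
        self._steps_left -= 1
        return self._active_skill(obs)
```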
General Trends and Open Challenges
The authors identify several cross-cutting trends and challenges in the current body of work:
- Stability and Sample Efficiency: While on-policy algorithms have demonstrated robustness, they are often sample-inefficient. Promising avenues for future work include integrating off-policy and offline reinforcement learning techniques; the replay-buffer sketch after Figure 4 illustrates why off-policy data reuse helps.
- Real-World Learning: Overcoming the gap between simulation and reality remains critical, particularly for tasks with complex physical interactions. Safe exploration methods and automatic reset mechanisms during real-world learning are noted as high priority.
- Long-Horizon Tasks and Skill Composition: Learning to chain low-level skills into coherent, long-horizon behaviors remains an active area of inquiry. Approaches leveraging hierarchical reinforcement learning and unsupervised skill discovery are discussed as promising research directions.
- Principled System Design and Benchmarking: One pressing need is the development of standardized evaluation protocols and benchmarks that quantify the real-world success of DRL policies across domains and settings.
Figure 4: Key open challenges in DRL for robotics, including stability and sample efficiency, sim-to-real transfer, long-horizon skill composition, and the need for principled benchmarks.
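To ground the sample-efficiency point from the list above, here is a minimal replay buffer together with the reuse pattern it enables: each stored real-world transition can drive many gradient updates, whereas on-policy methods discard data after a single pass. The capacity, batch size, and the commented loop names (`env`, `policy`, `update_critic_and_actor`) are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Store transitions once, sample them many times for off-policy updates."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size=256):
        # Uniform sampling; prioritized variants are a common refinement.
        return random.sample(list(self.buffer), batch_size)

# Sketch of the reuse pattern: one environment step, several updates.
# buffer = ReplayBuffer()
# obs, _ = env.reset()
# for step in range(total_steps):
#     action = policy(obs)
#     next_obs, reward, terminated, truncated, _ = env.step(action)
#     buffer.add(obs, action, reward, next_obs, terminated)
#     obs = env.reset()[0] if (terminated or truncated) else next_obs
#     for _ in range(updates_per_step):  # reuse old data off-policy
#         update_critic_and_actor(buffer.sample())
```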
Conclusion
The survey provides a comprehensive and systematic analysis of the state of deep reinforcement learning in robotics. By categorizing works across multiple axes and highlighting quantitative and qualitative differences across domains, the paper serves as a valuable reference for researchers and practitioners. Its detailed taxonomy and nuanced discussion of application-specific challenges pave the way for addressing open questions in stable, sample-efficient, and generalizable DRL. Future breakthroughs in algorithm design, safe real-world learning, and the integration of foundation models are expected to further extend DRL’s impact on developing robust, versatile robotic systems.