
Asymmetric Actor Critic for Image-Based Robot Learning (1710.06542v1)

Published 18 Oct 2017 in cs.RO, cs.AI, and cs.LG

Abstract: Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision making domains. However, robotics poses many challenges for RL, most notably training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working with a simulator. In this work, we exploit the full state observability in the simulator to train better policies which take as input only partial observations (RGBD images). We do this by employing an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) gets rendered images as input. We show experimentally on a range of simulated tasks that using these asymmetric inputs significantly improves performance. Finally, we combine this method with domain randomization and show real robot experiments for several tasks like picking, pushing, and moving a block. We achieve this simulation to real world transfer without training on any real world data.

Authors (5)
  1. Lerrel Pinto (81 papers)
  2. Marcin Andrychowicz (22 papers)
  3. Peter Welinder (15 papers)
  4. Wojciech Zaremba (34 papers)
  5. Pieter Abbeel (372 papers)
Citations (329)

Summary

Asymmetric Actor Critic for Image-Based Robot Learning: A Technical Summary

The paper presents a novel approach to deep reinforcement learning (RL) in robotics by introducing an asymmetric actor-critic framework. The research addresses long-standing challenges in robot learning, particularly the impracticality of training directly on physical systems due to safety and cost concerns, by leveraging physics simulators. While recent work has shown that policies trained in simulation can transfer to real robots, this paper identifies an opportunity to improve performance by fully exploiting the state observability available in simulation.

Methodology

The proposed framework employs an actor-critic algorithm with asymmetric inputs: the actor network is trained on partial observations (rendered RGBD images), while the critic receives the full state from the simulator. Because the critic sees the complete, low-dimensional state during training, it can produce more accurate value estimates and therefore more effective feedback for updating the actor. At deployment, the critic is discarded and the actor operates on vision-based inputs alone.
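
To make the asymmetry concrete, the sketch below shows a DDPG-style update in PyTorch in which the critic consumes the full simulator state while the actor consumes only the rendered RGBD image. This is an illustrative reconstruction, not the authors' code: the network sizes, the `batch` dictionary keys, and the omission of target networks and exploration noise are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Maps a rendered RGBD image to an action; this is all that is
    available at deployment time."""
    def __init__(self, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 5, stride=2), nn.ReLU(),  # 4 channels: RGB + depth
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, image):
        return self.net(image)

class Critic(nn.Module):
    """Scores (full state, action) pairs; it never sees pixels and is
    discarded after training."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def asymmetric_update(actor, critic, actor_opt, critic_opt, batch, gamma=0.98):
    """One update step. `batch` holds both views of each transition:
    images for the actor, full states for the critic."""
    img, next_img = batch["image"], batch["next_image"]
    s, s_next = batch["state"], batch["next_state"]
    a, r = batch["action"], batch["reward"]

    # Critic update: the TD target is computed from full states only.
    with torch.no_grad():
        target = r + gamma * critic(s_next, actor(next_img))
    critic_loss = F.mse_loss(critic(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: maximize Q for the image-conditioned action.
    actor_loss = -critic(s, actor(img)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```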

To facilitate real-world applicability, the method incorporates domain randomization, a process of varying the simulator's textures, lighting, and camera positions to make the trained policy robust to discrepancies between the simulated and real environments. The authors apply this methodology to several robotic tasks, including picking, pushing, and moving a block, demonstrating successful simulation-to-reality transfer without training on any real-world data.
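
A minimal sketch of such an episode-level randomization hook follows. The simulator interface (`sim.set_texture`, `sim.set_light`, `sim.set_camera_pose`) and the constants are hypothetical placeholders; real renderers (MuJoCo, Unity, etc.) expose different APIs, and the specific ranges below are illustrative, not the paper's values.

```python
import random

# Hypothetical handles: TEXTURES, NOMINAL_CAM_POS, NOMINAL_CAM_EULER and the
# sim.* methods stand in for whatever your renderer actually exposes.
TEXTURES = ["checker", "noise", "flat_red", "flat_blue", "wood"]
NOMINAL_CAM_POS = [1.0, 0.0, 0.5]      # meters
NOMINAL_CAM_EULER = [0.0, 45.0, 0.0]   # degrees

def randomize_scene(sim, rng=random):
    """Resample visual properties at the start of every episode so the
    policy cannot latch onto any single rendering of the scene."""
    for body in sim.bodies:
        sim.set_texture(body, rng.choice(TEXTURES))
    sim.set_light(position=[rng.uniform(-1.0, 1.0) for _ in range(3)],
                  ambient=rng.uniform(0.1, 0.9))
    # Jitter camera extrinsics a few centimeters / degrees around nominal.
    sim.set_camera_pose(
        position=[p + rng.gauss(0, 0.02) for p in NOMINAL_CAM_POS],
        orientation=[o + rng.gauss(0, 2.0) for o in NOMINAL_CAM_EULER])
```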

Results and Comparisons

Experimental results show substantial performance improvements across tasks with the asymmetric setup compared to conventional symmetric actor-critic models. Notably, on tasks like Fetch Pick, asymmetric HER (Hindsight Experience Replay) outperformed its symmetric counterpart, confirming the practical value of exploiting full state observability during training.
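
Since HER is central to these results, here is a brief sketch of the common "future" relabeling strategy combined with the asymmetric buffer layout; the transition dictionary keys and `reward_fn` are illustrative assumptions. The key point is that each stored transition keeps both the image (for the actor) and the full state (for the critic), and the sparse reward can be recomputed from the full state, which is only possible in simulation.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=np.random):
    """'Future'-strategy hindsight relabeling. Each transition dict holds
    both views: 'image'/'next_image' for the actor and 'state'/'next_state'
    for the critic, plus the goal it was collected under."""
    out = []
    T = len(episode)
    for t, tr in enumerate(episode):
        out.append(tr)                            # original-goal transition
        for _ in range(k):
            future = episode[rng.randint(t, T)]   # a goal achieved later on
            g = future["achieved_goal"]
            out.append({**tr, "goal": g,
                        # reward recomputed from FULL state: a sim-only luxury
                        "reward": reward_fn(tr["next_state"], g)})
    return out
```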

Moreover, adding a bottleneck layer further accelerated training. The bottleneck anchors an auxiliary task in which the network predicts the full state from its image observations, which speeds up the learning of more complex manipulation behaviors.
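
A sketch of how such a bottleneck might be wired into the actor follows; the layer sizes and the auxiliary-loss weight `lam` are assumptions, and `state_dim` is whatever the full simulator state flattens to.

```python
import torch.nn as nn
import torch.nn.functional as F

class BottleneckActor(nn.Module):
    """Actor whose intermediate features are additionally trained to regress
    the full simulator state (available as a label in simulation only)."""
    def __init__(self, action_dim, state_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),            # the "bottleneck"
        )
        self.action_head = nn.Sequential(nn.Linear(64, action_dim), nn.Tanh())
        self.state_head = nn.Linear(64, state_dim)   # auxiliary predictor

    def forward(self, image):
        z = self.encoder(image)
        return self.action_head(z), self.state_head(z)

def actor_loss_with_aux(actor_loss, pred_state, true_state, lam=1.0):
    """Combine the RL actor loss with the auxiliary regression loss."""
    return actor_loss + lam * F.mse_loss(pred_state, true_state)
```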

The paper also conducts a comparative analysis with behavior cloning via DAgger, a supervised imitation-learning approach that leverages expert demonstrations. Although DAgger learned rapidly at first, it plateaued at a lower performance level than asymmetric HER, underscoring the effectiveness of the reinforcement learning approach for long-term policy refinement.
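
For reference, the DAgger baseline works roughly as sketched below; the `env`, `expert`, and `render` interfaces are hypothetical placeholders, and `policy.fit` stands in for any supervised regression routine. The asymmetry mirrors the main method: the expert labels from full state, while the learner only ever sees images.

```python
def dagger(policy, expert, env, render, iters=10, episodes=20):
    """DAgger sketch: roll out the CURRENT policy, label every visited
    observation with the expert's action, retrain on the aggregate.
    `expert` reads full state; `render` maps state -> image. All interfaces
    here are illustrative, including env.step returning (state, done)."""
    dataset = []   # aggregated (image, expert_action) pairs
    for _ in range(iters):
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                image = render(state)
                dataset.append((image, expert(state)))     # expert relabels
                state, done = env.step(policy.act(image))  # learner drives
        policy = policy.fit(dataset)   # supervised regression on aggregate
    return policy
```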

Practical Implications and Future Directions

The implications of this research are significant for the development of robust, vision-based robotic control policies without the need for real-world data collection—an often prohibitive and resource-intensive endeavor. By facilitating effective policy transfer from simulation to real robots, this paper contributes valuable insights to the field of robotic learning and control.

The paper also opens avenues for future research: enhancing domain randomization techniques, exploring additional asymmetric input combinations, and applying the proposed framework to more diverse and challenging real-world tasks. Furthermore, adapting the strategy to continual learning, where the actor incrementally improves with real-world feedback, could be a fruitful direction that extends the framework to dynamic environments.

In summary, this paper presents a methodological advancement in deep RL for robotics, offering a practical solution to the challenges of real-world policy deployment through strategic use of simulator advantages, an asymmetric training paradigm, and robust domain randomization.