Unsupervised Visuomotor Control through Distributional Planning Networks
The paper "Unsupervised Visuomotor Control through Distributional Planning Networks" introduces a novel reinforcement learning (RL) approach that enables robots to autonomously learn visuomotor skills without relying on human-provided reward functions. This research addresses a critical challenge in RL: the need for manually engineered reward functions, especially in real-world scenarios where environmental states that signify progress are not directly observable.
Overview of the Methodology
The authors propose a reinforcement learning approach that sidesteps manually defined rewards by learning an embedding space without supervision, in which a robot can measure its own progress toward a goal. The core insight is that any sequence of actions is optimal for reaching the state it actually ends in: by treating each trajectory's final state as its goal, a goal metric can be learned from unlabeled interaction alone. This lets robots learn task-relevant representations and execute tasks with a high degree of autonomy.
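Concretely, this self-supervision can be pictured as a relabeling step: every collected trajectory becomes a training example whose goal is the frame the trajectory actually ended on. Below is a minimal Python sketch of that idea; the data layout and names (observations, actions) are illustrative assumptions, not the paper's code.

```python
def relabel_with_final_state(trajectories):
    """Minimal sketch of the self-supervision idea: each unlabeled trajectory is
    treated as an optimal demonstration for reaching its own final frame.
    Assumed layout (illustrative): each trajectory is a dict with 'observations'
    of shape (T+1, ...) and 'actions' of shape (T, action_dim)."""
    dataset = []
    for traj in trajectories:
        obs, actions = traj["observations"], traj["actions"]
        goal = obs[-1]  # the state actually reached is the goal these actions were optimal for
        dataset.append({"observations": obs, "actions": actions, "goal": goal})
    return dataset
```

Relabeled this way, the supervision comes entirely from the robot's own experience, with no human reward or demonstration required.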
Central to the method are Distributional Planning Networks (DPNs), which learn, without supervision, a metric that measures how well a sequence of actions achieves a given goal image. DPNs build on universal planning networks (UPNs) by modeling a distribution over action sequences rather than a single plan, which lets them train on the robot's own, possibly suboptimal, trajectories (each treated as optimal for its own final state) instead of requiring expert demonstrations.
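The sketch below conveys the flavor of a UPN-style gradient-based planner together with the self-supervised outer loss that the relabeled data enables. It is a simplified sketch under assumed architectures and hyperparameters (small MLPs, a fixed horizon, a handful of sampled action sequences standing in for the plan distribution); the paper's exact model and objective differ in detail.

```python
import torch
import torch.nn as nn

class LatentPlanner(nn.Module):
    """Simplified UPN-style planner; several sampled action sequences stand in for a
    distribution over plans. Architectures and sizes are illustrative assumptions."""

    def __init__(self, obs_dim=64, latent_dim=32, action_dim=4, horizon=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
                                      nn.Linear(128, latent_dim))
        self.horizon, self.action_dim = horizon, action_dim

    def rollout(self, z0, action_seq):
        # Unroll the learned latent dynamics under one candidate action sequence.
        z = z0
        for t in range(action_seq.shape[0]):
            z = self.dynamics(torch.cat([z, action_seq[t]], dim=-1))
        return z

    def plan(self, obs, goal_obs, n_samples=8, inner_steps=20, step_size=0.1):
        """Inner loop: adjust several randomly initialized action sequences by gradient
        descent so their predicted final latent state matches the encoded goal."""
        z0, zg = self.encoder(obs), self.encoder(goal_obs)
        actions = torch.randn(n_samples, self.horizon, self.action_dim, requires_grad=True)
        for _ in range(inner_steps):
            z_final = torch.stack([self.rollout(z0, a) for a in actions])
            planning_loss = ((z_final - zg) ** 2).sum(-1).mean()
            # create_graph=True keeps the planning steps differentiable so the outer
            # loss can train the encoder and dynamics through them.
            grad, = torch.autograd.grad(planning_loss, actions, create_graph=True)
            actions = actions - step_size * grad
        return actions

    def outer_loss(self, obs, taken_actions, goal_obs):
        """Outer loop: with the trajectory's own final frame as the goal, the planned
        actions should reproduce the actions that were actually taken."""
        planned = self.plan(obs, goal_obs)
        return ((planned - taken_actions.unsqueeze(0)) ** 2).mean()
```

Keeping several sampled plans rather than a single point estimate reflects the distributional aspect: many different action sequences can reach the same final state, so the robot's own, possibly suboptimal, trajectory need only be a likely plan rather than the unique one.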
Experiments and Results
The paper reports experiments in both simulated and real-world settings, demonstrating the method's efficacy across diverse visuomotor tasks. In simulation, the tasks included reaching, pushing, and rope manipulation; real-world experiments used a Fetch robot for reaching and object manipulation.
Quantitative results underscore the approach's ability to learn effective goal metrics from unlabeled interactions. For instance, in the simulated reaching task, the paper reports reaching to within 0.05 cm after 100 reinforcement learning steps. The DPN approach outperforms existing methods, such as inverse models and VAEs, which either fail to capture the task-relevant variables or cannot resolve the subtle visual distinctions that matter for control. This improvement is attributed to DPN's control-centric representation, which captures the relevant aspects of the task while disregarding irrelevant features.
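Once trained, the learned embedding supplies the reward signal for standard model-free RL toward new goal images: the reward is simply the negative distance between the current and goal embeddings. A minimal sketch of that usage, assuming a trained encoder like the one in the planner sketch above (the paper's exact distance and RL algorithm may differ):

```python
import torch

def goal_reward(encoder, obs, goal_obs):
    """Reward for RL toward a goal image: negative distance between the current
    observation's embedding and the goal image's embedding."""
    with torch.no_grad():
        return -torch.norm(encoder(obs) - encoder(goal_obs)).item()
```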
Theoretical and Practical Implications
The research holds significant theoretical and practical implications. Theoretically, it offers a new paradigm in representation learning that emphasizes control-centric features, enabling robots to execute longer and more complex action sequences without manually specified rewards. Practically, the insights could transform robot learning in unstructured or dynamic environments, where object state is hard to measure and scarce labels make task specification difficult.
Future Directions
Looking forward, this approach could serve as a foundation for several advances in artificial intelligence and robotics. Extending the method beyond goal-reaching tasks could yield more broadly applicable control strategies. Furthermore, the learned metrics could be integrated into multi-goal reinforcement learning and planning with learned models, supporting techniques such as goal relabeling and automatic curriculum generation in RL.
In conclusion, this paper presents a significant advancement in autonomous robotic learning by effectively eliminating the need for hand-engineered reward functions. It opens avenues for further research to enhance the versatility and autonomy of reinforcement learning-based systems in various domains.