Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion (2008.12228v1)

Published 6 Aug 2020 in cs.RO, cs.AI, cs.LG, and stat.ML

Abstract: Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Their attraction is due in part to the fact that they can represent a general class of methods that allow to learn a solution with a reasonably set reward and minimal prior knowledge, even in situations where it is difficult or expensive for a human expert. For RL to truly make good on this promise, however, we need algorithms and learning setups that can work across a broad range of problems with minimal problem specific adjustments or engineering. In this paper, we study this idea of generality in the locomotion domain. We develop a learning framework that can learn sophisticated locomotion behavior for a wide spectrum of legged robots, such as bipeds, tripeds, quadrupeds and hexapods, including wheeled variants. Our learning framework relies on a data-efficient, off-policy multi-task RL algorithm and a small set of reward functions that are semantically identical across robots. To underline the general applicability of the method, we keep the hyper-parameter settings and reward definitions constant across experiments and rely exclusively on on-board sensing. For nine different types of robots, including a real-world quadruped robot, we demonstrate that the same algorithm can rapidly learn diverse and reusable locomotion skills without any platform specific adjustments or additional instrumentation of the learning setup.

Authors (9)

Roland Hafner (23 papers)
Tim Hertweck (14 papers)
Philipp Klöppner (1 paper)
Michael Bloesch (24 papers)
Michael Neunert (29 papers)
Markus Wulfmeier (46 papers)
Saran Tunyasuvunakool (19 papers)
Nicolas Heess (139 papers)
Martin Riedmiller (64 papers)

Summary

Overview of Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion

This paper develops a general learning framework for locomotion in legged robots, built on Reinforcement Learning (RL). The authors aim for a single learning setup that works across diverse platforms, such as bipeds, tripeds, quadrupeds, and hexapods (including wheeled variants), with minimal customization to any specific robot or environment. The goal is a system that learns locomotion behaviors autonomously from on-board sensing alone, without external sensing or additional instrumentation of the learning setup.

The framework combines a data-efficient, off-policy multi-task RL algorithm with a small set of reward functions that are semantically identical across robot morphologies. Notably, the hyper-parameter settings and reward definitions are held constant across all experiments, regardless of platform, which is the paper's main evidence for the framework's general applicability.
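As a rough illustration of what "semantically identical" rewards could look like, the sketch below defines rewards only on quantities that any legged robot might report from on-board sensing, so the same functions apply unchanged to every morphology. The observation fields (`torso_up_z`, `forward_velocity`, `yaw_rate`), the reward shapes, and the target values are assumptions made for this example; they are not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Proprioception:
    """Hypothetical on-board readings shared by every robot in this sketch."""
    torso_up_z: float        # z-component of the torso's up axis (1.0 = upright)
    forward_velocity: float  # m/s along the torso heading
    yaw_rate: float          # rad/s about the vertical axis


def clip01(x: float) -> float:
    """Clamp a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def stand_reward(obs: Proprioception) -> float:
    # Reward an upright torso; no robot-specific joint terms are involved.
    return clip01(obs.torso_up_z)


def walk_forward_reward(obs: Proprioception, target_speed: float = 0.5) -> float:
    # Progress toward an assumed target speed, gated by staying upright.
    return stand_reward(obs) * clip01(obs.forward_velocity / target_speed)


def turn_left_reward(obs: Proprioception, target_rate: float = 0.5) -> float:
    # Progress toward an assumed target yaw rate, gated by staying upright.
    return stand_reward(obs) * clip01(obs.yaw_rate / target_rate)


# One reward table, reused as-is for bipeds, quadrupeds, hexapods, and so on.
REWARDS: Dict[str, Callable[[Proprioception], float]] = {
    "stand": stand_reward,
    "walk_forward": walk_forward_reward,
    "turn_left": turn_left_reward,
}


def evaluate_all(obs: Proprioception) -> Dict[str, float]:
    """Score one observation under every task's reward."""
    return {name: fn(obs) for name, fn in REWARDS.items()}
```

Because nothing in the table refers to leg count or joint layout, swapping the robot changes only how `Proprioception` is filled in, not the reward code.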

Numerical Results and Claims

The paper reports experiments on nine types of robots, covering simulated platforms and one real-world quadruped. It shows that a single RL algorithm can learn several distinct locomotion skills (such as standing upright, walking in various directions, and turning) without platform-specific tuning. In particular, the paper reports that complex skills are acquired within a few hours of interaction time, which makes learning directly on hardware practical.
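One common way for an off-policy multi-task learner to be data-efficient is to score every stored transition under whichever task is being trained, so experience collected while practising one skill also improves the others. This summary does not say whether the paper does exactly this. The class below is a hypothetical minimal sketch of the idea, with an assumed buffer layout and assumed reward functions over a dictionary-valued state.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Transition = Tuple[State, List[float], State]  # (state, action, next_state)
RewardFn = Callable[[State], float]


class MultiTaskReplay:
    """Replay buffer that stores reward-free transitions and labels them on sampling."""

    def __init__(self, reward_fns: Dict[str, RewardFn], seed: int = 0) -> None:
        self.reward_fns = reward_fns
        self.storage: List[Transition] = []
        self.rng = random.Random(seed)

    def add(self, state: State, action: List[float], next_state: State) -> None:
        # No reward is stored: it is computed later for the requested task.
        self.storage.append((state, action, next_state))

    def sample(self, task: str, batch_size: int):
        """Draw transitions uniformly and label them with `task`'s reward,
        regardless of which task was active when they were collected."""
        if not self.storage:
            raise ValueError("cannot sample from an empty buffer")
        reward_fn = self.reward_fns[task]
        batch = [self.rng.choice(self.storage) for _ in range(batch_size)]
        return [(s, a, reward_fn(s2), s2) for s, a, s2 in batch]
```

A learner would call `sample("stand", n)` and `sample("walk", n)` on the same buffer, obtaining the same kind of data with different reward labels.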

In real-world tests on the quadruped robot 'Daisy4', the system learned to walk forward in about 40 minutes of direct interaction (about two hours of total experiment time), with results comparable to those obtained in simulation. This suggests the framework can be applied on physical hardware and not only in simulation.

Implications and Speculation on Future Work

The work is relevant to robotic applications that demand adaptability and minimal reliance on external infrastructure, including deployments in dynamic or partially known environments. On the theoretical side, it advances the understanding of how RL can be applied to broad classes of robot morphologies with less domain-specific engineering.

In future work, the framework could be extended to more sophisticated tasks, such as interacting with objects or traversing complex terrain, without human intervention. Stronger safety measures and more robust learning from incomplete or noisy sensor data could widen deployment to bipedal robots and dynamic environments.

Final Considerations

By demonstrating a method that learns locomotion skills across varied platforms without reward recalibration or hardware-specific adaptation, the paper makes a concrete contribution to general RL frameworks for robotics. The findings support the feasibility of more autonomous and adaptable robotic systems, in line with the broader push toward greater autonomy and resilience.
