- The paper presents a novel RL framework that leverages relaxed logarithmic barrier rewards to enforce desirable gait styles.
- It employs a multi-critic architecture and gait encoding to achieve adaptable quadruped, tripod, and biped locomotion.
- Experimental results demonstrate agile performance, including quadruped galloping at 4.67 m/s, bipedal running at 3.6 m/s, and traversal of obstacles as tall as 67 cm.
A Framework for Diverse Legged Robot Locomotion Using Barrier-Based Rewards
The paper presents a model-free reinforcement learning (RL) framework for expanding the locomotion capabilities of legged robots. By leveraging barrier-based style rewards, the framework supports diverse and adaptable motion modes, such as quadruped, tripod, and biped locomotion, on a single platform, enabling the robot to take on a range of complex tasks. The pivotal innovation is the use of a relaxed logarithmic barrier function to steer learning toward desirable motion styles, including specific gait patterns, foot clearance, and joint positions.
Methodology
The proposed RL framework is characterized by several distinct features:
- Barrier-Based Style Rewards: The framework adapts the relaxed logarithmic barrier function, traditionally used in trajectory optimization, to express soft constraints inside the reward structure. Because the relaxed barrier stays finite when a constraint is violated, the system can balance flexibility against constraint satisfaction without extra machinery for the infinite values a standard log barrier would produce (a minimal sketch follows this list).
- Gait Encoding: The framework encodes predefined gait cycles to set per-leg phase timing during motion, allowing stance and swing times to be adjusted on the fly to match task requirements (see the phase-generator sketch after this list).
- Multi-Critic Architecture: By adopting separate critics for the barrier-based and standard rewards, the system can manage the complexity of learning desirable motion characteristics while avoiding pitfalls such as early termination (see the advantage-mixing sketch below).
- Task-Specific Rewards: The framework is adaptable, enabling task-specific tuning: constraint bounds on variables such as joint positions and body height are adjusted per task and per locomotion mode (an illustrative configuration appears below).
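As a concrete illustration of the barrier-based style reward, the sketch below implements one common relaxation of the logarithmic barrier (the quadratically extended form after Hauser and Saccon); the paper's exact coefficients and reward shaping may differ, and the foot-clearance example is hypothetical.

```python
import numpy as np

def relaxed_log_barrier(z, mu=1.0, delta=0.1):
    """-mu*log(z) for z > delta; quadratic extension below delta so the
    value stays finite even when the constraint z > 0 is violated."""
    z = np.asarray(z, dtype=float)
    log_branch = -mu * np.log(np.maximum(z, delta))  # clamp keeps log defined
    quad_branch = mu * (0.5 * (((z - 2.0 * delta) / delta) ** 2 - 1.0)
                        - np.log(delta))             # C1-continuous at delta
    return np.where(z > delta, log_branch, quad_branch)

# Hypothetical style reward: keep swing-foot clearance above 5 cm.
clearance = np.array([0.12, 0.06, 0.01, -0.02])      # metres
style_reward = -relaxed_log_barrier(clearance - 0.05)
```

The penalty grows steeply as the margin approaches zero but remains finite past it, which is what lets the reward be used directly in RL without special-casing constraint violations.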
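A gait encoder of the kind described can be as simple as a set of phase oscillators with per-leg offsets; the sketch below is a hypothetical minimal version (the offsets, duty factor, and leg ordering are illustrative, not the paper's).

```python
import numpy as np

def gait_phases(t, period=0.5, offsets=(0.0, 0.5, 0.5, 0.0), duty=0.5):
    """Per-leg phase in [0, 1) and the contact state it implies.
    The default offsets order a trot over (FL, FR, RL, RR); duty is the
    stance fraction, so changing it retimes stance vs. swing on the fly."""
    phases = (t / period + np.asarray(offsets)) % 1.0
    in_stance = phases < duty          # stance while phase < duty, else swing
    return phases, in_stance

phases, contacts = gait_phases(t=1.23)  # phases can be fed to the policy
```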
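For the multi-critic update, one plausible reading is that each reward group gets its own value function and advantage estimate, normalized per group before being mixed into a single policy gradient; the weighting below is an assumption, not the paper's published scheme.

```python
import numpy as np

def mixed_advantage(adv_style, adv_task, w_style=1.0, w_task=1.0, eps=1e-8):
    """Normalize each critic's advantage stream separately, then mix,
    so neither reward group's scale dominates the policy update."""
    norm = lambda a: (a - a.mean()) / (a.std() + eps)
    return w_style * norm(adv_style) + w_task * norm(adv_task)
```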
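Task-specific tuning then reduces to swapping the constraint bounds fed to the barrier rewards; the mode table below is purely illustrative (names and numbers are assumptions, not values from the paper).

```python
# Hypothetical per-mode constraint bounds that parameterize the barriers.
MODE_CONSTRAINTS = {
    "quadruped": {"body_height_m": (0.40, 0.55), "min_clearance_m": 0.09},
    "biped":     {"body_height_m": (0.60, 0.80), "min_clearance_m": 0.12},
    "tripod":    {"body_height_m": (0.42, 0.55), "lifted_leg": "FL"},
}
```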
Experimental Results
The paper demonstrates the framework's efficacy via extensive experiments on the KAIST HOUND and HOUND2 robots. Key results include:
- Quadrupedal Locomotion: The robot displayed agility and robustness over uneven terrains, achieving high-speed galloping at 4.67 m/s and overcoming obstacles as tall as 67 cm.
- Bipedal Locomotion: The system enabled running at speeds of up to 3.6 m/s, which is noteworthy given that the hardware is a quadruped operating in a bipedal configuration.
- Tripod Mode: By keeping one leg lifted while walking on the remaining three, the robot demonstrated tripod locomotion and smooth transitions between locomotion modes, showcasing the framework's adaptability and versatility.
Key Implications
The introduction of barrier-based rewards represents a substantial shift in how motion characteristics can be enforced in RL frameworks without extensive manual reward engineering. The capability to adjust motion features and task-specific gait patterns could pave the way for more versatile robotic systems capable of adapting to real-world variability.
Future Directions
The framework's success suggests promising avenues for further exploration. Extending this approach to other robotic morphologies and integrating it with external sensory data could lead to even more adaptable and resilient robotic systems. Additionally, investigating the application of similar techniques in multi-robot collaborative settings may yield valuable insights.
In conclusion, the paper makes significant advances in legged robotics, particularly in locomotion adaptability, through a novel RL framework. The barrier-based reward mechanism shows potential for broader application, reinforcing the importance of well-designed reward structures in complex robotic control tasks.