- The paper introduces a benchmarking framework and a reward formulation that together improve disturbance rejection and command following in humanoid locomotion.
- It establishes low-cost, repeatable physical experiments that measure key metrics such as energy efficiency and Standing Fall Percentage, enabling rigorous evaluation.
- The results show that the Single Contact RL controller, and especially its enhanced version, achieve significantly improved robustness in real-world conditions.
Analysis of Reward Design and Evaluation in Humanoid Locomotion
The paper, "Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking," addresses critical challenges in the development of robust humanoid robot controllers for standing and walking (SaW) using reinforcement learning (RL). While significant advancements have been achieved in sim-to-real RL for humanoid locomotion, the endeavor lacks a standardized methodology for evaluating different reward functions and assessing controller performance in a rigorous, quantitative manner. This paper contributes a comprehensive benchmarking approach, alongside proposing an alternative reward design to improve the real-world performance of SaW controllers.
Overview and Methodology
The paper sets out to create a reliable, low-cost benchmarking framework that evaluates SaW controllers on fundamental capabilities such as command following, disturbance recovery, and energy efficiency. The benchmarks consist of repeatable physical experiments that can be set up with widely available materials, making them accessible to a broad range of researchers. The authors emphasize disturbance rejection as a key capability and devise practical methods for applying physical disturbances systematically and safely. Reproducible metrics, such as the Standing Fall Percentage under controlled impulse disturbances, provide critical insight into the relative strengths and weaknesses of different controller designs.
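To illustrate how a metric like the Standing Fall Percentage can be aggregated from repeated disturbance trials, here is a minimal sketch; the trial structure, field names, and example numbers are illustrative assumptions, not the paper's actual protocol.

```python
from dataclasses import dataclass

@dataclass
class DisturbanceTrial:
    """One standing-disturbance trial (illustrative structure, not the paper's)."""
    impulse_ns: float   # magnitude of the applied impulse, in newton-seconds
    direction: str      # e.g. "x+", "x-", "y+", "y-"
    fell: bool          # whether the robot fell before recovering

def standing_fall_percentage(trials: list[DisturbanceTrial]) -> float:
    """Fraction of standing trials that ended in a fall, as a percentage."""
    if not trials:
        raise ValueError("at least one trial is required")
    falls = sum(t.fell for t in trials)
    return 100.0 * falls / len(trials)

def fall_percentage_by_direction(trials: list[DisturbanceTrial]) -> dict[str, float]:
    """Break the metric down by disturbance direction to expose asymmetries."""
    by_dir: dict[str, list[DisturbanceTrial]] = {}
    for t in trials:
        by_dir.setdefault(t.direction, []).append(t)
    return {d: standing_fall_percentage(ts) for d, ts in by_dir.items()}

# Hypothetical example: ten trials in the x+ direction, two falls -> 20% fall rate.
trials = [DisturbanceTrial(40.0, "x+", fell=(i < 2)) for i in range(10)]
print(standing_fall_percentage(trials))       # 20.0
print(fall_percentage_by_direction(trials))   # {"x+": 20.0}
```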
The authors introduce a reward design that departs from prior work, which often relies on highly prescriptive constraints such as clock-based signals or reference motion trajectories. They instead propose a minimally constraining reward function intended to give controllers the flexibility to learn effective SaW policies while still meeting robust performance targets. This is exemplified by targeted reward terms, such as single foot contact and base height stabilization, that encourage natural bipedal locomotion without over-constraining the behavior.
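To make this reward structure concrete, the sketch below combines a velocity-tracking term with single-contact and base-height terms; the weights, kernel scales, and function signature are assumptions chosen for illustration and do not reproduce the paper's exact formulation.

```python
import numpy as np

def saw_reward(base_height, target_height, left_contact, right_contact,
               base_lin_vel_xy, cmd_lin_vel_xy, cmd_is_walking,
               w_vel=1.0, w_contact=0.3, w_height=0.5):
    """Illustrative minimally constraining standing-and-walking reward.

    Rather than prescribing a clock or reference trajectory, it rewards:
      * tracking the commanded planar velocity,
      * exactly one foot in contact while walking (both feet while standing),
      * keeping the base near a nominal height.
    """
    # Velocity tracking: exponential kernel around the commanded velocity.
    vel_err = np.sum((np.asarray(cmd_lin_vel_xy) - np.asarray(base_lin_vel_xy)) ** 2)
    r_vel = np.exp(-4.0 * vel_err)

    # Contact term: single support while walking, double support while standing.
    n_contacts = int(left_contact) + int(right_contact)
    desired_contacts = 1 if cmd_is_walking else 2
    r_contact = 1.0 if n_contacts == desired_contacts else 0.0

    # Base-height stabilization: penalize deviation from a nominal height.
    r_height = np.exp(-20.0 * (base_height - target_height) ** 2)

    return w_vel * r_vel + w_contact * r_contact + w_height * r_height

# Hypothetical call while walking forward at 0.5 m/s in single support.
r = saw_reward(base_height=0.92, target_height=0.95,
               left_contact=True, right_contact=False,
               base_lin_vel_xy=[0.45, 0.02], cmd_lin_vel_xy=[0.5, 0.0],
               cmd_is_walking=True)
print(round(float(r), 3))
```

The design choice mirrored here is that nothing dictates when or where the feet should land, only how many should be on the ground, leaving gait timing for the policy to discover.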
Results and Implications
The evaluation covered three primary SaW controllers: the newly designed Single Contact RL controller, a clock-based RL controller, and the manufacturer-supplied Agility Controller. Using the new benchmarks, the authors quantitatively assessed each controller's disturbance rejection, command-following accuracy, and energy use. The results revealed clear performance differences among the controllers: for instance, the Single Contact RL controller tolerated larger disturbances in the x-direction, while both RL controllers maintained rotational position with less drift than the Agility Controller.
The benchmarks also supported an iterative refinement process. Observed weaknesses, particularly in the disturbance-rejection profiles, motivated further enhancements that yielded the improved Single Contact++ RL controller, which showed markedly better disturbance resilience. This improvement cycle underscores the practical value of systematic benchmarking in evolving SaW controllers.
Future Directions
The research has notable implications for future work in AI and robotics. The proposed SaW benchmarks provide a foundation for continually refining RL-based humanoid locomotion policies. Future research could focus on reducing energy expenditure and improving sim-to-real transfer fidelity, addressing the identified shortcomings in energy efficiency and command-following precision.
The methodology and insights presented could significantly influence the broader field of humanoid robotics by offering a reliable framework for assessing, refining, and deploying locomotion strategies. Collaborative efforts to extend these benchmarks could accelerate progress toward controllers that are stable, energy-efficient, and capable of human-like locomotion in complex real-world environments.