Automatic Gain Tuning for Humanoid Robots Walking Architectures Using Gradient-Free Optimization Techniques

Published 27 Sep 2024 in cs.RO | (2409.18649v1)

Abstract: Developing sophisticated control architectures has endowed robots, particularly humanoid robots, with numerous capabilities. However, tuning these architectures remains a challenging and time-consuming task that requires expert intervention. In this work, we propose a methodology to automatically tune the gains of all layers of a hierarchical control architecture for walking humanoids. We tested our methodology by employing different gradient-free optimization methods: Genetic Algorithm (GA), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Evolution Strategy (ES), and Differential Evolution (DE). We validated the parameters found both in simulation and on the real ergoCub humanoid robot. Our results show that GA achieves the fastest convergence (10 x 10³ function evaluations vs 25 x 10³ needed by the other algorithms) and a 100% success rate in completing the task both in simulation and when transferred to the real robotic platform. These findings highlight the potential of our proposed method to automate the tuning process, reducing the need for manual intervention.


Summary

The paper demonstrates a unified methodology for automatic gain tuning in hierarchical humanoid walking controllers using gradient-free optimization.
It benchmarks Genetic Algorithm, CMA-ES, Differential Evolution, and Evolution Strategy, with GA achieving a 100% convergence rate in both simulation and hardware evaluations.
Results confirm robust transfer from simulation to the ergoCub robot, showing that automated tuning can replace much of the manual effort.

Automatic Gain Tuning in Humanoid Robot Walking Control via Zeroth-Order Optimization

Introduction

The tuning of control gains in hierarchical model-based architectures for humanoid locomotion remains a significant bottleneck, necessitating expert time and extensive trial and error due to the proliferation of tunable parameters across cascaded controllers. The paper "Automatic Gain Tuning for Humanoid Robots Walking Architectures Using Gradient-Free Optimization Techniques" (2409.18649) provides a unified methodology for automating gain and cost weight selection throughout all layers of a walking control stack. A systematic exploration and empirical comparison of several gradient-free black-box optimizers is carried out, with evaluation both in simulation and on hardware using the complex and deformation-sensitive ergoCub humanoid platform.

Figure 1: The ergoCub robot walking with an optimized control architecture, defined by parameters identified through gradient-free techniques.

Hierarchical Walking Control Architecture

The walking controller consists of a hierarchical cascade including (i) a centroidal Model Predictive Controller (MPC) for computing desired contact forces and velocities, (ii) a Zero Moment Point (ZMP)-CoM controller for CoM tracking, and (iii) a whole-body quadratic programming (QP) kinematic controller for reference velocity allocation. Each layer introduces multiple tunable gain matrices and weighting factors, resulting in a high-dimensional nonconvex parameter search space.

Figure 2: The hierarchical walking control architecture, tuned via gradient-free techniques, composed of a centroidal Model Predictive Controller (MPC) that computes desired contact forces and velocities, a ZMP-CoM controller that computes the reference CoM velocity $\dot{x}$, and a whole-body QP kinematic controller.

The components are:

Centroidal MPC: Manages contact and centroidal momentum tasks.
ZMP/CoM Control: Linear Inverted Pendulum Model-based CoM velocity regulation.
Whole-Body QP Controller: Allocates reference velocities and complies with task hierarchy.
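
To make the middle layer concrete, here is a minimal sketch of a ZMP-CoM feedback law in the form commonly used with the Linear Inverted Pendulum Model. The source does not state the exact law, so the function name `zmp_com_velocity`, the gain names `k_com` and `k_zmp`, and the sign convention are assumptions made for illustration.

```python
def zmp_com_velocity(com_vel_ref, com_ref, com, zmp_ref, zmp, k_com, k_zmp):
    """Return a desired planar CoM velocity (assumed LIPM-style feedback law).

    desired = com_vel_ref - k_zmp * (zmp_ref - zmp) + k_com * (com_ref - com)

    All vector arguments are (x, y) sequences; k_com and k_zmp are scalar
    gains here, standing in for the tunable gains of this layer.
    """
    return [
        v_ref - k_zmp * (z_ref - z) + k_com * (c_ref - c)
        for v_ref, c_ref, c, z_ref, z in zip(com_vel_ref, com_ref, com, zmp_ref, zmp)
    ]


# With zero tracking error the command equals the feed-forward reference.
command = zmp_com_velocity(
    com_vel_ref=[0.1, 0.0], com_ref=[0.0, 0.0], com=[0.0, 0.0],
    zmp_ref=[0.0, 0.0], zmp=[0.0, 0.0], k_com=2.0, k_zmp=1.0,
)
```

In this sketch the two gains trade off CoM tracking against ZMP tracking, which is the kind of coupling that makes manual tuning of the full stack tedious.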

The optimization variables $\xi \in \mathbb{R}^{14}$ encapsulate the weights and gains for all hierarchical levels, with enforced symmetry and physical constraints to keep the search space relevant and feasible.
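
One way to keep such a search space feasible is to let the optimizer work in a unit hypercube and decode each candidate into bounded, strictly positive gains. The sketch below illustrates this idea only: the bounds, the log-scale mapping, and the left/right sharing in `expand_symmetric` are hypothetical, since the source does not list per-parameter ranges or which entries are mirrored.

```python
N_PARAMS = 14  # dimension of xi, as stated in the text

# Hypothetical bounds; the source does not give per-parameter ranges.
LOWER = [1e-2] * N_PARAMS
UPPER = [1e2] * N_PARAMS


def decode(u):
    """Map a point u in [0, 1]^14 to positive gains on a log scale."""
    if len(u) != N_PARAMS:
        raise ValueError(f"expected {N_PARAMS} values, got {len(u)}")
    xi = []
    for ui, lo, hi in zip(u, LOWER, UPPER):
        ui = min(max(ui, 0.0), 1.0)       # clip to the unit interval
        xi.append(lo * (hi / lo) ** ui)   # log-uniform interpolation
    return xi


def expand_symmetric(foot_gain):
    """Hypothetical symmetry constraint: both feet share one tuned gain."""
    return {"left_foot": foot_gain, "right_foot": foot_gain}


xi = decode([0.5] * N_PARAMS)            # midpoint of every log-scaled range
foot_gains = expand_symmetric(xi[0])     # index 0 is an arbitrary choice
```

Sharing one value across mirrored limbs reduces the number of free variables, and the log scale lets a single optimizer step size cover gains that span several orders of magnitude.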

Zeroth-Order Gain Optimization Framework

Because the objective is non-differentiable and randomly sampled parameters could damage the hardware, the objective function is evaluated only in realistic simulation; a solution is transferred to the real robot only once it has proven safe in simulation. Two main performance measures are used:

$\mathcal{G}_1(\xi)$: Reciprocal of time-to-failure, promoting parameters that maximize task duration.
$\mathcal{G}_2(\xi)$: Extension of $\mathcal{G}_1$ with a torque penalty, targeting energy efficiency.
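
As a rough illustration, the two measures could be computed from a simulation rollout as follows. This is a sketch under stated assumptions: the source only says that $\mathcal{G}_2$ adds a torque penalty, so the mean-squared-torque form, the weight `torque_weight`, and the guard `eps` are hypothetical choices.

```python
def g1(time_to_failure, eps=1e-6):
    """Reciprocal of time-to-failure (seconds); lower values are better."""
    return 1.0 / max(time_to_failure, eps)  # eps avoids division by zero


def g2(time_to_failure, joint_torques, torque_weight=1e-3):
    """g1 plus an assumed penalty: weighted mean of squared joint torques.

    joint_torques is a list of per-timestep torque lists recorded in the
    rollout. The penalty form and weight are hypothetical.
    """
    if not joint_torques:
        return g1(time_to_failure)
    squared = [sum(tau * tau for tau in step) for step in joint_torques]
    return g1(time_to_failure) + torque_weight * sum(squared) / len(squared)


# A rollout that stays up for 10 s with two recorded torque samples.
cost = g2(10.0, [[1.0, 2.0], [0.0, 0.0]], torque_weight=0.1)
```

Both functions are minimized: a longer walk lowers the first term, and lower torques reduce the penalty.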

Four gradient-free optimizers are benchmarked:

Genetic Algorithm (GA)
Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
Differential Evolution (DE)
Evolution Strategy (ES)
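
For readers unfamiliar with this family of methods, the following is a minimal generic genetic algorithm using only the Python standard library. It is not the paper's implementation: the operators (tournament selection, uniform crossover, Gaussian mutation, elitism), the hyperparameters, and the toy quadratic cost that stands in for a simulation rollout are all assumptions.

```python
import random


def genetic_algorithm(cost, dim, pop_size=30, generations=60,
                      mutation_sigma=0.1, elite=2, seed=0):
    """Minimize cost over [0, 1]^dim; returns (best, best_cost, history)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Evaluate each individual once per generation (rollouts are costly).
        scored = sorted(((cost(ind), ind) for ind in pop), key=lambda t: t[0])
        history.append(scored[0][0])
        next_pop = [ind[:] for _, ind in scored[:elite]]  # elitism

        def tournament():
            # Best of three random individuals becomes a parent.
            return min(rng.sample(scored, 3), key=lambda t: t[0])[1]

        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover followed by clipped Gaussian mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [min(max(g + rng.gauss(0.0, mutation_sigma), 0.0), 1.0)
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best_cost, best = min(((cost(ind), ind) for ind in pop), key=lambda t: t[0])
    return best, best_cost, history


def toy_cost(u):
    """Placeholder for a simulated walking rollout (hypothetical)."""
    return sum((x - 0.3) ** 2 for x in u)


best, best_cost, history = genetic_algorithm(toy_cost, dim=14)
```

With these defaults the sketch spends `pop_size * generations` evaluations plus one final pass over the population, which is how a budget counted in function evaluations maps onto population size and generation count.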

The entire optimization and validation infrastructure is built on bipedal-locomotion-framework and MuJoCo.

Empirical Benchmarking and Results

Extensive experiments compare the four algorithms across 80 independent runs (40 per objective function). The results reveal GA as the only method with $100\%$ convergence rate in all simulated and physical deployments. It achieves feasible gain parameterizations for the hierarchical walking stack in as few as $10 \times 10^3$ function evaluations, much faster than CMA-ES and DE, which require up to $25 \times 10^3$ evaluations and exhibit failure cases or high variance.

Notably:

All algorithms but vanilla ES solved the task with $\mathcal{G}_1$ (walk completion), but only GA consistently succeeded with the joint-torque-augmented $\mathcal{G}_2$.
Diverse local optima exist (demonstrated by solution variance across runs), validating the complexity of the nonconvex search landscape.
Transfer to hardware: Robustness was validated by running 20 different optimal configurations (from 20 independent GA runs, each with distinct local optima) on the physical robot for a trajectory different from optimization, with all configurations yielding successful walking without catastrophic events.

Validation on Hardware

Optimal gain sets found in simulation generalize to new reference trajectories on hardware, with little variance observed in the major control objectives (CoM, ZMP, angular momentum). However, the temporal drift of the ZMP varies noticeably between configurations, likely because the identified ZMP gains differ across local optima.

Implications and Future Directions

The presented results establish the practicality of full-stack automated gain tuning for humanoid locomotion architectures via zeroth-order, black-box optimization. This closes a critical bottleneck in the deployment of robust, data-efficient model-based controllers for complex legged robots, especially where RL alternatives remain data-prohibitive or lack formal safety guarantees. The transferability of optimal parameters to trajectories unseen during optimization also suggests a degree of robustness previously only achievable with extensive manual tuning.

Extending beyond the results, several forward-looking research directions are motivated:

Hybrid cost design: Adding further terms (e.g., energy, tracking accuracy, slip/fall avoidance) for multi-objective robust walking.
Incorporating experimental constraints: Online fine-tuning on hardware with safety filters, leveraging safe exploration methods.
Search space reduction: Structure-exploiting or functionally-equivalent parameterizations (e.g., via learned low-dimensional manifolds) to increase optimizer efficiency.
Algorithmic advances: Comparison with new families of structured zeroth-order optimizers, including stochastic search directions and adaptive discretization [e.g., NEURIPS2023_7429f4c1], and possible meta-optimization over the optimizer hyperparameters.

Conclusion

This work rigorously demonstrates that gradient-free black-box optimization, specifically genetic algorithms, can reliably and efficiently tune all relevant gains in a layered walking architecture for humanoid robots, including successful transfer from simulation to hardware on the challenging ergoCub platform. The consistent outperformance of GA over other black-box strategies and the successful physical deployment across unseen tasks highlight the method’s robustness and generalization. Future progress will involve incorporating richer cost functionals, online hardware-in-the-loop adaptation, and integration of advanced structured optimization algorithms.
