OptiState: State Estimation of Legged Robots using Gated Networks with Transformer-based Vision and Kalman Filtering (2401.16719v3)

Published 30 Jan 2024 in cs.RO, cs.LG, cs.SY, and eess.SY

Abstract: State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also consider semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState

References (25)
  1. M. F. Fallón, M. Antone, N. Roy, and S. Teller, “Drift-free humanoid state estimation fusing kinematic, inertial and lidar sensing,” in 2014 IEEE-RAS International Conference on Humanoid Robots, 2014, pp. 112–119.
  2. H. Durrant Whyte and T. Bailey, “Simultaneous localization and mapping: part i,” vol. 13, no. 2, 2006, pp. 99–110.
  3. X. Xinjilefu, S. Feng, and C. G. Atkeson, “Dynamic state estimation using quadratic programming,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 989–994.
  4. M. Brossard, A. Barrau, and S. Bonnabel, “Ai-imu dead-reckoning,” IEEE Transactions on Intelligent Vehicles, vol. 5, no. 4, pp. 585–595, 2020.
  5. R. Buchanan, M. Camurri, F. Dellaert, and M. Fallon, “Learning inertial odometry for dynamic legged robot state estimation,” Conference on Robot Learning, 2021.
  6. J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, and S. Kim, “Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1–9.
  7. K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16 000–16 009, June 2022.
  8. P. Hausamann, C. B. Sinnott, M. Daumer, and P. R. MacNeilage, “Evaluation of the intel realsense t265 for tracking natural human head motion,” Scientific Reports, vol. 11, no. 1, p. 12486, 6 2021.
  9. P. Agarwal et al., “State estimation for legged robots: Consistent fusion of leg kinematics and imu,” in Robotics: Science and Systems VIII, 2013, pp. 17–24.
  10. M. Bloesch, C. Gehring, P. Fankhauser, M. Hutter, M. A. Hoepflinger, and R. Siegwart, “State estimation for legged robots on unstable and slippery terrain,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 6058–6064.
  11. G. Fink and C. Semini, “Proprioceptive sensor fusion for quadruped robot state estimation,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 10 914–10 920.
  12. T. Bailey and H. Durrant-Whyte, “Simultaneous localization and mapping (slam): part ii,” vol. 13, no. 3, 2006, pp. 108–117.
  13. D. Scaramuzza and F. Fraundorfer, “Visual odometry [tutorial],” vol. 18, no. 4, 2011, pp. 80–92.
  14. F. Fraundorfer and D. Scaramuzza, “Visual odometry : Part ii: Matching, robustness, optimization, and applications,” vol. 19, no. 2, 2012, pp. 78–90.
  15. A. Concha, G. Loianno, V. Kumar, and J. Civera, “Visual-inertial direct slam,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 1331–1338.
  16. M. Camurri, M. Ramezani, S. Nobili, and M. Fallon, “Pronto: A multi-sensor state estimator for legged robots in real-world scenarios,” Frontiers in Robotics and AI, vol. 7, 2020. [Online]. Available: https://www.frontiersin.org/articles/10.3389/frobt.2020.00068
  17. S. Yang, Z. Zhang, Z. Fu, and Z. Manchester, “Cerberus: Low-drift visual-inertial-leg odometry for agile locomotion,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 4193–4199.
  18. M. Zhang, M. Zhang, Y. Chen, and M. Li, “Imu data processing for inertial aided navigation: A recurrent neural network based approach,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 3992–3998.
  19. A. Schperberg, Y. Tanaka, F. Xu, M. Menner, and D. Hong, “Real-to-sim: Predicting residual errors of robotic systems with sparse data using a learning-based unscented kalman filter,” in 2023 20th International Conference on Ubiquitous Robots (UR), 2023, pp. 27–34.
  20. V. G. Satorras, Z. Akata, and M. Welling, “Combining generative and discriminative models for hybrid inference,” arXiv, 2019.
  21. A. Schperberg, S. Tsuei, S. Soatto, and D. Hong, “Saber: Data-driven motion planner for autonomously navigating heterogeneous robots,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 8086–8093, 2021.
  22. D. Wisth, M. Camurri, and M. Fallon, “Robust legged robot state estimation using factor graph optimization,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 4507–4514, 2019.
  23. G. Revach, N. Shlezinger, X. Ni, A. L. Escoriza, R. J. G. van Sloun, and Y. C. Eldar, “Kalmannet: Neural network aided kalman filtering for partially known dynamics,” Trans. Sig. Proc., vol. 70, p. 1532–1547, jan 2022. [Online]. Available: https://doi.org/10.1109/TSP.2022.3158588
  24. I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
  25. Y. Tanaka, Y. Shirai, X. Lin, A. Schperberg, H. Kato, A. Swerdlow, N. Kumagai, and D. Hong, “Scaler: A tough versatile quadruped free-climber robot,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 5632–5639.

Summary

  • The paper presents a hybrid approach combining Kalman filtering, convex MPC, and gated networks to robustly estimate a robot’s state with predictive uncertainty.
  • It integrates proprioceptive and exteroceptive data by leveraging GRUs and Vision Transformers, resulting in a 65% RMSE improvement over a VIO SLAM baseline.
  • Extensive experiments on a quadruped robot validate enhanced precision in z-axis height and velocity predictions across challenging terrains, enabling risk-aware navigation.

Introduction

The paper under discussion, "OptiState," introduces a state estimation framework for legged robots that integrates proprioceptive data with exteroceptive sensory input. This is accomplished through the combination of model-based Kalman filtering, convex Model Predictive Control (MPC), Gated Recurrent Units (GRUs), and a Vision Transformer (ViT), refining traditional approaches to capture the robot's state more accurately across varied and challenging terrains.

Hybrid Approach for Enhanced State Estimation

A hybrid methodology underpins the state estimation system. The researchers employ a Kalman filter that fuses joint encoder and IMU measurements, using a single-rigid-body model as the system model. This model takes the ground reaction force outputs of the convex MPC controller as inputs to the state propagation step. To compensate for nonlinear errors arising from model simplifications and sensor inaccuracies, the Kalman filter's output, together with a history of state inputs and the latent-space representation of depth images from a ViT autoencoder, is fed into a GRU. The GRU corrects the Kalman filter output and can predict state components robustly even when model-based assumptions falter.
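
To make the correction stage concrete, the following is a minimal PyTorch sketch of a GRU that consumes a short history of Kalman filter estimates together with ViT depth latents and outputs a corrected trunk state plus a per-state uncertainty. The module name, dimensions, and residual formulation are illustrative assumptions, not the architecture from the OptiState repository.

```python
# Minimal sketch of the learned correction stage, assuming a 12-D trunk state,
# a 64-D ViT latent, and a residual-style correction; names and dimensions are
# illustrative, not taken from the OptiState repository.
import torch
import torch.nn as nn

class GRUCorrector(nn.Module):
    def __init__(self, state_dim=12, latent_dim=64, hidden_dim=128):
        super().__init__()
        # Input at each time step: Kalman filter estimate + ViT depth latent.
        self.gru = nn.GRU(state_dim + latent_dim, hidden_dim, batch_first=True)
        self.residual_head = nn.Linear(hidden_dim, state_dim)   # correction to KF output
        self.logvar_head = nn.Linear(hidden_dim, state_dim)     # per-state uncertainty

    def forward(self, kf_states, vit_latents):
        # kf_states: (batch, T, state_dim), vit_latents: (batch, T, latent_dim)
        x = torch.cat([kf_states, vit_latents], dim=-1)
        h, _ = self.gru(x)
        last = h[:, -1]                                  # most recent hidden state
        corrected = kf_states[:, -1] + self.residual_head(last)
        return corrected, self.logvar_head(last)

# Usage: correct the latest KF estimate using a short history window.
model = GRUCorrector()
kf_hist = torch.randn(1, 10, 12)      # last 10 Kalman filter estimates
vit_hist = torch.randn(1, 10, 64)     # matching ViT autoencoder latents
state, logvar = model(kf_hist, vit_hist)
```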

Key Contributions and Evaluation

The paper sets forth several significant contributions:

  1. The fusion of a model-based Kalman filter with a GRU and a ViT to produce an estimator that provides the robot's trunk state along with predictive uncertainty.
  2. Leveraging the control outputs of the convex MPC in the Kalman filter's system model, using these ground reaction forces directly for state propagation (a sketch of this propagation step follows the list).
  3. Hardware evaluation on a quadruped robot across disparate terrains, showing a substantial 65% improvement in RMSE when benchmarked against a state-of-the-art VIO SLAM baseline.
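
Concerning the second contribution, the block below sketches one common discrete-time single-rigid-body propagation used in convex MPC pipelines for quadrupeds, showing how the MPC ground reaction forces enter the prediction step; the paper's exact system model, state ordering, and discretization may differ.

$$
\begin{aligned}
\mathbf{p}_{k+1} &= \mathbf{p}_k + \mathbf{v}_k\,\Delta t, \\
\mathbf{v}_{k+1} &= \mathbf{v}_k + \left(\frac{1}{m}\sum_{i}\mathbf{f}_i - \mathbf{g}\right)\Delta t, \\
\boldsymbol{\Theta}_{k+1} &= \boldsymbol{\Theta}_k + \mathbf{R}_z(\psi_k)^{\top}\boldsymbol{\omega}_k\,\Delta t, \\
\boldsymbol{\omega}_{k+1} &= \boldsymbol{\omega}_k + \mathbf{I}^{-1}\sum_{i}\left(\mathbf{r}_i \times \mathbf{f}_i\right)\Delta t,
\end{aligned}
$$

where $\mathbf{p}$ and $\mathbf{v}$ are the trunk position and velocity, $\boldsymbol{\Theta}$ and $\boldsymbol{\omega}$ the orientation (Euler angles) and angular velocity, $m$ and $\mathbf{I}$ the mass and inertia, $\mathbf{f}_i$ the MPC ground reaction forces, and $\mathbf{r}_i$ the foot positions relative to the center of mass.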

Experimental Setup and Findings

Experiments carried out on a quadruped robot validate the framework's efficacy. The architecture outperformed the VIO SLAM baseline, particularly in z-axis height estimation and velocity prediction under challenging conditions such as slippery or inclined surfaces. Beyond improved precision, the model also predicts the uncertainty of its GRU estimate. This uncertainty metric can inform risk-aware decisions or indicate when to fall back to the Kalman filter's output in real-world applications.
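
As a hypothetical illustration of how such an uncertainty estimate might be used, the snippet below gates between the GRU-corrected state and the raw Kalman filter output based on the predicted variance; the threshold, interfaces, and fallback policy are assumptions for illustration, not the paper's method.

```python
# Hypothetical fallback logic using the predicted uncertainty; the threshold
# and interfaces are illustrative, not taken from the paper or repository.
import numpy as np

UNCERTAINTY_THRESHOLD = 0.05  # assumed per-state variance limit

def select_estimate(kf_state: np.ndarray,
                    gru_state: np.ndarray,
                    gru_logvar: np.ndarray) -> np.ndarray:
    """Fall back to the Kalman filter when the GRU's predicted variance is high."""
    variance = np.exp(gru_logvar)
    if np.any(variance > UNCERTAINTY_THRESHOLD):
        return kf_state          # low confidence in the learned correction
    return gru_state             # trust the GRU-refined estimate
```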

Reflections and Potentials

While OptiState marks a notable advancement in legged robot state estimation, it has limitations, especially when operating beyond the training space covered by the motion capture system. The researchers note that OptiState's ability to predict world-coordinate positions degrades when the robot moves outside the calibrated area, though its prediction of velocity components remains unaffected. Nonetheless, the paper sets a new bar for state estimation in legged robots, providing a robust solution that combines model-based techniques with the adaptive strengths of machine learning. The implications for robotic navigation and interaction with complex, dynamic environments are significant, paving the way for further research and refinement in ground-truth motion capture and state estimation methodologies.