OptiState: State Estimation of Legged Robots using Gated Networks with Transformer-based Vision and Kalman Filtering (2401.16719v3)

Published 30 Jan 2024 in cs.RO, cs.LG, cs.SY, and eess.SY

Abstract: State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also consider semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState

References (25)
  1. M. F. Fallón, M. Antone, N. Roy, and S. Teller, “Drift-free humanoid state estimation fusing kinematic, inertial and lidar sensing,” in 2014 IEEE-RAS International Conference on Humanoid Robots, 2014, pp. 112–119.
  2. H. Durrant Whyte and T. Bailey, “Simultaneous localization and mapping: part i,” vol. 13, no. 2, 2006, pp. 99–110.
  3. X. Xinjilefu, S. Feng, and C. G. Atkeson, “Dynamic state estimation using quadratic programming,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 989–994.
  4. M. Brossard, A. Barrau, and S. Bonnabel, “Ai-imu dead-reckoning,” IEEE Transactions on Intelligent Vehicles, vol. 5, no. 4, pp. 585–595, 2020.
  5. R. Buchanan, M. Camurri, F. Dellaert, and M. Fallon, “Learning inertial odometry for dynamic legged robot state estimation,” Conference on Robot Learning, 2021.
  6. J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, and S. Kim, “Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1–9.
  7. K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16 000–16 009, June 2022.
  8. P. Hausamann, C. B. Sinnott, M. Daumer, and P. R. MacNeilage, “Evaluation of the intel realsense t265 for tracking natural human head motion,” Scientific Reports, vol. 11, no. 1, p. 12486, 6 2021.
  9. P. Agarwal et al., “State estimation for legged robots: Consistent fusion of leg kinematics and imu,” in Robotics: Science and Systems VIII, 2013, pp. 17–24.
  10. M. Bloesch, C. Gehring, P. Fankhauser, M. Hutter, M. A. Hoepflinger, and R. Siegwart, “State estimation for legged robots on unstable and slippery terrain,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 6058–6064.
  11. G. Fink and C. Semini, “Proprioceptive sensor fusion for quadruped robot state estimation,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 10 914–10 920.
  12. T. Bailey and H. Durrant-Whyte, “Simultaneous localization and mapping (slam): part ii,” vol. 13, no. 3, 2006, pp. 108–117.
  13. D. Scaramuzza and F. Fraundorfer, “Visual odometry [tutorial],” vol. 18, no. 4, 2011, pp. 80–92.
  14. F. Fraundorfer and D. Scaramuzza, “Visual odometry : Part ii: Matching, robustness, optimization, and applications,” vol. 19, no. 2, 2012, pp. 78–90.
  15. A. Concha, G. Loianno, V. Kumar, and J. Civera, “Visual-inertial direct slam,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 1331–1338.
  16. M. Camurri, M. Ramezani, S. Nobili, and M. Fallon, “Pronto: A multi-sensor state estimator for legged robots in real-world scenarios,” Frontiers in Robotics and AI, vol. 7, 2020. [Online]. Available: https://www.frontiersin.org/articles/10.3389/frobt.2020.00068
  17. S. Yang, Z. Zhang, Z. Fu, and Z. Manchester, “Cerberus: Low-drift visual-inertial-leg odometry for agile locomotion,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 4193–4199.
  18. M. Zhang, M. Zhang, Y. Chen, and M. Li, “Imu data processing for inertial aided navigation: A recurrent neural network based approach,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 3992–3998.
  19. A. Schperberg, Y. Tanaka, F. Xu, M. Menner, and D. Hong, “Real-to-sim: Predicting residual errors of robotic systems with sparse data using a learning-based unscented kalman filter,” in 2023 20th International Conference on Ubiquitous Robots (UR), 2023, pp. 27–34.
  20. V. G. Satorras, Z. Akata, and M. Welling, “Combining generative and discriminative models for hybrid inference,” arXiv, 2019.
  21. A. Schperberg, S. Tsuei, S. Soatto, and D. Hong, “Saber: Data-driven motion planner for autonomously navigating heterogeneous robots,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 8086–8093, 2021.
  22. D. Wisth, M. Camurri, and M. Fallon, “Robust legged robot state estimation using factor graph optimization,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 4507–4514, 2019.
  23. G. Revach, N. Shlezinger, X. Ni, A. L. Escoriza, R. J. G. van Sloun, and Y. C. Eldar, “Kalmannet: Neural network aided kalman filtering for partially known dynamics,” Trans. Sig. Proc., vol. 70, p. 1532–1547, jan 2022. [Online]. Available: https://doi.org/10.1109/TSP.2022.3158588
  24. I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
  25. Y. Tanaka, Y. Shirai, X. Lin, A. Schperberg, H. Kato, A. Swerdlow, N. Kumagai, and D. Hong, “Scaler: A tough versatile quadruped free-climber robot,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 5632–5639.

Summary

  • The paper presents a hybrid approach combining Kalman filtering, convex MPC, and gated networks to robustly estimate a robot’s state with predictive uncertainty.
  • It integrates proprioceptive and exteroceptive data by leveraging GRUs and Vision Transformers, resulting in a 65% RMSE improvement over a VIO SLAM baseline.
  • Extensive experiments on a quadruped robot validate enhanced precision in z-axis height and velocity predictions across challenging terrains, enabling risk-aware navigation.

Introduction

The paper under discussion, "OptiState," introduces a state estimation framework for legged robots that integrates proprioceptive data with exteroceptive sensory input. This is accomplished through the combination of model-based Kalman filtering, convex Model Predictive Control (MPC), Gated Recurrent Units (GRUs), and a Vision Transformer (ViT), refining traditional approaches to capture the robot's state more accurately across varied and challenging terrains.

Hybrid Approach for Enhanced State Estimation

A hybrid methodology underpins the state estimation system. The researchers employ a Kalman filter that fuses joint encoder and IMU measurements, using a single-rigid-body model as the system model. This model takes the ground reaction force outputs of the convex MPC controller as inputs to the state propagation step. To compensate for nonlinear errors arising from model simplifications and sensor inaccuracies, the Kalman filter's output, together with a history of state inputs and the latent-space representation of depth images from a ViT autoencoder, is fed into a GRU. The GRU corrects the Kalman filter output and can predict state components robustly even when model-based assumptions falter.
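
To make the correction stage concrete, the following is a minimal PyTorch sketch of a GRU that consumes a short history of Kalman filter estimates together with ViT depth latents and outputs a corrected trunk state plus a per-state uncertainty. The module name, dimensions, and residual formulation are illustrative assumptions, not the architecture from the OptiState repository.

```python
# Minimal sketch of the learned correction stage, assuming a 12-D trunk state,
# a 64-D ViT latent, and a residual-style correction; names and dimensions are
# illustrative, not taken from the OptiState repository.
import torch
import torch.nn as nn

class GRUCorrector(nn.Module):
    def __init__(self, state_dim=12, latent_dim=64, hidden_dim=128):
        super().__init__()
        # Input at each time step: Kalman filter estimate + ViT depth latent.
        self.gru = nn.GRU(state_dim + latent_dim, hidden_dim, batch_first=True)
        self.residual_head = nn.Linear(hidden_dim, state_dim)   # correction to KF output
        self.logvar_head = nn.Linear(hidden_dim, state_dim)     # per-state uncertainty

    def forward(self, kf_states, vit_latents):
        # kf_states: (batch, T, state_dim), vit_latents: (batch, T, latent_dim)
        x = torch.cat([kf_states, vit_latents], dim=-1)
        h, _ = self.gru(x)
        last = h[:, -1]                                  # most recent hidden state
        corrected = kf_states[:, -1] + self.residual_head(last)
        return corrected, self.logvar_head(last)

# Usage: correct the latest KF estimate using a short history window.
model = GRUCorrector()
kf_hist = torch.randn(1, 10, 12)      # last 10 Kalman filter estimates
vit_hist = torch.randn(1, 10, 64)     # matching ViT autoencoder latents
state, logvar = model(kf_hist, vit_hist)
```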

Key Contributions and Evaluation

The paper sets forth several significant contributions:

  1. The fusion of a model-based Kalman filter with a GRU and a ViT to produce an estimator that provides the robot's trunk state along with predictive uncertainty.
  2. Leveraging the control outputs of the convex MPC in the Kalman filter's system model, using these ground reaction forces directly for state propagation (a sketch of this propagation step follows the list).
  3. Hardware evaluation on a quadruped robot across disparate terrains, showing a substantial 65% improvement in RMSE when benchmarked against a state-of-the-art VIO SLAM baseline.
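
Concerning the second contribution, the block below sketches one common discrete-time single-rigid-body propagation used in convex MPC pipelines for quadrupeds, showing how the MPC ground reaction forces enter the prediction step; the paper's exact system model, state ordering, and discretization may differ.

$$
\begin{aligned}
\mathbf{p}_{k+1} &= \mathbf{p}_k + \mathbf{v}_k\,\Delta t, \\
\mathbf{v}_{k+1} &= \mathbf{v}_k + \left(\frac{1}{m}\sum_{i}\mathbf{f}_i - \mathbf{g}\right)\Delta t, \\
\boldsymbol{\Theta}_{k+1} &= \boldsymbol{\Theta}_k + \mathbf{R}_z(\psi_k)^{\top}\boldsymbol{\omega}_k\,\Delta t, \\
\boldsymbol{\omega}_{k+1} &= \boldsymbol{\omega}_k + \mathbf{I}^{-1}\sum_{i}\left(\mathbf{r}_i \times \mathbf{f}_i\right)\Delta t,
\end{aligned}
$$

where $\mathbf{p}$ and $\mathbf{v}$ are the trunk position and velocity, $\boldsymbol{\Theta}$ and $\boldsymbol{\omega}$ the orientation (Euler angles) and angular velocity, $m$ and $\mathbf{I}$ the mass and inertia, $\mathbf{f}_i$ the MPC ground reaction forces, and $\mathbf{r}_i$ the foot positions relative to the center of mass.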

Experimental Setup and Findings

Experiments carried out on a quadruped robot validate the framework's efficacy. The architecture outperformed the VIO SLAM baseline, particularly in z-axis height estimation and velocity prediction under challenging conditions such as slippery or inclined surfaces. Beyond improved precision, the model also predicts the uncertainty of its GRU estimate. This uncertainty metric can inform risk-aware decisions or indicate when to fall back to the Kalman filter's output in real-world applications.
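
As a hypothetical illustration of how such an uncertainty estimate might be used, the snippet below gates between the GRU-corrected state and the raw Kalman filter output based on the predicted variance; the threshold, interfaces, and fallback policy are assumptions for illustration, not the paper's method.

```python
# Hypothetical fallback logic using the predicted uncertainty; the threshold
# and interfaces are illustrative, not taken from the paper or repository.
import numpy as np

UNCERTAINTY_THRESHOLD = 0.05  # assumed per-state variance limit

def select_estimate(kf_state: np.ndarray,
                    gru_state: np.ndarray,
                    gru_logvar: np.ndarray) -> np.ndarray:
    """Fall back to the Kalman filter when the GRU's predicted variance is high."""
    variance = np.exp(gru_logvar)
    if np.any(variance > UNCERTAINTY_THRESHOLD):
        return kf_state          # low confidence in the learned correction
    return gru_state             # trust the GRU-refined estimate
```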

Reflections and Potentials

While OptiState marks a notable advancement in legged robot state estimation, it has limitations, especially when operating beyond the training space covered by the motion capture system. The researchers note that OptiState's ability to predict world-coordinate positions degrades when the robot moves outside the calibrated area, though its prediction of velocity components remains unaffected. Nonetheless, the paper sets a new bar for state estimation in legged robots, providing a robust solution that combines model-based techniques with the adaptive strengths of machine learning. The implications for robotic navigation and interaction with complex, dynamic environments are significant, paving the way for further research and refinement in ground-truth motion capture and state estimation methodologies.