
A generic approach for reactive stateful mitigation of application failures in distributed robotics systems deployed with Kubernetes

Published 24 Oct 2024 in cs.RO | (2410.18825v2)

Abstract: Offloading computationally expensive algorithms to the edge or even the cloud is an attractive way to work around the limited on-board computational and energy resources of robotic systems. In cloud-native applications deployed with the container management system Kubernetes (K8s), one key problem is ensuring resilience against various types of failures. However, complex robotic systems that interact with the physical world pose a very specific set of challenges and requirements that failure mitigation approaches from the cloud-native domain do not yet cover. In this paper, we therefore propose a novel approach to system monitoring and stateful, reactive failure mitigation for distributed robotic systems deployed using K8s and the Robot Operating System (ROS2). Because it builds on the generic substrate of Behaviour Trees, our approach can be applied to any robotic workload and supports arbitrarily complex monitoring and failure mitigation strategies. We demonstrate the effectiveness and application-agnostic nature of our approach on two example applications, namely Autonomous Mobile Robot (AMR) navigation and robotic manipulation in a simulated environment.
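To make the idea of Behaviour-Tree-based monitoring and stateful mitigation more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation and does not call Kubernetes, ROS2, or any Behaviour Tree library. All names (`Condition`, `Action`, `Sequence`, `Fallback`, `build_demo`, the `planner_alive` flag, the saved goal) are hypothetical and chosen only for illustration. The tree first checks a health condition; only if that check fails does it run a mitigation branch that restarts the component and then restores its saved state.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Node:
    def tick(self) -> Status:
        raise NotImplementedError


class Condition(Node):
    """Leaf that succeeds when a (hypothetical) health check returns True."""

    def __init__(self, check: Callable[[], bool]):
        self.check = check

    def tick(self) -> Status:
        return Status.SUCCESS if self.check() else Status.FAILURE


class Action(Node):
    """Leaf that runs a (hypothetical) mitigation step and reports its result."""

    def __init__(self, run: Callable[[], bool]):
        self.run = run

    def tick(self) -> Status:
        return Status.SUCCESS if self.run() else Status.FAILURE


class Sequence(Node):
    """Ticks children in order; fails as soon as one child fails."""

    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback(Node):
    """Ticks children in order; succeeds as soon as one child succeeds."""

    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def build_demo():
    # Assumed toy state: a crashed planner whose last goal was saved elsewhere.
    state = {"planner_alive": False, "active_goal": None, "saved_goal": (1.0, 2.0)}
    log: List[str] = []

    def planner_healthy() -> bool:
        return state["planner_alive"]

    def restart_planner() -> bool:
        # Stand-in for restarting the failed workload.
        state["planner_alive"] = True
        log.append("restart_planner")
        return True

    def restore_goal() -> bool:
        # Stateful part: re-apply the goal that was active before the failure.
        state["active_goal"] = state["saved_goal"]
        log.append("restore_goal")
        return True

    tree = Fallback([
        Condition(planner_healthy),  # monitoring
        Sequence([Action(restart_planner), Action(restore_goal)]),  # mitigation
    ])
    return tree, state, log


if __name__ == "__main__":
    tree, state, log = build_demo()
    print(tree.tick(), log, state["active_goal"])
```

Ticking the tree a second time leaves the log unchanged, because the health check now succeeds and the mitigation branch is never reached. That is the reactive behaviour in miniature: mitigation runs only in response to a detected failure.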

