SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation

Published 1 Mar 2024 in cs.RO, cs.CV, and cs.LG (arXiv:2403.00991v2)

Abstract: Autonomous self-improving robots that interact and improve with experience are key to the real-world deployment of robotic systems. In this paper, we propose an online learning method, SELFI, that leverages online robot experience to efficiently fine-tune pre-trained control policies. SELFI applies online model-free reinforcement learning on top of offline model-based learning to bring out the best parts of both learning paradigms. Specifically, SELFI stabilizes the online learning process by incorporating the same model-based learning objective from offline pre-training into the Q-values learned with online model-free reinforcement learning. We evaluate SELFI in multiple real-world environments and report improvements in terms of collision avoidance, as well as more socially compliant behavior, measured by a human user study. SELFI enables us to quickly learn useful robotic behaviors with fewer human interventions, such as pre-emptive behavior around pedestrians, collision avoidance for small and transparent objects, and avoiding travel on uneven floor surfaces. We provide supplementary videos to demonstrate the performance of our fine-tuned policy on our project page.

Summary

  • The paper presents SELFI, which integrates reinforcement learning techniques to fine-tune pre-trained social navigation policies for robots.
  • It combines model-free online learning with offline model-based learning to stabilize and accelerate policy improvement in complex, real-world scenarios.
  • Real-world evaluations and a human user study show improved collision avoidance and more socially compliant behavior around pedestrians, with fewer human interventions required during learning.

Autonomous Self-Improvement in Social Navigation Robots Through Reinforcement Learning

Overview

Deploying robotic systems in the real world increasingly depends on robots that can learn and adapt from their own experience. The paper "SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation" introduces SELFI, an online learning method for fine-tuning pre-trained control policies of robots performing social navigation tasks. The method combines online model-free reinforcement learning (RL) with offline model-based learning to draw on the strengths of both paradigms, enabling rapid and efficient policy improvement.

Methodology

SELFI stabilizes the online learning process by incorporating a model-based learning objective, utilized during the offline pre-training phase, into the Q-values learned through online model-free reinforcement learning. This integration facilitates the fine-tuning of control policies in real-world environments by enabling robots to learn from online experiences without significant human intervention. The method is particularly tailored for social navigation, where robots must navigate indoor spaces while avoiding obstacles and maintaining socially compliant behavior around pedestrians.
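
To make this integration concrete, the sketch below shows one plausible form of the combined value, assuming (as the abstract suggests) that a candidate action is scored by the frozen model-based objective from offline pre-training plus the model-free Q-value learned online. The function and argument names are hypothetical illustrations, not the authors' API.

```python
import torch

def combined_value(state: torch.Tensor, action: torch.Tensor,
                   model_based_objective, q_network) -> torch.Tensor:
    """Hypothetical combined critic used to score a candidate action.

    model_based_objective: frozen objective from offline, model-based
        pre-training (e.g. a model-based estimate of collision-free
        progress toward the goal).
    q_network: model-free Q-function trained online from real rewards.
    """
    j_mb = model_based_objective(state, action)  # offline, model-based term
    q_mf = q_network(state, action)              # online, model-free term
    return j_mb + q_mf                           # the policy maximizes this sum
```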

The SELFI framework consists of several key components:

  1. Model-Free Reinforcement Learning: At its core, SELFI employs model-free RL to learn from real-world interactions, with the goal of maximizing the expected sum of future rewards. It leverages actor-critic methods to optimize both the policy and the action-value function; a minimal training-step sketch follows this list.
  2. Model-Based Learning Objective: A novel aspect of SELFI is its use of a hybrid objective that combines a learned model-free critic with a pre-existing model-based trajectory value estimate. This combination allows the robot to begin its online learning phase with a reasonable approximation of desired behaviors, facilitating a smoother learning process and enabling more rapid improvements.
  3. Social Navigation in Real-World Environments: The application of SELFI to social navigation is thoroughly evaluated. The method not only improves basic navigational capabilities, such as collision avoidance, but also enhances the robot's performance in socially relevant aspects, such as preemptively avoiding pedestrians and navigating smoothly around small or transparent obstacles.
  4. Fine-Tuning and Behavioral Improvement: Through extensive real-world testing, SELFI demonstrates its capability to fine-tune pre-trained policies effectively. Robots are able to adapt to specific environmental challenges, learning complex behaviors like avoiding uneven floor surfaces, which would be difficult to encode directly through offline model-based methods or learn efficiently from scratch using model-free RL alone.
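
Expanding on items 1 and 2 above, the sketch below illustrates how such an actor-critic update could look when the actor maximizes the combined value J_MB(s, a) + Q(s, a). This is a hedged, TD3/DDPG-style illustration under our own assumptions; the network architectures, the frozen `j_mb` objective, the replay batch format, and all names are hypothetical rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def selfi_style_update(actor, critic, critic_target, j_mb, batch,
                       actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One hypothetical online fine-tuning step in the spirit of SELFI."""
    s, a, r, s_next, done = batch  # float tensors sampled from an online replay buffer

    # Critic: standard TD target for the learned (model-free) Q-value.
    with torch.no_grad():
        a_next = actor(s_next)
        target_q = r + gamma * (1.0 - done) * critic_target(s_next, a_next)
    critic_loss = F.mse_loss(critic(s, a), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize the combined value J_MB(s, a) + Q(s, a).
    a_pi = actor(s)
    actor_loss = -(j_mb(s, a_pi) + critic(s, a_pi)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Slowly track the online critic with the target critic (Polyak averaging).
    with torch.no_grad():
        for p, p_t in zip(critic.parameters(), critic_target.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
```

Intuitively, the frozen model-based term keeps the fine-tuned policy close to sensible pre-trained behavior early in online learning, while the learned Q-term gradually accounts for effects the offline model does not capture, such as pedestrian reactions, small or transparent obstacles, and uneven floor surfaces.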

Implications and Future Directions

The SELFI framework represents a significant step forward in the field of robotic learning, especially in contexts requiring nuanced interaction with humans and complex navigation tasks. By leveraging the strengths of both online and offline learning methods, SELFI enables robots to adapt more effectively to their operational environments, reducing the need for human intervention during the learning process.

This research is relevant to a wide range of applications, from service robots in public spaces to assistive devices in healthcare settings. Future work could explore the integration of more complex social behaviors into the SELFI framework, further enhancing robots' ability to operate smoothly in human environments. Extending the methodology to more diverse learning objectives and environmental contexts will also be important for advancing autonomous robotic systems.

In conclusion, the development of SELFI marks a promising advancement in the quest for creating autonomous robots capable of self-improvement through reinforcement learning. Its successful application in social navigation tasks opens up new avenues for research and development, potentially leading to more adaptable, efficient, and socially aware robotic systems in the future.
