SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation
Abstract: Autonomous self-improving robots that interact and improve with experience are key to the real-world deployment of robotic systems. In this paper, we propose an online learning method, SELFI, that leverages online robot experience to rapidly fine-tune pre-trained control policies. SELFI applies online model-free reinforcement learning on top of offline model-based learning to bring out the best parts of both learning paradigms. Specifically, SELFI stabilizes the online learning process by incorporating the same model-based learning objective from offline pre-training into the Q-values learned with online model-free reinforcement learning. We evaluate SELFI in multiple real-world environments and report improvements in collision avoidance, as well as more socially compliant behavior as measured by a human user study. SELFI enables us to quickly learn useful robotic behaviors with fewer human interventions, such as pre-emptive behavior around pedestrians, collision avoidance for small and transparent objects, and avoidance of uneven floor surfaces. We provide supplementary videos demonstrating the performance of our fine-tuned policy on our project page.
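The abstract describes folding the offline model-based objective into the Q-values learned online. The toy sketch below illustrates one way such a combination could look, assuming the two terms are simply added; the abstract does not give the exact formulation. Every name here (`model_based_objective`, `q_model_free`, `combined_value`, `select_action`), the one-step toy dynamics, and the linear Q-term are hypothetical illustrations, not the authors' implementation.

```python
def model_based_objective(state, action):
    """Hypothetical stand-in for the offline model-based objective (higher is better).

    Assumption: toy one-step dynamics (next = state + action) and a goal at the
    origin, so the objective is the negative squared distance to the goal.
    """
    predicted = [s + a for s, a in zip(state, action)]
    return -sum(p * p for p in predicted)


def q_model_free(state, action, weights):
    """Hypothetical linear Q-term standing in for the value learned online."""
    features = list(state) + list(action)
    return sum(w * f for w, f in zip(weights, features))


def combined_value(state, action, weights):
    """Assumed combination: model-based objective plus the online Q-term."""
    return model_based_objective(state, action) + q_model_free(state, action, weights)


def select_action(state, candidates, weights):
    """Pick the candidate action with the highest combined value."""
    return max(candidates, key=lambda a: combined_value(state, a, weights))


if __name__ == "__main__":
    state = (1.0, 0.0)
    candidates = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
    # With an untrained (all-zero) Q-term, the model-based objective alone decides.
    print(select_action(state, candidates, [0.0, 0.0, 0.0, 0.0]))
    # After online learning has shifted the Q-term, the preferred action can change.
    print(select_action(state, candidates, [0.0, 0.0, 5.0, 0.0]))
```

In this sketch, an all-zero Q-term leaves the behavior identical to the model-based objective alone. That is one intuitive reading of how anchoring to the pre-training objective could stabilize early online updates.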