Learning to Terminate in Object Navigation (2309.16164v1)
Abstract: This paper addresses object navigation in autonomous navigation systems, focusing on target approach and episode termination for Deep Reinforcement Learning (DRL) based methods in environments with long optimal episode lengths. While effective at environment exploration and object localization, conventional DRL methods often struggle with optimal path planning and termination recognition due to a lack of depth information. To overcome these limitations, we propose the Depth-Inference Termination Agent (DITA), which incorporates a supervised model, the Judge Model, to implicitly infer object-wise depth and decide termination jointly with reinforcement learning. We train the Judge Model in parallel with reinforcement learning and supervise it efficiently using the reward signal. In our evaluation, DITA achieves a 9.3% gain in success rate over our baseline method across all room types and a 51.2% improvement in long-episode environments, while maintaining a slightly better Success weighted by Path Length (SPL). Code, resources, and visualizations are available at: https://github.com/HuskyKingdom/DITA_acml2023
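The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of SPL, in which each successful episode contributes the ratio of the shortest path length to the larger of the agent's path length and the shortest path length, averaged over all episodes. The function names and the sample numbers are hypothetical and are not taken from the DITA repository.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes that ended successfully."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    agent_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Assumed definition: mean of S_i * l_i / max(p_i, l_i), where S_i is
    the success flag, l_i the shortest path length, and p_i the length
    of the path the agent actually took.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("need at least one episode")
    if not (n == len(shortest_lengths) == len(agent_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, agent_lengths):
        if shortest <= 0:
            raise ValueError("shortest path lengths must be positive")
        if ok:
            # A failed episode contributes 0; a successful one is
            # discounted by how much longer than optimal the path was.
            total += shortest / max(taken, shortest)
    return total / n


if __name__ == "__main__":
    # Hypothetical episodes: one success at twice the optimal length,
    # one failure, and one success along the optimal path.
    done = [True, False, True]
    optimal = [10.0, 5.0, 4.0]
    actual = [20.0, 5.0, 4.0]
    print(success_rate(done))
    print(spl(done, optimal, actual))
```

Under this definition, an agent that reaches the target but never issues the termination action at the right moment scores zero on both metrics for that episode, which is why termination recognition matters for the reported numbers.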