Feudal Networks for Visual Navigation (2402.12498v3)
Abstract: Visual navigation follows the intuition that humans can navigate without detailed maps. A common approach is interactive exploration while building a topological graph with images at nodes that can be used for planning. Recent variations learn from passive videos and can navigate using complex social and semantic cues. However, these methods need a significant number of training videos, rely on large graphs, and do not operate in truly unseen scenes because they use odometry. We introduce a new approach to visual navigation using feudal learning, which employs a hierarchical structure consisting of a worker agent, a mid-level manager, and a high-level manager. Key to the feudal learning paradigm, agents at each level see a different aspect of the task and operate at different spatial and temporal scales. Two unique modules are developed in this framework. For the high-level manager, we learn a memory proxy map in a self-supervised manner to record prior observations in a learned latent space and avoid the use of graphs and odometry. For the mid-level manager, we develop a waypoint network that outputs intermediate subgoals imitating human waypoint selection during local navigation. This waypoint network is pre-trained using a new, small set of teleoperation videos that we make publicly available, with training environments different from testing environments. The resulting feudal navigation network achieves near state-of-the-art (SOTA) performance, while providing a novel no-RL, no-graph, no-odometry, no-metric-map approach to the image-goal navigation task.
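The three-level structure described in the abstract can be illustrated with a toy control loop. The sketch below is not the paper's implementation: the class names, the grid world, the set of visited cells standing in for the memory proxy map, the hand-written rules standing in for the learned networks, and the update intervals `high_every` and `mid_every` are all assumptions made for illustration. It only shows the idea that each level runs at a different temporal scale.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Point = Tuple[int, int]  # toy grid position (assumed stand-in for a latent location)


@dataclass
class HighLevelManager:
    """Coarse level: remembers visited cells (toy stand-in for a memory proxy map)."""
    visited: Set[Point] = field(default_factory=set)

    def record(self, pos: Point) -> None:
        self.visited.add(pos)

    def pick_region(self, pos: Point) -> Point:
        # Hypothetical rule: aim for the first unvisited cell three steps away.
        for dx, dy in ((3, 0), (0, 3), (-3, 0), (0, -3)):
            target = (pos[0] + dx, pos[1] + dy)
            if target not in self.visited:
                return target
        return (pos[0] + 3, pos[1])


class MidLevelManager:
    """Middle level: turns the coarse target into a nearby waypoint (subgoal)."""

    @staticmethod
    def _toward(a: int, b: int) -> int:
        # Move one unit from a toward b, or stay if already equal.
        return a + (b > a) - (b < a)

    def pick_waypoint(self, pos: Point, region: Point) -> Point:
        return (self._toward(pos[0], region[0]), self._toward(pos[1], region[1]))


class Worker:
    """Lowest level: takes one primitive step toward the current waypoint."""

    def act(self, pos: Point, waypoint: Point) -> Point:
        if pos[0] != waypoint[0]:
            return (pos[0] + (1 if waypoint[0] > pos[0] else -1), pos[1])
        if pos[1] != waypoint[1]:
            return (pos[0], pos[1] + (1 if waypoint[1] > pos[1] else -1))
        return pos  # already at the waypoint


def run(steps: int = 12, high_every: int = 6, mid_every: int = 3) -> Dict[str, int]:
    """Run the toy hierarchy and count how often each level is invoked."""
    high, mid, worker = HighLevelManager(), MidLevelManager(), Worker()
    pos: Point = (0, 0)
    region = waypoint = pos
    calls = {"high": 0, "mid": 0, "worker": 0}
    for t in range(steps):
        high.record(pos)
        if t % high_every == 0:  # slowest update rate
            region = high.pick_region(pos)
            calls["high"] += 1
        if t % mid_every == 0:  # intermediate update rate
            waypoint = mid.pick_waypoint(pos, region)
            calls["mid"] += 1
        pos = worker.act(pos, waypoint)  # acts at every step
        calls["worker"] += 1
    return calls
```

Calling `run()` with the default intervals invokes the high-level manager least often and the worker at every step, which mirrors the different temporal scales the abstract attributes to each level. In the paper's setting the hand-written rules would be replaced by learned modules operating on images.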
Authors:
- Faith Johnson
- Bryan Bo Cao
- Kristin Dana
- Shubham Jain
- Ashwin Ashok