
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration (2310.07896v1)

Published 11 Oct 2023 in cs.RO, cs.CV, and cs.LG

Abstract: Robotic learning for navigation in unfamiliar environments needs to provide policies for both task-oriented navigation (i.e., reaching a goal that the robot has located), and task-agnostic exploration (i.e., searching for a goal in a novel setting). Typically, these roles are handled by separate models, for example by using subgoal proposals, planning, or separate navigation strategies. In this paper, we describe how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration, with the latter providing the ability to search novel environments, and the former providing the ability to reach a user-specified goal once it has been located. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments, as compared to approaches that use subgoal proposals from generative models, or prior methods based on latent variable models. We instantiate our method by using a large-scale Transformer-based policy trained on data from multiple ground robots, with a diffusion model decoder to flexibly handle both goal-conditioned and goal-agnostic navigation. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods, and demonstrate significant improvements in performance and lower collision rates, despite utilizing smaller models than state-of-the-art approaches. For more videos, code, and pre-trained model checkpoints, see https://general-navigation-models.github.io/nomad/


Summary

  • The paper introduces a unified diffusion policy that effectively models multimodal action distributions for both goal-directed and exploratory navigation.
  • It pairs EfficientNet visual encoders with a Transformer backbone and an attention-based goal mask to flexibly toggle between task-specific and task-agnostic behaviors.
  • Empirical evaluations demonstrate over 25% improvement in success rates and a 15x reduction in model size compared to competing approaches in unseen environments.

Overview of NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration

The paper "NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration" presents a novel approach to robotic learning for navigating both familiar and unfamiliar environments. The authors propose a unified policy that is versatile enough to handle both goal-directed navigation and exploration without a predetermined goal. This policy is grounded in utilizing goal masking alongside a diffusion model to effectively model complex, multimodal distributions of actions within real-world settings.

Key Contributions

The central contribution of this work is the introduction of NoMaD, a diffusion policy that utilizes a Transformer-based architecture coupled with goal masking. This approach allows the policy to function in both task-specific and task-agnostic capacities, providing improved performance over existing methods that employ separate models for goal-conditioned and undirected navigation.

  1. Unified Diffusion Policy: By decoding actions with a diffusion model, NoMaD captures the full distribution of plausible actions given visual observations, so a single policy serves both goal-seeking and exploratory behavior.
  2. Architecture and Methodology: The approach uses EfficientNet encoders for the visual observations and the goal, a Transformer backbone to fuse them into a context vector, and an attention-based goal mask that can hide the goal token, letting the policy switch between goal-directed and undirected behavior (see the sketch after this list).
  3. Empirical Evaluation: Experiments with NoMaD demonstrated superior effectiveness in navigating unseen environments compared to five alternative methods. Notably, the diffusion-policy approach outperformed subgoal proposal techniques, with clear gains in performance and a significant reduction in collision rates.
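
As a rough illustration of how goal masking can be wired into a Transformer encoder, the PyTorch sketch below hides the goal token from attention with a key padding mask. The module names, feature dimensions, and pooling choice are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class GoalMaskedEncoder(nn.Module):
    """Fuses observation and goal features, optionally masking out the goal token."""

    def __init__(self, feat_dim=512, d_model=256):
        super().__init__()
        # Stand-ins for features produced by EfficientNet observation/goal encoders.
        self.obs_proj = nn.Linear(feat_dim, d_model)
        self.goal_proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, obs_feats, goal_feat, goal_mask):
        # obs_feats: (B, n_obs, feat_dim); goal_feat: (B, feat_dim)
        # goal_mask: (B,) bool; True means "ignore the goal" (undirected mode).
        tokens = torch.cat(
            [self.obs_proj(obs_feats), self.goal_proj(goal_feat).unsqueeze(1)], dim=1
        )
        # Key padding mask: when goal_mask is True, no token may attend to the goal.
        pad = torch.zeros(tokens.shape[:2], dtype=torch.bool, device=tokens.device)
        pad[:, -1] = goal_mask
        ctx = self.transformer(tokens, src_key_padding_mask=pad)
        # Pool over observation tokens only, so a masked goal cannot leak through.
        return ctx[:, :-1].mean(dim=1)

# Example usage with random features (batch of 2, 4 observation frames):
enc = GoalMaskedEncoder()
context = enc(torch.randn(2, 4, 512), torch.randn(2, 512),
              goal_mask=torch.tensor([True, False]))  # explore vs. navigate
```

In the paper's formulation, the resulting context vector conditions the diffusion decoder that generates candidate action sequences.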

Numerical Results and Comparative Analysis

NoMaD was evaluated across diverse real-world settings, where it improved success rates on exploration tasks by over 25% compared to the prior state of the art, the ViNT system with subgoal diffusion. It achieves this with a model roughly 15 times smaller than comparable approaches, underscoring its computational efficiency.

Theoretical and Practical Implications

Theoretically, NoMaD provides new insight into how a unified model can handle navigation both with and without a destination image, pointing toward more flexible and generalizable policy structures. The successful combination of goal masking with diffusion models suggests a compelling design for future navigation systems that must adapt dynamically to varying contexts and objectives.

Practically, deploying NoMaD could simplify the integration of robotic systems into complex environments, minimizing the need for multiple specialized models and allowing robots to adapt to new tasks with little intervention. This is especially relevant for dynamic real-world applications such as search-and-rescue or autonomous delivery, where robots must navigate previously unmapped terrain.

Future Developments

Potential future advancements could involve extending the goal specification modalities to include language instructions and spatial coordinates, broadening the range of applicable use cases significantly. Additionally, refining the exploration strategies through semantic understanding or incorporation of prior knowledge could yield further enhancements in performance.

Conclusion

Overall, the paper presents a comprehensive framework for robotic navigation that combines state-of-the-art neural architectures with powerful probabilistic modeling, offering a substantial step forward in flexible, efficient robotic learning and deployment in diverse environments. The innovations introduced by NoMaD are expected to have a lasting impact on the field of machine learning for robotics, as well as on autonomous navigation technologies.
