
Real-world Instance-specific Image Goal Navigation: Bridging Domain Gaps via Contrastive Learning (2404.09645v2)

Published 15 Apr 2024 in cs.RO, cs.CL, and cs.CV

Abstract: Improving instance-specific image goal navigation (InstanceImageNav), which locates the identical object in a real-world environment from a query image, is essential for robotic systems that assist users in finding desired objects. The challenge lies in the domain gap between the low-quality images observed by the moving robot, characterized by motion blur and low resolution, and the high-quality query images provided by the user. Such domain gaps can significantly reduce the task success rate, yet they have not been the focus of previous work. To address this, we propose a novel method called Few-shot Cross-quality Instance-aware Adaptation (CrossIA), which employs contrastive learning with an instance classifier to align the features of a large number of low-quality images with those of a few high-quality images. This approach reduces the domain gap by bringing the latent representations of cross-quality images closer together on a per-instance basis. In addition, the system combines object image collection with a pre-trained deblurring model to enhance the quality of observed images. Our method fine-tunes a SimSiam model, pre-trained on ImageNet, using CrossIA. We evaluated its effectiveness on an InstanceImageNav task with 20 different types of instances, in which the robot must locate, in a real-world environment, the same instance shown in a high-quality query image. Our experiments showed that the method improves the task success rate by up to three times over the baseline, a conventional approach based on SuperGlue. These findings highlight the potential of leveraging contrastive learning and image enhancement techniques to bridge the domain gap and improve object localization in robotic applications. The project website is https://emergentsystemlabstudent.github.io/DomainBridgingNav/.
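
The abstract describes CrossIA only at a high level: a contrastive objective that pulls embeddings of low-quality robot observations and a few high-quality images of the same instance together, combined with an instance classifier. The sketch below illustrates what such an instance-aware, cross-quality fine-tuning loss could look like in PyTorch. It is a minimal illustration, not the authors' released implementation; the class name, the supervised-contrastive form of the loss, the classifier head, the temperature, and the equal weighting of the two terms are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossQualityInstanceContrast(nn.Module):
    """Sketch: pull low-quality (robot-observed) and high-quality (query-like)
    images of the same instance together in embedding space, while an instance
    classifier keeps the embeddings discriminative."""

    def __init__(self, encoder: nn.Module, feat_dim: int, num_instances: int,
                 temperature: float = 0.1):
        super().__init__()
        self.encoder = encoder          # e.g. a SimSiam backbone pre-trained on ImageNet
        self.classifier = nn.Linear(feat_dim, num_instances)  # instance classifier head (assumption)
        self.temperature = temperature

    def forward(self, low_q, high_q, labels_low, labels_high):
        # Embed both quality domains with the shared encoder.
        z = F.normalize(self.encoder(torch.cat([low_q, high_q])), dim=1)
        labels = torch.cat([labels_low, labels_high])

        # Pairwise cosine similarities; exclude self-pairs from the softmax.
        sim = z @ z.t() / self.temperature
        self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, -1e9)

        # Positives are all other images of the same instance, regardless of quality.
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        contrastive = -(pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)

        # Instance-classification term, as suggested by the "instance classifier" in the abstract.
        cls_loss = F.cross_entropy(self.classifier(z), labels)
        return contrastive.mean() + cls_loss
```

In this reading, each training batch would mix many low-quality observations with the few available high-quality images per instance, so the high-quality images act as anchors that the blurred, low-resolution views are drawn toward.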

References (33)
  1. J. Krantz, S. Lee, J. Malik, D. Batra, and D. S. Chaplot, “Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances,” arXiv preprint arXiv:2211.15876, 2022.
  2. J. Krantz, T. Gervet, K. Yadav, A. Wang, C. Paxton, et al., “Navigating to Objects Specified by Images,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 10916–10925.
  3. M. Chang, T. Gervet, M. Khanna, S. Yenamandra, D. Shah, et al., “GOAT: Go to Any Thing,” arXiv preprint arXiv:2311.06430, 2023.
  4. P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperGlue: Learning Feature Matching with Graph Neural Networks,” in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2020, pp. 4938–4947.
  5. M. Savva, A. Kadian, O. Maksymets, Y. Zhao, E. Wijmans, et al., “Habitat: A Platform for Embodied AI Research,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9339–9347.
  6. K. Kim, S. Lee, and S. Cho, “MSSNet: Multi-Scale-Stage Network for Single Image Deblurring,” in European Conference on Computer Vision (ECCV), 2022, pp. 524–539.
  7. R. Wang, Z. Wu, Z. Weng, J. Chen, G.-J. Qi, et al., “Cross-domain Contrastive Learning for Unsupervised Domain Adaptation,” IEEE Transactions on Multimedia, 2022.
  8. A. Singh, “CLDA: Contrastive Learning for Semi-Supervised Domain Adaptation,” Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 5089–5101, 2021.
  9. B. Li, J. Han, Y. Cheng, C. Tan, P. Qi, et al., “Object Goal Navigation in Embodied AI: A Survey,” in International Conference on Video, Signal and Image Processing (VSIP), 2022, pp. 87–92.
  10. J. Gu, E. Stefani, Q. Wu, J. Thomason, and X. Wang, “Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions,” in Annual Meeting of the Association for Computational Linguistics (ACL), 2022, pp. 7606–7623.
  11. K. Kaneda, S. Nagashima, R. Korekata, M. Kambara, and K. Sugiura, “Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine,” IEEE Robotics and Automation Letters, 2024.
  12. K. Yadav, R. Ramrakhya, A. Majumdar, V.-P. Berges, S. Kuhar, et al., “Offline Visual Representation Learning for Embodied Navigation,” in Workshop on Reincarnating Reinforcement Learning at International Conference on Learning Representations (ICLR), 2023.
  13. T. Gervet, S. Chintala, D. Batra, J. Malik, and D. S. Chaplot, “Navigating to Objects in the Real World,” Science Robotics, vol. 8, no. 79, p. eadf6991, 2023.
  14. T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A Simple Framework for Contrastive Learning of Visual Representations,” in International Conference on Machine Learning (ICML), 2020, pp. 1597–1607.
  15. K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum Contrast for Unsupervised Visual Representation Learning,” in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2020, pp. 9729–9738.
  16. X. Chen and K. He, “Exploring Simple Siamese Representation Learning,” in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2021, pp. 15750–15758.
  17. J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, et al., “Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 21271–21284.
  18. I. Ben-Shaul, R. Shwartz-Ziv, T. Galanti, S. Dekel, and Y. LeCun, “Reverse Engineering Self-Supervised Learning,” Advances in Neural Information Processing Systems (NeurIPS), vol. 37, 2023.
  19. Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, et al., “Domain-Adversarial Training of Neural Networks,” Journal of Machine Learning Research, vol. 17, no. 59, pp. 1–35, 2016.
  20. E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial Discriminative Domain Adaptation,” in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2017.
  21. O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks,” in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2018, pp. 8183–8192.
  22. S. Cho and S. Lee, “Fast Motion Deblurring,” in ACM Transactions on Graphics (SIGGRAPH Asia), 2009, pp. 1–8.
  23. Q. Shan, J. Jia, and A. Agarwala, “High-Quality Motion Deblurring from a Single Image,” ACM Transactions on Graphics (SIGGRAPH), vol. 27, no. 3, pp. 1–10, 2008.
  24. D. Ren, K. Zhang, Q. Wang, Q. Hu, and W. Zuo, “Neural Blind Deconvolution using Deep Priors,” in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2020, pp. 3341–3350.
  25. A. Kanechika, L. El Hafi, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, “Interactive Learning System for 3D Semantic Segmentation with Autonomous Mobile Robots,” in IEEE/SICE International Symposium on System Integration (SII), 2024, pp. 1274–1281.
  26. X. Zhao, W. Ding, Y. An, Y. Du, T. Yu, et al., “Fast Segment Anything,” arXiv preprint arXiv:2306.12156, 2023.
  27. K. Tateno, F. Tombari, and N. Navab, “Real-time and Scalable Incremental Segmentation on Dense SLAM,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 4465–4472.
  28. Y. Zhang, B. Hooi, D. Hu, J. Liang, and J. Feng, “Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning,” Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 29848–29860, 2021.
  29. A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, et al., “ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes,” in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2017, pp. 5828–5839.
  30. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, et al., “ImageNet: A Large-Scale Hierarchical Image Database,” in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2009, pp. 248–255.
  31. T. Yamamoto, K. Terada, A. Ochiai, F. Saito, Y. Asahara, et al., “Development of Human Support Robot as the Research Platform of a Domestic Mobile Manipulator,” ROBOMECH Journal, vol. 6, 2019.
  32. T.-Y. Liu, “Learning to Rank for Information Retrieval,” Foundations and Trends in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009.
  33. X. Lin, J. He, Z. Chen, Z. Lyu, B. Fei, et al., “DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior,” arXiv preprint arXiv:2308.15070, 2023.
