Learning Generalizable Feature Fields for Mobile Manipulation (2403.07563v2)
Abstract: An open problem in mobile manipulation is how to represent objects and scenes in a unified manner so that robots can use the representation for both navigation and manipulation. Manipulation requires capturing intricate geometry while understanding fine-grained semantics, whereas navigation requires handling the complexity inherent at an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that serves as a unified, real-time representation for both navigation and manipulation. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We quantitatively evaluate GeFF's ability to perform open-vocabulary object- and part-level manipulation and show that GeFF outperforms point-based baselines in runtime and storage-accuracy trade-offs, with qualitative examples of semantics-aware navigation and articulated object manipulation.
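The CLIP feature distillation step described above can be illustrated with a minimal sketch: render a per-pixel feature from a neural field by alpha-compositing per-sample features along a ray (standard NeRF-style volume rendering), then penalize its cosine distance to the 2D CLIP feature for that pixel. This is a simplified illustration under assumed conventions, not the paper's actual implementation; the function names and the choice of cosine loss here are our assumptions.

```python
import numpy as np

def composite_features(sigmas, feats, deltas):
    """Volume-render per-sample features along one ray (alpha compositing):
    alpha_i = 1 - exp(-sigma_i * delta_i), w_i = T_i * alpha_i,
    where T_i is the accumulated transmittance before sample i."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rendered = weights @ feats          # (D,) rendered feature for this pixel
    return rendered, weights

def distill_loss(rendered, target, eps=1e-8):
    """Cosine-distance loss between the rendered feature and the
    target 2D (e.g., CLIP) feature for the corresponding pixel."""
    r = rendered / (np.linalg.norm(rendered) + eps)
    t = target / (np.linalg.norm(target) + eps)
    return 1.0 - float(r @ t)

# Toy example: 8 samples along a ray, 4-dim features.
rng = np.random.default_rng(0)
sigmas = rng.uniform(0.1, 2.0, size=8)       # per-sample densities
deltas = np.full(8, 0.1)                     # step sizes along the ray
feats = rng.normal(size=(8, 4))              # per-sample field features
target = rng.normal(size=4)                  # CLIP feature of the pixel

rendered, weights = composite_features(sigmas, feats, deltas)
loss = distill_loss(rendered, target)
```

In a real system the per-sample features come from a learned field and the loss is backpropagated through the rendering weights; here the forward pass alone shows how the 2D supervision reaches the 3D representation.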
Authors: Ri-Zhao Qiu, Yafei Hu, Ge Yang, Yuchen Song, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, Xiaolong Wang