Learning Generalizable Feature Fields for Mobile Manipulation
Abstract: An open problem in mobile manipulation is how to represent objects and scenes in a unified manner so that robots can use the same representation for both navigation and manipulation. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves reasoning over the complexity of scenes at an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We quantitatively evaluate GeFF's ability to perform open-vocabulary object- and part-level manipulation, and show that GeFF outperforms point-based baselines in runtime and in storage-accuracy trade-offs, with qualitative examples of semantics-aware navigation and articulated object manipulation.
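The two ingredients named in the abstract, rendering features from a neural field and distilling them against CLIP features, can be sketched in a few lines. This is a minimal illustration under assumed forms (NeRF-style alpha compositing of per-sample features and a cosine-similarity distillation loss); the function names and the exact objective are assumptions, not the paper's implementation:

```python
import numpy as np

def render_feature(sigmas, feats, deltas):
    """Alpha-composite per-sample features along one ray (NeRF-style).

    sigmas: (N,) volume densities at the ray samples
    feats:  (N, D) per-sample feature vectors predicted by the field
    deltas: (N,) distances between consecutive samples
    Returns the (D,) feature rendered for the ray's pixel.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)            # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance
    weights = alphas * trans                           # compositing weights
    return (weights[:, None] * feats).sum(axis=0)

def distill_loss(rendered, clip_feat, eps=1e-8):
    """Negative cosine similarity between a rendered feature and a 2D CLIP
    feature for the same pixel (assumed loss form)."""
    r = rendered / (np.linalg.norm(rendered) + eps)
    c = clip_feat / (np.linalg.norm(clip_feat) + eps)
    return 1.0 - float(r @ c)
```

Minimizing `distill_loss` over many rays pushes the 3D feature field to agree with the 2D vision-language features, which is what makes open-vocabulary queries against the field possible.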