GraspSplats: Efficient Manipulation with 3D Feature Splatting (2409.02084v1)
Abstract: The ability of robots to grasp object parts efficiently and zero-shot is crucial for practical applications, and such grasping is becoming increasingly prevalent with recent advances in vision-language models (VLMs). To bridge the 2D-to-3D gap in the representations that support this capability, existing methods rely either on neural fields (NeRFs) optimized via differentiable rendering or on point-based projection methods. However, we demonstrate that NeRFs are ill-suited to scene changes because of their implicit representation, and that point-based methods localize parts inaccurately without rendering-based optimization. To address these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats generates high-quality scene representations in under 60 seconds. We further validate the advantages of a Gaussian-based representation by showing that the explicit, optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. Through extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods across diverse task settings. In particular, GraspSplats outperforms NeRF-based methods such as F3RM and LERF-TOGO, as well as 2D detection methods.
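The abstract states that the explicit Gaussian geometry lets point trackers drive dynamic and articulated manipulation, and the reference list cites both CoTracker and the Kabsch algorithm. The following is a minimal sketch of one generic way this could work, not the paper's exact procedure. It assumes that a tracked part moves rigidly and that a point tracker supplies 3D correspondences for a few keypoints on that part before and after the motion. The Kabsch algorithm then recovers the rigid transform, which is applied directly to the part's Gaussian means and covariances. The helper names `kabsch` and `rigid_update_gaussians` are hypothetical.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares rigid transform (R, t) with R @ src[i] + t ~= dst[i].

    src, dst: (N, 3) arrays of corresponding 3D points (N >= 3, not collinear).
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance matrix of the centered correspondences.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1 so R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def rigid_update_gaussians(means, covs, kp_before, kp_after):
    """Hypothetical helper: move one rigid part's Gaussians to follow its tracked keypoints.

    means: (M, 3) Gaussian centers; covs: (M, 3, 3) Gaussian covariances;
    kp_before / kp_after: (N, 3) tracked keypoints on the same part, before and after motion.
    """
    R, t = kabsch(kp_before, kp_after)
    new_means = means @ R.T + t  # x' = R x + t for every center
    new_covs = R @ covs @ R.T    # Sigma' = R Sigma R^T, broadcast over the M Gaussians
    return new_means, new_covs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.deg2rad(30.0)
    R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta), np.cos(theta), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.1, -0.2, 0.05])
    kp0 = rng.normal(size=(8, 3))  # synthetic keypoints at time 0
    kp1 = kp0 @ R_true.T + t_true  # same keypoints after a known rigid motion
    means = rng.normal(size=(100, 3))
    covs = np.tile(np.eye(3) * 0.01, (100, 1, 1))
    new_means, new_covs = rigid_update_gaussians(means, covs, kp0, kp1)
    print(np.allclose(new_means, means @ R_true.T + t_true))  # True
```

Real tracker output is noisy, so a practical system would likely use many keypoints and reject outliers. For an articulated object, each rigid link would need its own transform.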
- Distilled feature fields enable few-shot language-guided manipulation. In Conference on Robot Learning. PMLR, 2023.
- Language embedded radiance fields for zero-shot task-oriented grasping. In Conference on Robot Learning. PMLR, 2023.
- Decomposing NeRF for editing via feature field distillation. NeurIPS, 2022.
- Learning transferable visual models from natural language supervision. In ICML. PMLR, 2021.
- VoxPoser: Composable 3D value maps for robotic manipulation with language models. In Conference on Robot Learning. PMLR, 2023.
- LERF: Language embedded radiance fields. In ICCV, 2023.
- NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
- Dense object nets: Learning dense visual object descriptors by and for robotic manipulation. arXiv preprint arXiv:1806.08756, 2018.
- kPAM: Keypoint affordances for category-level robotic manipulation. In The International Symposium of Robotics Research, pages 132–157. Springer, 2019.
- Neural descriptor fields: SE(3)-equivariant object representations for manipulation. In 2022 International Conference on Robotics and Automation (ICRA), pages 6394–6400. IEEE, 2022.
- NeRF-Supervision: Learning dense object descriptors from neural radiance fields. In 2022 International Conference on Robotics and Automation (ICRA), pages 6496–6503. IEEE, 2022.
- 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 2023.
- G. Chen and W. Wang. A survey on 3D Gaussian splatting. arXiv preprint arXiv:2401.03890, 2024.
- MobileSAMv2: Faster segment anything to everything. arXiv preprint arXiv:2312.09579, 2023.
- Extract free dense labels from CLIP. In ECCV, 2022.
- Grasp pose detection in point clouds. The International Journal of Robotics Research, 2017.
- High precision grasp pose detection in dense clutter. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016.
- CoTracker: It is better to track together. arXiv preprint arXiv:2307.07635, 2023.
- OK-Robot: What really matters in integrating open-knowledge models for robotics. arXiv preprint arXiv:2401.12202, 2024.
- Open-vocabulary queryable scene representations for real world planning. In ICRA, 2023.
- Visual language maps for robot navigation. In ICRA, 2023.
- ConceptFusion: Open-set multimodal 3D mapping. arXiv preprint arXiv:2302.07241, 2023.
- VLFM: Vision-language frontier maps for zero-shot semantic navigation. In ICRA, 2024.
- ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning. arXiv preprint arXiv:2309.16650, 2023.
- Segment anything. In ICCV, 2023.
- Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023.
- Learning generalizable feature fields for mobile manipulation. arXiv preprint arXiv:2403.07563, 2024.
- D3Fields: Dynamic 3D descriptor fields for zero-shot generalizable robotic manipulation. arXiv preprint arXiv:2309.16118, 2023.
- Evo-NeRF: Evolving NeRF for sequential robot grasping of transparent objects. In Proceedings of The 6th Conference on Robot Learning, 2023.
- J. Redmon and A. Angelova. Real-time grasp detection using convolutional neural networks. In ICRA, 2015.
- Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. Robotics: Science and Systems, 2017.
- GraspNet-1Billion: A large-scale benchmark for general object grasping. In CVPR, 2020.
- AnyGrasp: Robust and efficient grasp perception in spatial and temporal domains. IEEE Transactions on Robotics, 2023.
- Contact-GraspNet: Efficient 6-DoF grasp generation in cluttered scenes. In ICRA, 2021.
- 6-DoF GraspNet: Variational grasp generation for object manipulation. In ICCV, 2019.
- Visual whole-body control for legged loco-manipulation. arXiv preprint arXiv:2403.16967, 2024.
- Feature 3DGS: Supercharging 3D Gaussian splatting to enable distilled feature fields. In CVPR, 2024.
- Feature splatting: Language-driven physics-based scene synthesis and editing. arXiv preprint arXiv:2404.01223, 2024.
- LangSplat: 3D language Gaussian splatting. In CVPR, 2024.
- GaussianGrasper: 3D language Gaussian splatting for open-vocabulary robotic grasping. arXiv preprint arXiv:2403.09637, 2024.
- Y. Li and D. Pathak. Object-aware Gaussian splatting for robotic manipulation. In ICRA 2024 Workshop on 3D Visual Representations for Robot Manipulation, 2024.
- COLMAP-free 3D Gaussian splatting. arXiv preprint arXiv:2312.07504, 2023.
- A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 1996.
- W. Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography, 1976.
- Structure-from-motion revisited. In CVPR, 2016.
- Tracking anything with decoupled video segmentation. In ICCV, 2023.
- Grounded SAM: Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:2401.14159, 2024.
- 4D Gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528, 2023.
- BigBIRD: A large-scale 3D database of object instances. In ICRA, 2014.
- Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots. arXiv preprint arXiv:2402.10329, 2024.