
GraspSplats: Efficient Manipulation with 3D Feature Splatting (2409.02084v1)

Published 3 Sep 2024 in cs.RO, cs.CV, and cs.LG

Abstract: The ability for robots to perform efficient and zero-shot grasping of object parts is crucial for practical applications and is becoming prevalent with recent advances in Vision-Language Models (VLMs). To bridge the 2D-to-3D gap for representations to support such a capability, existing methods rely on neural fields (NeRFs) via differentiable rendering or point-based projection methods. However, we demonstrate that NeRFs are inappropriate for scene changes due to their implicitness and point-based methods are inaccurate for part localization without rendering-based optimization. To amend these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats generates high-quality scene representations in under 60 seconds. We further validate the advantages of Gaussian-based representation by showing that the explicit and optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings. In particular, GraspSplats outperforms NeRF-based methods like F3RM and LERF-TOGO, and 2D detection methods.
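The abstract says the explicit Gaussian geometry lets GraspSplats follow moving and articulated objects using point trackers, but it does not give the update rule. As a hedged illustration only, assuming a rigid object part whose tracked 3D points are already in correspondence between two frames (for example from a tracker such as CoTracker [18]), one could estimate the part's rotation and translation with the Kabsch algorithm [44] and apply that motion to the Gaussians belonging to the part. The helper names, the boolean mask, and the rotation-matrix storage are assumptions for this sketch, not the paper's implementation.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points, N >= 3, not all collinear.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance matrix of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def move_gaussians(means, rotations, mask, R, t):
    """Hypothetical helper: rigidly move the Gaussians selected by `mask`.

    means: (M, 3) centers; rotations: (M, 3, 3) rotation matrices (assumed
    layout; 3DGS implementations usually store unit quaternions instead).
    """
    means, rotations = means.copy(), rotations.copy()
    means[mask] = means[mask] @ R.T + t
    # Left-multiplying each orientation by R turns its covariance into R Σ R^T.
    rotations[mask] = R @ rotations[mask]
    return means, rotations


# Usage on synthetic data: rotate 20 random points 30 degrees about z and shift them.
rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 3))
theta = np.pi / 6
R_true = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.1, -0.2, 0.05])
moved = pts @ R_true.T + t_true

R_est, t_est = kabsch(pts, moved)
new_means, new_rots = move_gaussians(
    pts, np.tile(np.eye(3), (20, 1, 1)), np.ones(20, dtype=bool), R_est, t_est
)
```

Kabsch assumes the selected Gaussians move as one rigid body; under this sketch, an articulated object would need a separate mask and transform for each rigid link.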

References (50)
  1. Distilled feature fields enable few-shot language-guided manipulation. In Conference on Robot Learning. PMLR, 2023.
  2. Language embedded radiance fields for zero-shot task-oriented grasping. In Conference on Robot Learning. PMLR, 2023.
  3. Decomposing NeRF for editing via feature field distillation. NeurIPS, 2022.
  4. Learning transferable visual models from natural language supervision. In ICML. PMLR, 2021.
  5. VoxPoser: Composable 3D value maps for robotic manipulation with language models. In Conference on Robot Learning. PMLR, 2023.
  6. LERF: Language embedded radiance fields. In ICCV, 2023.
  7. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
  8. Dense object nets: Learning dense visual object descriptors by and for robotic manipulation. arXiv preprint arXiv:1806.08756, 2018.
  9. kPAM: KeyPoint affordances for category-level robotic manipulation. In The International Symposium of Robotics Research, pages 132–157. Springer, 2019.
  10. Neural descriptor fields: SE(3)-equivariant object representations for manipulation. In 2022 International Conference on Robotics and Automation (ICRA), pages 6394–6400. IEEE, 2022.
  11. NeRF-Supervision: Learning dense object descriptors from neural radiance fields. In 2022 International Conference on Robotics and Automation (ICRA), pages 6496–6503. IEEE, 2022.
  12. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 2023.
  13. G. Chen and W. Wang. A survey on 3D Gaussian splatting. arXiv preprint arXiv:2401.03890, 2024.
  14. MobileSAMv2: Faster segment anything to everything. arXiv preprint arXiv:2312.09579, 2023.
  15. Extract free dense labels from CLIP. In ECCV, 2022.
  16. Grasp pose detection in point clouds. The International Journal of Robotics Research, 2017.
  17. High precision grasp pose detection in dense clutter. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016.
  18. CoTracker: It is better to track together. arXiv preprint arXiv:2307.07635, 2023.
  19. OK-Robot: What really matters in integrating open-knowledge models for robotics. arXiv preprint arXiv:2401.12202, 2024.
  20. Open-vocabulary queryable scene representations for real world planning. In ICRA, 2023.
  21. Visual language maps for robot navigation. In ICRA, 2023.
  22. ConceptFusion: Open-set multimodal 3D mapping. arXiv preprint arXiv:2302.07241, 2023.
  23. VLFM: Vision-language frontier maps for zero-shot semantic navigation. In ICRA, 2024.
  24. ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning. arXiv preprint arXiv:2309.16650, 2023.
  25. Segment anything. In ICCV, 2023.
  26. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023.
  27. Learning generalizable feature fields for mobile manipulation. arXiv preprint arXiv:2403.07563, 2024.
  28. D3Fields: Dynamic 3D descriptor fields for zero-shot generalizable robotic manipulation. arXiv preprint arXiv:2309.16118, 2023.
  29. Evo-NeRF: Evolving NeRF for sequential robot grasping of transparent objects. In Proceedings of The 6th Conference on Robot Learning, 2023.
  30. J. Redmon and A. Angelova. Real-time grasp detection using convolutional neural networks. In ICRA, 2015.
  31. Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. Robotics: Science and Systems, 2017.
  32. GraspNet-1Billion: A large-scale benchmark for general object grasping. In CVPR, 2020.
  33. AnyGrasp: Robust and efficient grasp perception in spatial and temporal domains. IEEE Transactions on Robotics, 2023.
  34. Contact-GraspNet: Efficient 6-DoF grasp generation in cluttered scenes. In ICRA, 2021.
  35. 6-DOF GraspNet: Variational grasp generation for object manipulation. In ICCV, 2019.
  36. Visual whole-body control for legged loco-manipulation. arXiv preprint arXiv:2403.16967, 2024.
  37. Feature 3DGS: Supercharging 3D Gaussian splatting to enable distilled feature fields. In CVPR, 2024.
  38. Feature splatting: Language-driven physics-based scene synthesis and editing. arXiv preprint arXiv:2404.01223, 2024.
  39. LangSplat: 3D language Gaussian splatting. In CVPR, 2024.
  40. GaussianGrasper: 3D language Gaussian splatting for open-vocabulary robotic grasping. arXiv preprint arXiv:2403.09637, 2024.
  41. Y. Li and D. Pathak. Object-aware Gaussian splatting for robotic manipulation. In ICRA 2024 Workshop on 3D Visual Representations for Robot Manipulation, 2024.
  42. COLMAP-free 3D Gaussian splatting. arXiv preprint arXiv:2312.07504, 2023.
  43. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 1996.
  44. W. Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography, 1976.
  45. Structure-from-motion revisited. In CVPR, 2016.
  46. Tracking anything with decoupled video segmentation. In ICCV, 2023.
  47. Grounded SAM: Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:2401.14159, 2024.
  48. 4D Gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528, 2023.
  49. BigBIRD: A large-scale 3D database of object instances. In ICRA, 2014.
  50. Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots. arXiv preprint arXiv:2402.10329, 2024.