SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation (2310.16838v2)
Abstract: Humans demonstrate remarkable skill in transferring manipulation abilities across objects of varying shapes, poses, and appearances, a capability rooted in their understanding of semantic correspondences between different instances. To equip robots with a similar high-level comprehension, we present SparseDFF, a novel distilled feature field (DFF) for 3D scenes that uses large 2D vision models to extract semantic features from sparse RGBD images, a setting that has received little attention despite its relevance to many tasks with fixed-camera setups. SparseDFF generates view-consistent 3D feature fields, enabling efficient one-shot learning of dexterous manipulations by mapping image features onto a 3D point cloud. Central to SparseDFF is a feature refinement network, optimized with a contrastive loss between views, together with a point-pruning mechanism that promotes feature continuity. The resulting field allows end-effector parameters to be optimized by minimizing feature discrepancies between the demonstration and the target manipulation. Validated in real-world experiments with a dexterous hand, SparseDFF proves effective in manipulating both rigid and deformable objects, demonstrating strong generalization across object and scene variations.
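The optimization described above, fitting end-effector parameters by minimizing the discrepancy between features queried in the target scene's feature field and features recorded in the demonstration, can be sketched as follows. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation: it assumes a simple inverse-distance-weighted feature field over the target point cloud and a rigid axis-angle pose as the end-effector parameters, and it omits the feature refinement network, the contrastive loss, and the point pruning described in the abstract. All names (`query_field`, `optimize_pose`, `hand_pts`, ...) are hypothetical.

```python
# Minimal sketch (not the authors' code) of feature-field-based pose optimization:
# match features sampled at the hand's query points in the target scene to the
# features recorded at the same query points in the demonstration.
import torch


def skew(v):
    """Skew-symmetric matrix of a 3-vector (used by Rodrigues' formula)."""
    zero = torch.zeros_like(v[0])
    return torch.stack([
        torch.stack([zero, -v[2],  v[1]]),
        torch.stack([v[2],  zero, -v[0]]),
        torch.stack([-v[1], v[0],  zero]),
    ])


def axis_angle_to_matrix(rotvec):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = rotvec.norm() + 1e-8
    K = skew(rotvec / theta)
    return torch.eye(3) + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)


def query_field(points, cloud_xyz, cloud_feat, k=8):
    """Interpolate per-point features at arbitrary 3D queries by
    inverse-distance weighting over the k nearest cloud points."""
    d = torch.cdist(points, cloud_xyz)               # (Q, N) pairwise distances
    knn_d, knn_i = d.topk(k, dim=1, largest=False)   # k nearest neighbors
    w = 1.0 / (knn_d + 1e-8)
    w = w / w.sum(dim=1, keepdim=True)               # (Q, k) normalized weights
    return (w.unsqueeze(-1) * cloud_feat[knn_i]).sum(dim=1)  # (Q, C)


def optimize_pose(demo_feats, hand_pts, cloud_xyz, cloud_feat, steps=200, lr=1e-2):
    """Minimize the feature discrepancy between the demonstration features and
    features sampled at the rigidly transformed end-effector query points."""
    rotvec = torch.zeros(3, requires_grad=True)      # rotation (axis-angle)
    trans = torch.zeros(3, requires_grad=True)       # translation
    opt = torch.optim.Adam([rotvec, trans], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        R = axis_angle_to_matrix(rotvec)
        q = hand_pts @ R.T + trans                   # transformed query points
        feats = query_field(q, cloud_xyz, cloud_feat)
        loss = (feats - demo_feats).pow(2).sum(dim=1).mean()
        loss.backward()
        opt.step()
    return rotvec.detach(), trans.detach()


# Toy usage with random data, just to show the expected shapes.
if __name__ == "__main__":
    cloud_xyz = torch.randn(2048, 3)                 # target-scene point cloud
    cloud_feat = torch.randn(2048, 384)              # distilled per-point features
    hand_pts = torch.randn(32, 3) * 0.05             # end-effector query points
    demo_feats = torch.randn(32, 384)                # features from the demonstration
    print(optimize_pose(demo_feats, hand_pts, cloud_xyz, cloud_feat))
```

In the full method, the raw interpolated features would be replaced by the refined, pruned features described in the abstract, and the end-effector parameterization would cover the dexterous hand's full pose rather than a single rigid transform.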