Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects (2309.07473v2)
Abstract: Articulated object manipulation is a fundamental yet challenging task in robotics. Due to significant geometric and semantic variations across object categories, previous manipulation models struggle to generalize to novel categories. Few-shot learning is a promising way to alleviate this issue by allowing robots to perform a few interactions with unseen objects. However, existing approaches often require costly and inefficient test-time interactions with each unseen instance. Recognizing this limitation, we observe that despite their distinct overall shapes, different categories often share similar local geometries essential for manipulation, such as pullable handles and graspable edges, a commonality largely underexploited in prior few-shot learning work. To harness this commonality, we introduce 'Where2Explore', an affordance learning framework that efficiently explores novel categories with minimal interactions on a limited number of instances. Our framework explicitly estimates geometric similarity across categories, identifying local areas that differ from shapes seen during training for targeted exploration, while transferring affordance knowledge to geometrically similar parts of the objects. Extensive experiments in simulated and real-world environments demonstrate our framework's capacity for efficient few-shot exploration and generalization.
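The core idea in the abstract, spending a limited interaction budget on the local geometries least similar to anything seen in training, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-point `similarity` scores and the `budget` parameter are assumptions standing in for the framework's learned cross-category similarity estimates.

```python
import numpy as np

def plan_few_shot_interactions(similarity, budget):
    """Similarity-guided exploration (a sketch of the Where2Explore idea).

    similarity[i] estimates how close point i's local geometry is to
    shapes seen during training (1.0 = familiar, 0.0 = novel). With a
    limited interaction budget, probe the least-familiar points first;
    affordance predictions on high-similarity points are transferred
    from training knowledge rather than re-explored.
    """
    order = np.argsort(similarity)  # ascending: most novel geometry first
    return order[:budget]

# Toy example: 6 candidate contact points on an unseen object.
sim = np.array([0.9, 0.2, 0.8, 0.1, 0.95, 0.5])
probe = plan_few_shot_interactions(sim, budget=2)
print(probe)  # the two most novel points, indices 3 and 1
```

The design choice this illustrates is that exploration cost scales with geometric novelty rather than with the number of unseen instances, which is what allows few-shot generalization without per-instance test-time interaction.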
- Chuanruo Ning
- Ruihai Wu
- Haoran Lu
- Kaichun Mo
- Hao Dong