
UniDoorManip: Learning Universal Door Manipulation Policy Over Large-scale and Diverse Door Manipulation Environments

Published 5 Mar 2024 in cs.RO | (2403.02604v3)

Abstract: Learning a universal manipulation policy encompassing doors with diverse categories, geometries and mechanisms, is crucial for future embodied agents to effectively work in complex and broad real-world scenarios. Due to the limited datasets and unrealistic simulation environments, previous works fail to achieve good performance across various doors. In this work, we build a novel door manipulation environment reflecting different realistic door manipulation mechanisms, and further equip this environment with a large-scale door dataset covering 6 door categories with hundreds of door bodies and handles, making up thousands of different door instances. Additionally, to better emulate real-world scenarios, we introduce a mobile robot as the agent and use the partial and occluded point cloud as the observation, which are not considered in previous works while possessing significance for real-world implementations. To learn a universal policy over diverse doors, we propose a novel framework disentangling the whole manipulation process into three stages, and integrating them by training in the reversed order of inference. Extensive experiments validate the effectiveness of our designs and demonstrate our framework's strong performance. Code, data and videos are available at https://unidoormanip.github.io/.

References (35)
  1. Learning to generalize kinematic models to novel objects. In Proceedings of the 3rd Conference on Robot Learning, 2019.
  2. Affordance learning from play for sample-efficient policy learning. In 2022 International Conference on Robotics and Automation (ICRA), pages 6372–6378. IEEE, 2022.
  3. Learning environment-aware affordance for 3d articulated object manipulation under occlusions. arXiv preprint arXiv:2309.07510, 2023.
  4. Flowbot3d: Learning 3d articulation flow to manipulate articulated objects. arXiv preprint arXiv:2205.04382, 2022.
  5. Partmanip: Learning cross-category generalizable part manipulation policy from point cloud observations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2978–2988, 2023a.
  6. Gapartnet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7081–7091, 2023b.
  7. End-to-end affordance learning for robotic manipulation. arXiv preprint arXiv:2209.12941, 2022.
  8. James J Gibson. The theory of affordances. Hilldale, USA, 1(2):67–82, 1977.
  9. Maniskill2: A unified benchmark for generalizable manipulation skills. arXiv preprint arXiv:2302.04659, 2023.
  10. Active articulation model estimation through interactive perception. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 3305–3312. IEEE, 2015.
  11. Screwnet: Category-independent articulation model estimation from depth images using screw theory. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13670–13677. IEEE, 2021.
  12. Ditto: Building digital twins of articulated objects from interaction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5616–5626, 2022.
  13. Manipulating articulated objects with interactive perception. In 2008 IEEE International Conference on Robotics and Automation, pages 272–277. IEEE, 2008.
  14. Interactive segmentation, tracking, and kinematic modeling of unknown 3d articulated objects. In 2013 IEEE International Conference on Robotics and Automation, pages 5003–5010. IEEE, 2013.
  15. Category-level articulated object pose estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3706–3715, 2020.
  16. Akb-48: A real-world articulated object knowledge base. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14809–14818, 2022.
  17. Nothing but geometric constraints: A model-free method for articulated object pose estimation. arXiv preprint arXiv:2012.00088, 2020.
  18. Sagci-system: Towards sample-efficient, generalizable, compositional, and incremental robot learning. In 2022 International Conference on Robotics and Automation (ICRA), pages 98–105. IEEE, 2022.
  19. Isaac gym: High performance gpu-based physics simulation for robot learning. arXiv preprint arXiv:2108.10470, 2021.
  20. Where2act: From pixels to actions for articulated 3d objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6813–6823, 2021.
  21. Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations. arXiv preprint arXiv:2107.14483, 2021.
  22. Structure from action: Learning interactions for articulated object 3d structure discovery. arXiv preprint arXiv:2207.08997, 2022.
  23. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 30, 2017.
  24. Learning agent-aware affordances for closed-loop interaction with articulated objects. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 5916–5922. IEEE, 2023.
  25. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  26. Learning structured output representation using deep conditional generative models. Advances in neural information processing systems, 28, 2015.
  27. Doorgym: A scalable door opening environment and baseline agent. arXiv preprint arXiv:1908.01887, 2019.
  28. Learning semantic keypoint representations for door opening manipulation. IEEE Robotics and Automation Letters, 5(4):6980–6987, 2020.
  29. Adaafford: Learning to adapt manipulation affordance for 3d articulated objects via few-shot interactions. European conference on computer vision (ECCV 2022), 2022.
  30. Captra: Category-level pose tracking for rigid and articulated objects from point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13209–13218, 2021.
  31. VAT-mart: Learning visual action trajectory proposals for manipulating 3d ARTiculated objects. In International Conference on Learning Representations, 2022.
  32. Sapien: A simulated part-based interactive environment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11097–11107, 2020.
  33. Umpnet: Universal manipulation policy network for articulated objects. IEEE Robotics and Automation Letters, 2022.
  34. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.
  35. robosuite: A modular simulation framework and benchmark for robot learning. arXiv preprint arXiv:2009.12293, 2020.

Summary

  • The paper presents a universal door manipulation policy through a three-stage framework integrating handle grasping, manipulation, and door opening in a large-scale, diverse simulation environment.
  • The paper employs a segmentation-based PointNet++ and a conditional VAE to address complex mechanical dynamics and reduce error rates in real-time manipulation.
  • The paper demonstrates significant performance improvements over prior methods, validated through extensive simulation and real-world experiments.

Learning Universal Door Manipulation Policies

"UniDoorManip: Learning Universal Door Manipulation Policy Over Large-scale and Diverse Door Manipulation Environments" (2403.02604) presents a structured framework for learning door manipulation policies that generalize across doors with varying designs, mechanisms, and geometries. The work contributes both a realistic, large-scale simulation environment and a multi-stage manipulation framework structured for efficient, generalizable learning.

Large-scale Diverse Door Manipulation Environment

A distinctive feature of this research is a door manipulation environment and dataset spanning six door categories, ranging from interior doors to refrigerator doors, with 328 door bodies and 204 handles. Composing bodies and handles yields thousands of distinct door instances, enabling comprehensive policy training (Figure 1).

Figure 1: Our pipeline for the framework. We disentangle the entire door manipulation process into three stages: handle grasping, handle manipulation, and door opening.

The environment simulates diverse manipulation mechanisms in Isaac Gym, including mechanical properties such as latching forces and resilient (restoring) forces applied to door joints, which eases transfer from simulation to real-world execution. Training against realistic occlusions and the maneuvering constraints of a mobile-base robot yields policies markedly more robust than those from prior simulation environments.
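The latching and resilient forces described above can be sketched as a simple passive joint model. This is an illustrative sketch under our own assumptions (function name, thresholds, and the 1-D spring model are ours, not from the paper's code): the latch blocks the door until the handle turns past a release angle, and a spring torque pulls the door back toward closed.

```python
def passive_door_dynamics(door_angle: float, handle_angle: float,
                          latch_release: float = 0.5,  # rad handle must turn
                          stiffness: float = 2.0):
    """Return (latched, restoring_torque) for a door joint.

    latched: True while the handle has not cleared the latch and the
             door is still closed, so opening torque is blocked.
    restoring_torque: spring-like resilient torque pulling toward closed.
    """
    latched = handle_angle < latch_release and door_angle <= 0.0
    restoring_torque = -stiffness * door_angle
    return latched, restoring_torque
```

A policy interacting with such a joint must first rotate the handle past `latch_release` before any pulling force on the door body has an effect, which is exactly the mechanism the staged framework exploits.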

Methods

The methodology decomposes door manipulation into three stages: handle grasping, handle manipulation, and door opening (Figure 2). This decomposition isolates the complexities of each phase, simplifying learning and improving generalization to unseen doors with diverse geometries and mechanisms.

Figure 2: Manipulation Sequence Guided by Our Universal Manipulation Policy.

Stage 1: Handle Grasping

The first stage predicts a point-level visual affordance score map indicating action feasibility, using a segmentation-style PointNet++. Candidate grasp actions are sampled at high-affordance points and scored by a discriminator model, from which the highest-rated grasp pose is selected.
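The affordance-then-discriminator selection scheme can be sketched as follows. Both model calls are stubbed with toy functions (a real system would use the trained PointNet++ score map and discriminator); all names here are illustrative assumptions, not the paper's API.

```python
import random

def affordance_scores(points):
    # Stub for the PointNet++ score map: favors points near the origin.
    return [1.0 / (1.0 + sum(c * c for c in p)) for p in points]

def discriminator(point, orientation):
    # Stub for the grasp discriminator: prefers a canonical approach angle.
    return -abs(orientation)

def select_grasp(points, n_candidates=8, seed=0):
    """Pick the highest-affordance point, then the best-rated orientation."""
    rng = random.Random(seed)
    scores = affordance_scores(points)
    best_point = points[max(range(len(points)), key=scores.__getitem__)]
    candidates = [rng.uniform(-3.14, 3.14) for _ in range(n_candidates)]
    best_orient = max(candidates, key=lambda o: discriminator(best_point, o))
    return best_point, best_orient
```

The key design point is the two-level filtering: the dense score map narrows *where* to act, and the discriminator ranks *how* to act at that location.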

Stage 2: Handle Manipulation

The second stage uses a conditional VAE to handle the varied manipulation mechanisms of different handles. Training on extensive data covering diverse handle types yields robust action proposals with low error rates at execution time.
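The conditional-VAE idea can be sketched minimally: actions are decoded from a latent sample conditioned on the handle observation, using the standard reparameterization trick. The encoder and decoder below are stubbed with trivial linear maps (real networks would replace them), and every name here is an assumption for illustration.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps, the standard VAE reparameterization trick.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def propose_action(handle_feature, rng=None):
    """Sample an action conditioned on a handle observation feature."""
    rng = rng or random.Random(0)
    mu = [0.1 * f for f in handle_feature]    # stub encoder mean
    log_var = [-2.0] * len(handle_feature)    # stub encoder log-variance
    z = reparameterize(mu, log_var, rng)
    # Stub decoder: action depends on both the latent z and the condition.
    return [zi + 0.5 * f for zi, f in zip(z, handle_feature)]
```

Sampling different latents for the same handle yields a distribution of plausible manipulation motions, which is what lets one model cover levers, knobs, and other mechanisms.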

Stage 3: Door Opening

The third-stage policy is trained in a closed-loop formulation: it re-observes the scene at each step and adjusts its actions accordingly, reducing cumulative error. This contrasts with open-loop execution, which cannot adapt once a trajectory has been committed.
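The closed-loop structure can be sketched with a 1-D door model of our own devising (the paper's actual state and action spaces are richer): at every step the controller re-reads the door angle and recomputes its action, so small errors are corrected instead of accumulating.

```python
def open_door_closed_loop(policy, env_step, observe,
                          target_angle=1.2, max_steps=50, tol=0.05):
    """Re-observe and re-act each step until the door reaches the target."""
    for _ in range(max_steps):
        angle = observe()
        if abs(target_angle - angle) < tol:
            return True          # door opened to within tolerance
        env_step(policy(angle, target_angle))
    return False                 # open-loop drift would accumulate here
```

A toy usage with a proportional "policy" on a scalar door angle:

```python
state = {"angle": 0.0}
ok = open_door_closed_loop(
    policy=lambda a, t: 0.5 * (t - a),
    env_step=lambda da: state.__setitem__("angle", state["angle"] + da),
    observe=lambda: state["angle"])
```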

Results and Evaluation

Empirical results highlight the framework's efficacy across diverse geometries and categories, outperforming prior methods such as GAPartNet and VAT-MART. The universal manipulation policy achieves high success rates across tasks, performing particularly well on unseen categories and complex shapes (Figure 3).

Figure 3: Comparison of ablations and our approach for various door joint angles. Average success rate metrics are shown for each angle.

A notable result is the integration scheme: training the stages in the reversed order of inference, each conditioned on its successor, bridges the independent policies into a seamless manipulation sequence. Ablations show that without this integration, or without state conditioning, effectiveness drops sharply, particularly under complex dynamics and occlusions (Figure 4, Figure 5).
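The reversed-order training idea from the abstract can be sketched as a short pipeline. `train_stage` is a placeholder for whatever per-stage training routine is used; the point is the ordering and the conditioning of each earlier stage on its already-trained successor (all names here are our assumptions).

```python
def train_pipeline(train_stage):
    """Train stages in reverse of inference order, conditioning each
    earlier stage on its already-trained successor."""
    stage3 = train_stage("door_opening", downstream=None)
    stage2 = train_stage("handle_manipulation", downstream=stage3)
    stage1 = train_stage("handle_grasping", downstream=stage2)
    return stage1, stage2, stage3   # inference then runs 1 -> 2 -> 3
```

Because each stage is trained against the stage that will consume its output, a grasp is only rewarded if it leaves the gripper in a state from which handle manipulation (and then door opening) can succeed.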

Figure 4: Failure cases of ablated versions.

Figure 5: Real-World Experiments demonstrate the application of learned policies in practical settings.

Conclusion

This research advances universal door manipulation policy learning with a novel, efficiently structured framework that accommodates diverse door types, handle mechanisms, and manipulation stages. Extensive experiments in both simulation and real-world settings validate its contributions to robotic manipulation. Future directions include improving real-time adaptation and extending policy universality to articulated objects beyond doors.
