6-DoF Grasp Planning using Fast 3D Reconstruction and Grasp Quality CNN (2009.08618v2)
Abstract: Recent consumer demand for home robots has accelerated progress in robotic grasping. However, a key component of the perception pipeline, the depth camera, is still expensive and inaccessible to most consumers. In addition, grasp planning has improved significantly in recent years by leveraging large datasets and cloud robotics, and by limiting the state and action space to top-down grasps with 4 degrees of freedom (DoF). By exploiting the multi-view geometry of the object with inexpensive equipment such as off-the-shelf RGB cameras and state-of-the-art algorithms such as the Learnt Stereo Machine (LSM \cite{kar2017learning}), a robot can generate more robust grasps from different angles with 6 DoF. In this paper, we present a modification of LSM for graspable objects, evaluate the resulting grasps, and develop a 6-DoF grasp planner based on the Grasp-Quality CNN (GQ-CNN \cite{mahler2017dex}) that exploits multiple camera views to plan a robust grasp, even when no top-down grasp is possible.
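To make the 4-DoF versus 6-DoF distinction concrete, the following is a minimal sketch of how a top-down grasp (position plus a single yaw angle) can be embedded in a full 6-DoF pose (position plus a 3x3 rotation). It is not the paper's implementation: the class and function names (`TopDownGrasp`, `Grasp6DoF`, `top_down_to_6dof`, `is_top_down`) are hypothetical, and the convention that the gripper approaches along its own z axis is an assumption made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TopDownGrasp:
    """4-DoF grasp: gripper position and rotation about the vertical axis."""
    x: float
    y: float
    z: float
    yaw: float


@dataclass
class Grasp6DoF:
    """6-DoF grasp: gripper position and a 3x3 rotation matrix (row-major)."""
    position: tuple
    rotation: list


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def top_down_to_6dof(g):
    # Assumed convention: the gripper approaches along its local z axis.
    # rot_x(pi) points that axis straight down; rot_z(yaw) then spins the
    # gripper about the world vertical.
    rotation = matmul(rot_z(g.yaw), rot_x(math.pi))
    return Grasp6DoF((g.x, g.y, g.z), rotation)


def approach_axis(g):
    # Third column of the rotation: the gripper z axis in world coordinates.
    return tuple(g.rotation[i][2] for i in range(3))


def is_top_down(g, tol=1e-6):
    # A grasp is top-down when its approach axis is (almost) world -z.
    return approach_axis(g)[2] <= -1.0 + tol
```

Every 4-DoF grasp maps to a 6-DoF pose whose approach axis is vertical, but the converse does not hold: a pose such as `Grasp6DoF((0, 0, 0), rot_y(math.pi / 2))` approaches from the side and has no 4-DoF counterpart. A 6-DoF planner can consider such poses; a top-down planner cannot.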
- 6-dof grasp planning using fast 3d reconstruction and grasp quality cnn. arXiv preprint arXiv:2009.08618, 2020.
- Avplug: Approach vector planning for unicontact grasping amid clutter. In 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), pages 1140–1147. IEEE, 2021.
- Orienting novel 3d objects using self-supervised learning of rotation transforms. In 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), pages 1453–1460. IEEE, 2020.
- Flowbot3d: Learning 3d articulation flow to manipulate articulated objects. arXiv preprint arXiv:2205.04382, 2022.
- Art/atk: A research platform for assessing and mitigating the sim-to-real gap in robotics and autonomous vehicle engineering. arXiv preprint arXiv:2211.04886, 2022.
- Dex-net ar: Distributed deep grasp planning using an augmented reality application and a smartphone camera. In IEEE International Conference on Robotics and Automation (ICRA), 2020.
- Multi-model 3d registration: Finding multiple moving objects in cluttered point clouds. arXiv preprint arXiv:2402.10865, 2024.
- Learning a multi-view stereo machine. In Advances in Neural Information Processing Systems, pages 365–376, 2017.
- Planar robot casting with real2sim2real self-supervised learning. arXiv preprint arXiv:2111.04814, 2021.
- Real2sim2real: Self-supervised learning of physical single-step dynamic actions for planar robot casting. In 2022 International Conference on Robotics and Automation (ICRA), pages 8282–8289. IEEE, 2022.
- Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. arXiv preprint arXiv:1703.09312, 2017.
- Dex-net 1.0: A cloud-based network of 3d objects for robust grasp planning using a multi-armed bandit model with correlated rewards. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 1957–1964. IEEE, 2016.
- Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv preprint arXiv:1804.05172, 2018.
- 6-dof graspnet: Variational grasp generation for object manipulation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2901–2910, 2019.
- Tax-pose: Task-specific cross-pose estimation for robot manipulation. In Conference on Robot Learning, pages 1783–1792. PMLR, 2023.
- Diffclip: Leveraging stable diffusion for language grounded 3d classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3596–3605, 2024.
- Personalization of end-to-end speech recognition on mobile devices for named entities. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 23–30. IEEE, 2019.
- Dex-net mm: Deep grasping for surface decluttering with a low-precision mobile manipulator. In 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), pages 1373–1379. IEEE, 2019.
- Apla: Additional perturbation for latent noise with adversarial training enables consistency. arXiv preprint arXiv:2308.12605, 2023.
- Mvsnet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV), pages 767–783, 2018.
- Haolun Zhang. Health diagnosis based on analysis of data captured by wearable technology devices. International Journal of Advanced Science and Technology, 95:89–96, 2016.
- Flowbot++: Learning generalized articulated objects manipulation via articulation projection. arXiv preprint arXiv:2306.12893, 2023.
- Dex-net ar: Distributed deep grasp planning using a commodity cellphone and augmented reality app. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 552–558. IEEE, 2020.
- Robots of the lost arc: Self-supervised learning to dynamically manipulate fixed-endpoint cables. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4560–4567. IEEE, 2021.