
RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation (2310.03478v2)

Published 5 Oct 2023 in cs.RO

Abstract: Robotic manipulation requires accurate perception of the environment, which poses a significant challenge due to its inherent complexity and constantly changing nature. In this context, RGB images and point clouds are two commonly used modalities in vision-based robotic manipulation, but each has its own limitations. Commercial point-cloud observations often suffer from sparse sampling and noisy output due to the limits of the emission-reception imaging principle. RGB images, on the other hand, are rich in texture information but lack the depth and 3D information crucial for robotic manipulation. To mitigate these challenges, we propose an image-only robotic manipulation framework that leverages an eye-on-hand monocular camera installed on the robot's parallel gripper. By moving with the gripper, this camera can actively perceive the object from multiple perspectives during manipulation, enabling the estimation of 6D object poses for use in manipulation. While obtaining images from more and diverse viewpoints typically improves pose estimation, it also increases manipulation time. To address this trade-off, we employ a reinforcement learning policy that synchronizes the manipulation strategy with active perception, balancing 6D pose accuracy against manipulation efficiency. Our experimental results in both simulated and real-world environments demonstrate the state-of-the-art effectiveness of our approach. We believe our method will inspire further research on real-world-oriented robotic manipulation.
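The accuracy-versus-time trade-off the abstract describes can be sketched as a toy active-perception loop. Everything below is illustrative, not the paper's actual pipeline: the scalar "pose", the noise model, and the fixed stopping threshold (which stands in for the learned RL policy's decision of when to stop gathering views and start manipulating) are all assumptions for the sake of the sketch.

```python
import random

def estimate_pose(views):
    # Toy "pose estimator": averaging noisy observations stands in for
    # fusing multi-view images into a 6D object pose; uncertainty is
    # assumed to shrink as more views are collected.
    pose = sum(views) / len(views)
    uncertainty = 1.0 / len(views)
    return pose, uncertainty

def active_perception(true_pose, stop_threshold=0.25, max_views=8, rng=None):
    """Gather views until the (stand-in) policy deems the pose certain enough.

    `stop_threshold` plays the role of the RL policy: a lower threshold
    buys pose accuracy at the cost of extra camera motions (time).
    """
    rng = rng or random.Random(0)
    views = []
    pose = true_pose
    while len(views) < max_views:
        # Move the eye-on-hand camera and take one noisy observation.
        views.append(true_pose + rng.gauss(0.0, 0.1))
        pose, uncertainty = estimate_pose(views)
        if uncertainty <= stop_threshold:  # policy says: manipulate now
            break
    return pose, len(views)

pose, n_views = active_perception(true_pose=1.0)
print(f"estimated pose ~ {pose:.2f} from {n_views} views")
```

With this toy uncertainty model, the loop stops after four views (1/4 ≤ 0.25); RGBManip instead learns that stopping decision jointly with the manipulation strategy.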

Authors (6)
  1. Boshi An (6 papers)
  2. Yiran Geng (14 papers)
  3. Kai Chen (512 papers)
  4. Xiaoqi Li (77 papers)
  5. Qi Dou (163 papers)
  6. Hao Dong (175 papers)
Citations (10)
