Articulated Object Manipulation with Coarse-to-fine Affordance for Mitigating the Effect of Point Cloud Noise (2402.18699v2)

Published 28 Feb 2024 in cs.RO

Abstract: 3D articulated objects are inherently challenging to manipulate due to their varied geometries and intricate functionalities. Point-level affordance, which predicts a per-point actionable score and thus proposes the best point to interact with, has demonstrated excellent performance and generalization in articulated object manipulation. However, a significant challenge remains: previous works learn from perfect point clouds generated in simulation, so their models cannot be directly applied to the noisy point clouds encountered in the real world. To tackle this challenge, we leverage a property of real-world scanned point clouds: the closer the camera is to the object, the less noisy the point cloud becomes. We therefore propose a novel coarse-to-fine affordance learning pipeline that mitigates the effect of point cloud noise in two stages. In the first stage, we learn affordance on a noisy far-range point cloud covering the whole object to propose an approximate region to manipulate. We then move the camera in front of that region, scan a less noisy point cloud containing precise local geometry, and learn affordance on this point cloud to propose fine-grained final actions. The proposed method is thoroughly evaluated both on large-scale simulated noisy point clouds mimicking real-world scans and in real-world scenarios, outperforming existing methods and demonstrating its effectiveness in handling noisy real-world point clouds.
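The abstract describes a two-stage, coarse-to-fine inference loop. The sketch below is a minimal, hypothetical illustration of that loop only: the AffordanceNet stub, the scan_point_cloud simulator, the camera offsets, and all array shapes are assumptions for the sake of a runnable example, not the authors' released code or trained models.

```python
# Hypothetical sketch of the coarse-to-fine affordance inference loop.
# All classes, functions, and constants here are illustrative assumptions.
import numpy as np


class AffordanceNet:
    """Placeholder per-point affordance predictor (e.g., a PointNet++-style backbone).

    Given an (N, 3) point cloud, returns an (N,) array of actionable scores.
    This stub uses random scores purely so the sketch runs end to end.
    """

    def __init__(self, seed: int) -> None:
        self._rng = np.random.default_rng(seed)

    def predict(self, points: np.ndarray) -> np.ndarray:
        return self._rng.random(len(points))


def scan_point_cloud(region_center: np.ndarray,
                     camera_pos: np.ndarray,
                     num_points: int = 2048) -> np.ndarray:
    """Stand-in for a depth-camera scan of the region around `region_center`.

    A real scanner back-projects a depth image; here we synthesize points and
    add noise that grows with camera distance, mimicking the property the
    paper exploits (closer camera -> less noisy point cloud).
    """
    rng = np.random.default_rng(0)
    distance = np.linalg.norm(camera_pos - region_center)
    points = region_center + 0.2 * rng.standard_normal((num_points, 3))
    noise = 0.01 * distance * rng.standard_normal((num_points, 3))
    return points + noise


def coarse_to_fine_interaction_point(object_center: np.ndarray) -> np.ndarray:
    # Stage 1: scan the whole object from a far viewpoint (noisier) and pick
    # the approximate region with the highest coarse affordance score.
    far_cam = object_center + np.array([0.0, 0.0, 1.5])   # 1.5 m away (assumed)
    far_cloud = scan_point_cloud(object_center, far_cam)
    coarse_scores = AffordanceNet(seed=0).predict(far_cloud)
    approx_point = far_cloud[int(np.argmax(coarse_scores))]

    # Stage 2: move the camera in front of the proposed region, rescan a less
    # noisy local point cloud, and pick the final fine-grained action point.
    near_cam = approx_point + np.array([0.0, 0.0, 0.3])   # 0.3 m away (assumed)
    near_cloud = scan_point_cloud(approx_point, near_cam)
    fine_scores = AffordanceNet(seed=1).predict(near_cloud)
    return near_cloud[int(np.argmax(fine_scores))]


if __name__ == "__main__":
    point = coarse_to_fine_interaction_point(np.zeros(3))
    print("proposed interaction point:", point)
```

The design point the sketch tries to capture is that the coarse prediction only needs to be accurate enough to choose where to look next; the final action is selected from the cleaner close-range scan.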

