CAPT: Category-level Articulation Estimation from a Single Point Cloud Using Transformer (2402.17360v1)

Published 27 Feb 2024 in cs.CV, cs.AI, and cs.RO

Abstract: The ability to estimate joint parameters is essential for various applications in robotics and computer vision. In this paper, we propose CAPT: category-level articulation estimation from a single point cloud using Transformer. CAPT uses an end-to-end Transformer-based architecture for joint parameter and state estimation of articulated objects from a single point cloud. The proposed method estimates joint parameters and states for various articulated objects with high precision and robustness. The paper also introduces a motion loss, which improves articulation estimation performance by emphasizing the dynamic features of articulated objects. Additionally, the paper presents a double voting strategy that gives the framework coarse-to-fine parameter estimation. Experimental results on several object-category datasets demonstrate that our methods outperform existing alternatives for articulation estimation. Our research provides a promising solution for applying Transformer-based architectures to articulated object analysis.
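
As a rough illustration of the kind of pipeline the abstract describes, the sketch below encodes a point cloud with a generic Transformer encoder, lets every point vote for a point on the joint axis, a joint direction, and a joint state, and then aggregates the votes coarse-to-fine in the spirit of the double voting strategy. This is a minimal sketch under our own assumptions, not the authors' CAPT implementation: the class name JointEstimationSketch, the hyperparameters, and the inlier rule in the fine voting step are illustrative choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEstimationSketch(nn.Module):
    # Illustrative sketch only: per-point embedding, a standard Transformer
    # encoder over the points, and heads producing per-point joint votes.
    def __init__(self, d_model=128, nhead=4, num_layers=3):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(3, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.pivot_head = nn.Linear(d_model, 3)  # offset from each point to the joint axis
        self.dir_head = nn.Linear(d_model, 3)    # per-point joint direction vote
        self.state_head = nn.Linear(d_model, 1)  # per-point joint state vote (angle/translation)

    def forward(self, points):                    # points: (B, N, 3)
        feats = self.encoder(self.embed(points))  # (B, N, d_model)
        pivot_votes = points + self.pivot_head(feats)
        dir_votes = F.normalize(self.dir_head(feats), dim=-1)
        state_votes = self.state_head(feats).squeeze(-1)

        # Coarse estimate: mean of all per-point votes.
        coarse_pivot = pivot_votes.mean(dim=1)    # (B, 3)
        # Fine estimate (double voting in spirit): re-average only the
        # votes closest to the coarse estimate.
        dist = (pivot_votes - coarse_pivot.unsqueeze(1)).norm(dim=-1)
        keep = (dist <= dist.median(dim=1, keepdim=True).values).float().unsqueeze(-1)
        fine_pivot = (pivot_votes * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)

        joint_dir = F.normalize(dir_votes.mean(dim=1), dim=-1)
        joint_state = state_votes.mean(dim=1)
        return fine_pivot, joint_dir, joint_state

# Usage on a dummy batch of two point clouds with 1024 points each.
model = JointEstimationSketch()
pivot, direction, state = model(torch.randn(2, 1024, 3))

A training loss for such a sketch would typically regress the predicted axis, direction, and state against ground truth; the abstract's motion loss, which emphasizes the dynamic features of articulated objects, would augment or replace that plain parameter regression, but its exact form is not given here.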

