ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation (2403.16400v3)

Published 25 Mar 2024 in cs.CV and cs.RO

Abstract: In medical and industrial domains, providing guidance for assembly processes can be critical to ensure efficiency and safety. Errors in assembly can lead to significant consequences such as extended surgery times and prolonged manufacturing or maintenance times in industry. Assembly scenarios can benefit from in-situ augmented reality visualization, i.e., augmentations in close proximity to the target object, to provide guidance, reduce assembly times, and minimize errors. To enable in-situ visualization, 6D pose estimation can be leveraged to identify the correct location for an augmentation. Existing 6D pose estimation techniques primarily focus on individual objects and static captures. However, assembly scenarios have various dynamics, including occlusion during assembly and changes in the appearance of assembly objects. Existing work focuses either on object detection combined with state detection or purely on pose estimation. To address the challenges of 6D pose estimation in combination with assembly state detection, our approach ASDF builds upon the strengths of YOLOv8, a real-time capable object detection framework. We extend this framework, refine the object pose, and fuse pose knowledge with network-detected pose information. Utilizing our late fusion in our Pose2State module results in refined 6D pose estimation and assembly state detection. By combining both pose and state information, our Pose2State module predicts the final assembly state with precision. Evaluation on our ASDF dataset shows that our Pose2State module improves assembly state detection and that the improved state detection in turn leads to more robust 6D pose estimation. Moreover, on the GBOT dataset, we outperform the pure deep learning-based network and even outperform the hybrid and pure tracking-based approaches.
