
HandGCAT: Occlusion-Robust 3D Hand Mesh Reconstruction from Monocular Images (2403.07912v1)

Published 27 Feb 2024 in cs.CV

Abstract: We propose a robust and accurate method for reconstructing a 3D hand mesh from monocular images. This is a challenging problem because hands are often severely occluded by objects. Previous works have often disregarded 2D hand pose information, which carries hand prior knowledge that is strongly correlated with occluded regions. In this work, we therefore propose HandGCAT, a novel 3D hand mesh reconstruction network that exploits the hand prior as compensation information to enhance the features of occluded regions. Specifically, we design the Knowledge-Guided Graph Convolution (KGC) module and the Cross-Attention Transformer (CAT) module. KGC extracts hand prior information from the 2D hand pose by graph convolution. CAT fuses the hand prior into occluded regions by exploiting their high correlation. Extensive experiments on popular datasets with challenging hand-object occlusions, such as HO3D v2, HO3D v3, and DexYCB, demonstrate that HandGCAT reaches state-of-the-art performance. The code is available at https://github.com/heartStrive/HandGCAT.
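
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch of a graph convolution over 2D hand joints followed by cross-attention, in which image features query the joint features. It is not the authors' implementation (see their repository for that). The 21-joint skeleton, the symmetric adjacency normalization, the residual connection, the feature sizes, and all function names are assumptions chosen for illustration.

```python
import numpy as np

# Assumed 21-joint hand skeleton: wrist (0) plus four joints per finger.
NUM_JOINTS = 21
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4),
         (0, 5), (5, 6), (6, 7), (7, 8),
         (0, 9), (9, 10), (10, 11), (11, 12),
         (0, 13), (13, 14), (14, 15), (15, 16),
         (0, 17), (17, 18), (18, 19), (19, 20)]


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2}, a generic GCN-style normalization."""
    a = np.eye(num_nodes)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, w):
    """One graph convolution layer with ReLU: relu(A_hat @ X @ W)."""
    return np.maximum(a_hat @ x @ w, 0.0)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(img_feat, prior_feat, wq, wk, wv):
    """Image tokens (queries) attend to joint tokens (keys and values)."""
    q = img_feat @ wq
    k = prior_feat @ wk
    v = prior_feat @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # Residual connection is an assumption, not taken from the paper.
    return img_feat + attn @ v, attn


rng = np.random.default_rng(0)
pose_2d = rng.random((NUM_JOINTS, 2))        # stand-in 2D joint coordinates
img_feat = rng.standard_normal((64, 32))     # 8x8 feature map, 32 channels
a_hat = normalized_adjacency(NUM_JOINTS, EDGES)

# Lift 2D joints to 32-dim "hand prior" features with one graph convolution.
prior = graph_conv(pose_2d, a_hat, rng.standard_normal((2, 32)))

# Random projection weights stand in for learned parameters.
wq, wk, wv = (rng.standard_normal((32, 32)) * 0.1 for _ in range(3))
fused, attn = cross_attention(img_feat, prior, wq, wk, wv)
print(fused.shape, attn.shape)
```

Each row of `attn` is a distribution over the 21 joints, so every image location receives a weighted mix of joint features. This is one plausible way for a pose-derived prior to compensate features at occluded locations.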

