Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
173 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
46 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation (2403.13405v1)

Published 20 Mar 2024 in cs.CV and cs.AI

Abstract: Depth-based 3D hand pose estimation is an important but challenging research task in human-machine interaction community. Recently, dense regression methods have attracted increasing attention in 3D hand pose estimation task, which provide a low computational burden and high accuracy regression way by densely regressing hand joint offset maps. However, large-scale regression offset values are often affected by noise and outliers, leading to a significant drop in accuracy. To tackle this, we re-formulate 3D hand pose estimation as a dense ordinal regression problem and propose a novel Dense Ordinal Regression 3D Pose Network (DOR3D-Net). Specifically, we first decompose offset value regression into sub-tasks of binary classifications with ordinal constraints. Then, each binary classifier can predict the probability of a binary spatial relationship relative to joint, which is easier to train and yield much lower level of noise. The estimated hand joint positions are inferred by aggregating the ordinal regression results at local positions with a weighted sum. Furthermore, both joint regression loss and ordinal regression loss are used to train our DOR3D-Net in an end-to-end manner. Extensive experiments on public datasets (ICVL, MSRA, NYU and HANDS2017) show that our design provides significant improvements over SOTA methods.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (37)
  1. Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1067–1076, 2019.
  2. Weakly-supervised domain adaptation via gan and mesh model for estimating 3d hand poses interacting objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6121–6131, 2020.
  3. Pose guided structured region ensemble network for cascaded hand pose estimation. Neurocomputing, 395:138–149, 2020.
  4. Handfoldingnet: A 3d hand pose estimation network using multiscale-feature guided folding of a 2d hand skeleton. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11260–11269, 2021.
  5. Recurrent 3d hand pose estimation using cascaded pose-guided 3d alignments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1):932–945, 2022.
  6. Crossinfonet: Multi-task information sharing based hand pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9896–9905, 2019.
  7. A human-robot interaction interface for mobile and stationary robots based on real-time 3d human body and hand-finger pose estimation. In 2016 IEEE 21st International Conference on Emerging Technologies and Factory Automation (ETFA), pages 1–6. IEEE, 2016.
  8. Jgr-p2o: Joint graph reasoning based pixel-to-offset prediction network for 3d hand pose estimation from a single depth image. In European Conference on Computer Vision, pages 120–137. Springer, 2020.
  9. Deep ordinal regression network for monocular depth estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2002–2011, 2018.
  10. First-person hand action benchmark with rgb-d videos and 3d hand pose annotations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 409–419, 2018.
  11. Hand pointnet: 3d hand pose estimation using point sets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8417–8426, 2018.
  12. 3d convolutional neural networks for efficient and robust hand pose estimation from single depth images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1991–2000, 2017.
  13. Point-to-point regression pointnet for 3d hand pose estimation. In Proceedings of the European conference on computer vision (ECCV), pages 475–491, 2018.
  14. Fast lifting for 3d hand pose estimation in ar/vr applications. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 106–110. IEEE, 2018.
  15. Order regularization on ordinal loss for head pose, age and gaze estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 1496–1504, 2021.
  16. An active learning paradigm for online audio-visual emotion recognition. IEEE Transactions on Affective Computing, 13(2):756–768, 2019.
  17. Weakly-supervised mesh-convolutional hand reconstruction in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4990–5000, 2020.
  18. A survey on 3d hand pose estimation: Cameras, methods, and datasets. Pattern Recognition, 93:251–272, 2019.
  19. Learning probabilistic ordinal embeddings for uncertainty-aware regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13896–13905, 2021.
  20. End-to-end human pose and mesh reconstruction with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1954–1963, 2021.
  21. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
  22. Pose estimation for augmented reality: a hands-on survey. IEEE transactions on visualization and computer graphics, 22(12):2633–2651, 2015.
  23. Towards real-time physical human-robot interaction using skeleton information and hand gestures. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–6. IEEE, 2018.
  24. V2v-posenet: Voxel-to-voxel prediction network for accurate 3d hand and human pose estimation from a single depth map. In Proceedings of the IEEE conference on computer vision and pattern Recognition, pages 5079–5088, 2018.
  25. Ordinal regression with multiple output cnn for age estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4920–4928, 2016.
  26. Deepprior++: Improving fast and accurate 3d hand pose estimation. In Proceedings of the IEEE international conference on computer vision Workshops, pages 585–594, 2017.
  27. Ordinal depth supervision for 3d human pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7307–7316, 2018.
  28. Mining multi-view information: A strong self-supervised framework for depth-based 3d hand pose and mesh estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20555–20565, 2022.
  29. Srn: Stacked regression network for real-time 3d hand pose estimation. In BMVC, page 112, 2019.
  30. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5693–5703, 2019.
  31. Cascaded hand pose regression. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 824–832, 2015.
  32. Latent regression forest: Structured estimation of 3d articulated hand posture. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3786–3793, 2014.
  33. Real-time continuous pose recovery of human hands using convolutional networks. ACM Transactions on Graphics (ToG), 33(5):1–10, 2014.
  34. Dense 3d regression for hand pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5147–5156, 2018.
  35. A2j: Anchor-to-joint regression network for 3d articulated pose estimation from a single depth image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 793–802, 2019.
  36. Bighand2. 2m benchmark: Hand pose dataset and state of the art analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4866–4874, 2017.
  37. Hemlets pose: Learning part-centric heatmap triplets for accurate 3d human pose estimation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2344–2353, 2019.

Summary

We haven't generated a summary for this paper yet.