3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera (2312.14157v1)

Published 21 Dec 2023 in cs.CV

Abstract: 3D hand tracking from a monocular video is a very challenging problem due to hand interactions, occlusions, left-right hand ambiguity, and fast motion. Most existing methods rely on RGB inputs, which have severe limitations under low-light conditions and suffer from motion blur. In contrast, event cameras capture local brightness changes instead of full image frames and do not suffer from the described effects. Unfortunately, existing image-based techniques cannot be directly applied to events due to significant differences in the data modalities. In response to these challenges, this paper introduces the first framework for 3D tracking of two fast-moving and interacting hands from a single monocular event camera. Our approach tackles the left-right hand ambiguity with a novel semi-supervised feature-wise attention mechanism and integrates an intersection loss to fix hand collisions. To facilitate advances in this research domain, we release a new synthetic large-scale dataset of two interacting hands, Ev2Hands-S, and a new real benchmark with real event streams and ground-truth 3D annotations, Ev2Hands-R. Our approach outperforms existing methods in terms of the 3D reconstruction accuracy and generalises to real data under severe light conditions.
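To make the data modality concrete, the following is a minimal sketch of how an event stream differs from image frames. It is not the paper's method and not a faithful sensor model: it assumes a simplified frame-difference approximation in which a pixel emits an event when its log-brightness changes by at least a threshold. The function name `events_between_frames`, the `threshold` and `eps` values, and the tiny 2x2 frames are all hypothetical and chosen only for illustration. Real event cameras operate asynchronously per pixel rather than by comparing frames.

```python
import math


def events_between_frames(prev, curr, t, threshold=0.2, eps=1e-3):
    """Approximate events between two brightness frames (illustrative only).

    Returns a list of (x, y, t, polarity) tuples, one for each pixel whose
    log-brightness changed by at least `threshold`. Polarity is +1 for a
    brightness increase and -1 for a decrease.
    """
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            # eps avoids log(0) for fully dark pixels (assumed value)
            delta = math.log(c + eps) - math.log(p + eps)
            if abs(delta) >= threshold:
                events.append((x, y, t, 1 if delta > 0 else -1))
    return events


# Two hypothetical 2x2 frames: one pixel brightens, one darkens, two stay constant.
prev = [[0.5, 0.5], [0.5, 0.5]]
curr = [[0.5, 0.9], [0.1, 0.5]]
events = events_between_frames(prev, curr, t=0.001)
print(events)
```

Only the two changed pixels produce output, while static pixels produce nothing. This sparsity is why dense image-based networks do not transfer directly to event data.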
