
EvHandPose: Event-based 3D Hand Pose Estimation with Sparse Supervision (2303.02862v3)

Published 6 Mar 2023 in cs.CV and cs.AI

Abstract: Event cameras show great potential for 3D hand pose estimation, especially for addressing the challenges of fast motion and high dynamic range at low power. However, due to the asynchronous differential imaging mechanism, it is challenging to design an event representation that encodes hand motion information, especially when the hands are not moving (causing motion ambiguity), and it is infeasible to fully annotate the temporally dense event stream. In this paper, we propose EvHandPose with novel hand flow representations in an Event-to-Pose module for accurate hand pose estimation that alleviates the motion ambiguity issue. To handle sparse annotation, we design contrast maximization and hand-edge constraints in a Pose-to-IWE (Image with Warped Events) module and formulate EvHandPose in a weakly-supervised framework. We further build EvRealHands, the first large-scale real-world event-based hand pose dataset covering several challenging scenes, to bridge the real-synthetic domain gap. Experiments on EvRealHands demonstrate that EvHandPose outperforms previous event-based methods in all evaluation scenes, achieves accurate and stable hand pose estimation with high temporal resolution in fast-motion and strong-light scenes compared with RGB-based methods, generalizes well to outdoor scenes and to another type of event camera, and shows potential for the hand gesture recognition task.
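
To make the Pose-to-IWE idea concrete, the sketch below illustrates the contrast-maximization principle that the weak supervision builds on: events are warped along a candidate flow to a reference time, accumulated into an Image of Warped Events (IWE), and the variance of that image serves as the sharpness objective. This is a minimal NumPy illustration, not the authors' implementation; the function names, the nearest-neighbour splatting, and the toy constant-velocity example are assumptions made for illustration only.

```python
# Minimal sketch of contrast maximization over an Image of Warped Events (IWE).
# Not the EvHandPose code: names, splatting scheme, and toy data are illustrative.
import numpy as np

def warp_events_to_iwe(xs, ys, ts, flow, t_ref, height, width):
    """Warp events along a constant flow (vx, vy) in px/s to t_ref and accumulate an IWE."""
    vx, vy = flow
    dt = ts - t_ref
    xw = xs - vx * dt          # warped x coordinates at the reference time
    yw = ys - vy * dt          # warped y coordinates at the reference time
    iwe = np.zeros((height, width), dtype=np.float64)
    # Nearest-neighbour splatting; practical implementations often use bilinear weights.
    xi = np.round(xw).astype(int)
    yi = np.round(yw).astype(int)
    valid = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
    np.add.at(iwe, (yi[valid], xi[valid]), 1.0)
    return iwe

def contrast(iwe):
    """Variance of the IWE: higher means the events align better along the candidate flow."""
    return np.var(iwe)

# Toy usage: events generated by a constant 20 px/s horizontal motion are best
# explained (highest contrast) by a candidate flow close to (20, 0).
rng = np.random.default_rng(0)
ts = rng.uniform(0.0, 0.1, 2000)
ys = rng.integers(0, 64, 2000).astype(float)
xs = 10 + 20.0 * ts + rng.normal(0, 0.3, 2000)
best = max((contrast(warp_events_to_iwe(xs, ys, ts, (v, 0.0), 0.0, 64, 64)), v)
           for v in np.linspace(0.0, 40.0, 21))
print("flow with maximal contrast:", best[1])
```

In the paper, the warping flow comes from the predicted hand motion rather than a grid search; the search here only demonstrates why a well-aligned flow maximizes IWE contrast and can therefore act as a supervision signal without dense annotations.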
