
HARP: Personalized Hand Reconstruction from a Monocular RGB Video (2212.09530v3)

Published 19 Dec 2022 in cs.CV

Abstract: We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar exhibiting a high-fidelity appearance and geometry. In contrast to the major trend of neural implicit representations, HARP models a hand with a mesh-based parametric hand model, a vertex displacement map, a normal map, and an albedo without any neural components. As validated by our experiments, the explicit nature of our representation enables a truly scalable, robust, and efficient approach to hand avatar creation. HARP is optimized via gradient descent from a short sequence captured by a hand-held mobile phone and can be directly used in AR/VR applications with real-time rendering capability. To enable this, we carefully design and implement a shadow-aware differentiable rendering scheme that is robust to the high-degree articulations and self-shadowing regularly present in hand motion sequences, as well as to challenging lighting conditions. It also generalizes to unseen poses and novel viewpoints, producing photo-realistic renderings of hand animations performing highly articulated motions. Furthermore, the learned HARP representation can be used for improving 3D hand pose estimation quality in challenging viewpoints. The key advantages of HARP are validated by in-depth analyses of appearance reconstruction, novel-view and novel-pose synthesis, and 3D hand pose refinement. It is an AR/VR-ready personalized hand representation that shows superior fidelity and scalability.
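
To make the idea of fitting an explicit, non-neural appearance parameter by gradient descent concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a simple Lambertian shading model, a single known directional light, greyscale per-vertex albedo, and fixed normals, and it ignores the displacement map, the normal map, and the shadow-aware renderer described in the abstract. All function names (`normalize`, `lambert`, `render`, `fit_albedo`) are hypothetical.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def lambert(normal, light_dir):
    """Diffuse shading term max(0, n . l) for unit vectors."""
    return max(0.0, sum(a * b for a, b in zip(normal, light_dir)))

def render(albedo, normals, light_dir):
    """Per-vertex intensity = albedo * shading (toy stand-in for a renderer)."""
    return [a * lambert(n, light_dir) for a, n in zip(albedo, normals)]

def fit_albedo(observed, normals, light_dir, steps=200, lr=0.5):
    """Recover per-vertex albedo by gradient descent on a squared photometric error."""
    shading = [lambert(n, light_dir) for n in normals]
    albedo = [0.5] * len(observed)  # arbitrary initial guess
    for _ in range(steps):
        for i, s in enumerate(shading):
            residual = albedo[i] * s - observed[i]
            # Gradient of residual**2 with respect to albedo[i] is 2 * residual * s.
            albedo[i] -= lr * 2.0 * residual * s
    return albedo

if __name__ == "__main__":
    light = normalize([0.0, 0.0, 1.0])
    normals = [normalize([0.3, 0.0, 1.0]),
               normalize([0.0, -0.4, 1.0]),
               normalize([0.0, 0.0, 1.0])]
    true_albedo = [0.8, 0.35, 0.6]
    observed = render(true_albedo, normals, light)
    print(fit_albedo(observed, normals, light))
```

Because every parameter here is an explicit per-vertex quantity, the fitted values can be handed directly to a standard rasterizer, which is the property the abstract credits for real-time rendering. In this toy, a vertex that faces away from the light receives zero gradient and stays at its initial guess.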
