URHand: Universal Relightable Hands (2401.05334v1)

Published 10 Jan 2024 in cs.CV and cs.GR

Abstract: Existing photorealistic relightable hand models require extensive identity-specific observations in different views, poses, and illuminations, and face challenges in generalizing to natural illuminations and novel identities. To bridge this gap, we present URHand, the first universal relightable hand model that generalizes across viewpoints, poses, illuminations, and identities. Our model allows few-shot personalization using images captured with a mobile phone, and is ready to be photorealistically rendered under novel illuminations. To simplify the personalization process while retaining photorealism, we build a powerful universal relightable prior based on neural relighting from multi-view images of hands captured in a light stage with hundreds of identities. The key challenge is scaling the cross-identity training while maintaining personalized fidelity and sharp details without compromising generalization under natural illuminations. To this end, we propose a spatially varying linear lighting model as the neural renderer that takes physics-inspired shading as input feature. By removing non-linear activations and bias, our specifically designed lighting model explicitly keeps the linearity of light transport. This enables single-stage training from light-stage data while generalizing to real-time rendering under arbitrary continuous illuminations across diverse identities. In addition, we introduce the joint learning of a physically based model and our neural relighting model, which further improves fidelity and generalization. Extensive experiments show that our approach achieves superior performance over existing methods in terms of both quality and generalizability. We also demonstrate quick personalization of URHand from a short phone scan of an unseen identity.


Summary

  • The paper presents a model that creates personalized, photorealistic hand renders with real-time relighting using simple mobile phone captures.
  • It employs a spatially varying linear lighting model within a dual-branch (physically based and neural) framework to model light transport faithfully.
  • Quantitative experiments demonstrate that URHand outperforms existing methods in adaptability, efficiency, and rendering accuracy under diverse illuminations.

Overview of the URHand Model

The paper presents URHand, a model for photorealistic, relightable human hands that generalizes across identities, viewpoints, poses, and lighting conditions. In digital media such as video games and virtual environments, hands are omnipresent and central to user interaction, and relighting them in real time to match the surrounding illumination is crucial for immersive experiences. However, existing methods for building such realistic hand models tend to be resource-intensive, lack generalizability, and require extensive identity-specific data capture.

Key Innovations

URHand addresses these challenges by streamlining personalization: a short capture from a mobile phone suffices to create a personalized hand model that can be rendered photorealistically in real time under varied lighting. The team achieves this by building a robust universal relightable prior, trained on multi-view light-stage captures of hands spanning hundreds of identities. This prior preserves personalized detail and fidelity without sacrificing the model's ability to generalize to natural illuminations.

Technical Approach

To maintain a high level of detail while preserving the linearity of light transport (an essential property for physically plausible relighting), the model employs a spatially varying linear lighting model as its neural renderer. This renderer omits the non-linear activation functions and bias terms typically found in neural networks, so the predicted appearance remains linear in the incident illumination, as physics dictates. As a result, URHand can be trained in a single stage on light-stage data and then render hands in real time under arbitrary continuous illuminations.
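
To make the linearity constraint concrete, below is a minimal sketch (not the authors' code) of a lighting module that stays linear in the light-dependent inputs: the network's non-linearities only ever see light-independent features, while the physics-inspired shading features for each light are combined purely through weighted sums. All module, tensor, and shape names here are illustrative assumptions.

import torch
import torch.nn as nn

class SpatiallyVaryingLinearLighting(nn.Module):
    """Toy linear-in-light renderer; names and shapes are illustrative."""
    def __init__(self, feat_dim=64, shading_dim=8):
        super().__init__()
        # Non-linear layers are allowed here: they only see light-independent
        # identity/pose features, so linearity in the illumination is kept.
        self.weight_net = nn.Sequential(
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, shading_dim, 3, padding=1),
        )

    def forward(self, id_pose_feat, shading_per_light):
        # id_pose_feat:      (B, feat_dim, H, W), light-independent features
        # shading_per_light: (B, L, shading_dim, H, W), physics-inspired
        #                    shading terms computed per light source
        w = self.weight_net(id_pose_feat)  # (B, shading_dim, H, W)
        # Only weighted sums follow: no activation or bias touches the
        # light-dependent features, so the contributions of the L lights
        # superpose exactly.
        per_light = (w.unsqueeze(1) * shading_per_light).sum(dim=2)  # (B, L, H, W)
        return per_light.sum(dim=1)  # (B, H, W) relit appearance

Because the output is a plain sum over per-light contributions, any continuous illumination (for example an environment map discretized into many lights) can be handled by the same weights at render time, which is what allows training on discrete light-stage data to transfer to natural lighting.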

Moreover, the paper introduces a dual-branch approach that combines physically based geometry estimation with neural relighting. In this hybrid framework, the physical branch refines the hand geometry and produces physics-inspired shading features, while the neural branch models complex light-transport effects such as subsurface scattering. Both branches are optimized jointly with tailored loss functions to improve the fidelity and detail of the final relighting.
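
As a rough illustration of this joint optimization, the following sketch (with hypothetical module, tensor, and loss names, not the paper's actual interfaces) supervises both branches against the same captured ground truth in a single training step, so the physically based shading and the neural relighting improve together.

import torch
import torch.nn.functional as F

def joint_training_step(physical_branch, neural_branch, batch, optimizer):
    # Physical branch: refine hand geometry and produce physics-inspired
    # shading features plus a physically based render for supervision.
    geometry, shading_feat, pbr_render = physical_branch(
        batch["images"], batch["pose"], batch["lights"]
    )

    # Neural branch: predict the final appearance from the shading features,
    # covering effects the physical model misses (e.g. subsurface scattering).
    neural_render = neural_branch(shading_feat)

    # Losses on both outputs keep the two branches consistent; the
    # weighting used here is an arbitrary placeholder.
    loss = F.l1_loss(neural_render, batch["gt_image"]) \
        + 0.1 * F.l1_loss(pbr_render, batch["gt_image"])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()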

Results and Potential

Quantitative experiments and extensive ablation studies validate the superiority of URHand over existing methods, both in quality and the ability to adapt to novel situations. Pivotal to this endeavor is their strategy of combining the strengths of physically based rendering with the versatility of data-driven neural approaches. This combination unlocks powerful realism and flexibility previously unattainable in real-time applications. Moreover, the paper demonstrates URHand's capability of quick personalization from a casual phone scan, making it a pioneer in easily accessible, realistic, and relightable hand modeling.