
URAvatar: Universal Relightable Gaussian Codec Avatars (2410.24223v1)

Published 31 Oct 2024 in cs.CV and cs.GR

Abstract: We present a new approach to creating photorealistic and relightable head avatars from a phone scan with unknown illumination. The reconstructed avatars can be animated and relit in real time with the global illumination of diverse environments. Unlike existing approaches that estimate parametric reflectance parameters via inverse rendering, our approach directly models learnable radiance transfer that incorporates global light transport in an efficient manner for real-time rendering. However, learning such a complex light transport that can generalize across identities is non-trivial. A phone scan in a single environment lacks sufficient information to infer how the head would appear in general environments. To address this, we build a universal relightable avatar model represented by 3D Gaussians. We train on hundreds of high-quality multi-view human scans with controllable point lights. High-resolution geometric guidance further enhances the reconstruction accuracy and generalization. Once trained, we finetune the pretrained model on a phone scan using inverse rendering to obtain a personalized relightable avatar. Our experiments establish the efficacy of our design, outperforming existing approaches while retaining real-time rendering capability.

References (84)
  1. FLAME-in-NeRF: Neural control of radiance fields for free view face animation. In International Conference on Automatic Face and Gesture Recognition (FG).
  2. RigNeRF: Fully controllable neural 3D portraits. In Conference on Computer Vision and Pattern Recognition (CVPR). 20364–20373.
  3. High-quality capture of eyes. Transactions on Graphics (TOG) 33, 6 (2014), 223:1–12.
  4. FLARE: Fast learning of Animatable and Relightable Mesh Avatars. Transactions on Graphics (TOG) 42, 6 (2023), 204:1–15.
  5. Deep relightable appearance models for animatable faces. Transactions on Graphics (TOG) 40, 4 (2021), 89:1–15.
  6. Deep reflectance volumes: Relightable reconstructions from multi-view photometric images. In European Conference on Computer Vision (ECCV). 294–311.
  7. Volker Blanz and Thomas Vetter. 2023. A morphable model for the synthesis of 3D faces. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2. 157–164.
  8. Authentic volumetric avatars from a phone scan. Transactions on Graphics (TOG) 41, 4 (2022), 163:1–19.
  9. FaceWarehouse: A 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics 20, 3 (2013), 413–425.
  10. Real-time facial animation with image-based dynamic avatars. Transactions on Graphics (TOG) 35, 4 (2016), 126:1–12.
  11. Photo-realistic facial details synthesis from single image. In International Conference on Computer Vision (ICCV). 9429–9439.
  12. URHand: Universal Relightable Hands. In Conference on Computer Vision and Pattern Recognition (CVPR).
  13. Robert L Cook and Kenneth E. Torrance. 1982. A reflectance model for computer graphics. Transactions on Graphics (TOG) 1, 1 (1982), 7–24.
  14. Acquiring the reflectance field of a human face. In SIGGRAPH. 145–156.
  15. LumiGAN: Unconditional Generation of Relightable 3D Human Faces. In International Conference on 3D Vision (3DV). 302–312.
  16. Capturing and stylizing hair for 3D fabrication. Transactions on Graphics (TOG) 33, 4 (2014), 125:1–11.
  17. E Friesen and Paul Ekman. 1978. Facial action coding system: a technique for the measurement of facial movement. Palo Alto 3, 2 (1978), 5.
  18. Near-Instant Capture of High-Resolution Facial Geometry and Reflectance. In Computer Graphics Forum, Vol. 35. 353–363.
  19. Dynamic neural radiance fields for monocular 4D facial avatar reconstruction. In Conference on Computer Vision and Pattern Recognition (CVPR). 8649–8658.
  20. Rotation-equivariant conditional spherical neural fields for learning a natural illumination prior. Advances in Neural Information Processing Systems 35 (2022), 26309–26323.
  21. Multiview face capture using polarized spherical gradient illumination. Transactions on Graphics (TOG) 30, 6 (2011), 129:1–10.
  22. Neural head avatars from monocular RGB videos. In Conference on Computer Vision and Pattern Recognition (CVPR). 18653–18664.
  23. Hypernetworks. In International Conference on Learning Representations (ICLR).
  24. Avatar digitization from a single image for real-time rendering. Transactions on Graphics (TOG) 36, 6 (2017), 195:1–14.
  25. Dynamic 3D avatar creation from hand-held video input. Transactions on Graphics (TOG) 34, 4 (2015), 45:1–14.
  26. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision (ECCV). 694–711.
  27. 3D Gaussian splatting for real-time radiance field rendering. Transactions on Graphics (TOG) 42, 4 (2023), 139:1–14.
  28. SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting. In Conference on Computer Vision and Pattern Recognition (CVPR).
  29. Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR).
  30. AvatarMe: Realistically Renderable 3D Facial Reconstruction “in-the-wild”. In Conference on Computer Vision and Pattern Recognition (CVPR). 760–769.
  31. AvatarMe++: Facial shape and BRDF inference with photorealistic rendering-aware GANs. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 12 (2021), 9269–9284.
  32. Practice and theory of blendshape facial models. Eurographics State-of-the-Art Reports 1, 8 (2014), 2.
  33. EyeNeRF: a hybrid representation for photorealistic synthesis, animation and relighting of human eyes. Transactions on Graphics (TOG) 41, 4 (2022), 166:1–16.
  34. Realtime facial animation with on-the-fly correctives. Transactions on Graphics (TOG) 32, 4 (2013), 42:1–10.
  35. MEGANE: Morphable Eyeglass and Avatar Network. In Conference on Computer Vision and Pattern Recognition (CVPR). 12769–12779.
  36. Learning Formation of Physically-Based Face Attributes. In Conference on Computer Vision and Pattern Recognition (CVPR). 3407–3416.
  37. Learning a model of facial shape and expression from 4D scans. Transactions on Graphics (TOG) 36, 6 (2017), 194:1–17.
  38. Single-shot implicit morphable faces with consistent texture parameterization. In SIGGRAPH Conference Proceedings. 83:1–12.
  39. Rapid Face Asset Acquisition with Recurrent Feature Alignment. Transactions on Graphics (TOG) 41, 6 (2022), 214:1–17.
  40. Deep appearance models for face rendering. Transactions on Graphics (TOG) 37, 4 (2018), 68:1–13.
  41. Mixture of volumetric primitives for efficient neural rendering. Transactions on Graphics (TOG) 40, 4 (2021), 59:1–13.
  42. Structure-aware hair capture. Transactions on Graphics (TOG) 32, 4 (2013), 76:1–12.
  43. Diffusion Posterior Illumination for Ambiguity-aware Inverse Rendering. Transactions on Graphics (TOG) 42, 6 (2023).
  44. Pixel codec avatars. In Conference on Computer Vision and Pattern Recognition (CVPR). 64–73.
  45. Rapid Acquisition of Specular and Diffuse Normal Maps from Polarized Spherical Gradient Illumination. Rendering Techniques 2007, 9 (2007), 10.
  46. Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image. In Conference on Computer Vision and Pattern Recognition (CVPR). 4263–4273.
  47. Deep reflectance fields: high-quality facial reflectance field inference from color gradient illumination. Transactions on Graphics (TOG) 38, 4 (2019), 77:1–12.
  48. Deep relightable textures: volumetric performance capture with neural rendering. Transactions on Graphics (TOG) 39, 6 (2020), 259:1–21.
  49. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65, 1 (2021), 99–106.
  50. paGAN: real-time avatars using dynamic textures. Transactions on Graphics (TOG) 37, 6 (2018), 258:1–12.
  51. Strand-accurate multi-view hair capture. In Conference on Computer Vision and Pattern Recognition (CVPR). 155–164.
  52. Total relighting: learning to relight portraits for background replacement. Transactions on Graphics (TOG) 40, 4 (2021), 43:1–21.
  53. Post-production facial performance relighting using reflectance transfer. Transactions on Graphics (TOG) 26, 3 (2007), 52–es.
  54. Synthesizing realistic facial expressions from photographs. In SIGGRAPH Courses.
  55. Generating 3D faces using convolutional mesh autoencoders. In European Conference on Computer Vision (ECCV). 704–720.
  56. FaceLit: Neural 3D Relightable Faces. In Conference on Computer Vision and Pattern Recognition (CVPR). 8619–8628.
  57. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI. 234–241.
  58. Relightable Gaussian Codec Avatars. In Conference on Computer Vision and Pattern Recognition (CVPR).
  59. The eyes have it: An integrated eye and face model for photorealistic facial animation. Transactions on Graphics (TOG) 39, 4 (2020), 91:1–15.
  60. A light stage on every desk. In International Conference on Computer Vision (ICCV). 2420–2429.
  61. Precomputed Radiance Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments. Transactions on Graphics (TOG) 21, 3 (2002), 527–536.
  62. A morphable face albedo model. In Conference on Computer Vision and Pattern Recognition (CVPR). 5011–5020.
  63. Single image portrait relighting. Transactions on Graphics (TOG) 38, 4 (2019), 79:1–12.
  64. Volux-gan: A generative model for 3d face synthesis with hdri relighting. In ACM SIGGRAPH 2022 Conference Proceedings. 1–9.
  65. Stylerig: Rigging stylegan for 3d control over portrait images. In Conference on Computer Vision and Pattern Recognition (CVPR). 6142–6151.
  66. Deferred neural rendering: Image synthesis using neural textures. Transactions on Graphics (TOG) 38, 4 (2019), 66:1–12.
  67. Luan Tran and Xiaoming Liu. 2018. Nonlinear 3D face morphable model. In Conference on Computer Vision and Pattern Recognition (CVPR). 7346–7355.
  68. Face transfer with multilinear models. In SIGGRAPH Courses.
  69. All-frequency rendering of dynamic, spatially-varying reflectance. In ACM SIGGRAPH Asia. 131:1–10.
  70. StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video. In SIGGRAPH 2023 Conference Proceedings.
  71. Sunstage: Portrait reconstruction and relighting using the sun as a light stage. In Conference on Computer Vision and Pattern Recognition (CVPR). 20792–20802.
  72. Single image portrait relighting via explicit multiple reflectance channel modeling. Transactions on Graphics (TOG) 39, 6 (2020), 220:1–13.
  73. Model-based teeth reconstruction. Transactions on Graphics (TOG) 35, 6 (2016), 220:1–13.
  74. Neural fields in visual computing and beyond. In Computer Graphics Forum, Vol. 41. 641–676.
  75. Artist-Friendly Relightable and Animatable Neural Heads. In Conference on Computer Vision and Pattern Recognition (CVPR). 2457–2467.
  76. LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar. In SIGGRAPH Conference Proceedings.
  77. High-fidelity facial reflectance and geometry inference from an unconstrained image. Transactions on Graphics (TOG) 37, 4 (2018), 162:1–14.
  78. Towards Practical Capture of High-Fidelity Relightable Avatars. In SIGGRAPH Asia 2023 Conference Proceedings.
  79. VRMM: A volumetric relightable morphable head model. In ACM SIGGRAPH Conference Papers.
  80. Learning to relight portrait images via a virtual light stage and synthetic-to-real adaptation. Transactions on Graphics (TOG) 41, 6 (2022), 231:1–21.
  81. Neural light transport for relighting and view synthesis. Transactions on Graphics (TOG) 40, 1 (2021), 9:1–17.
  82. I M Avatar: Implicit Morphable Head Avatars from Videos. In Conference on Computer Vision and Pattern Recognition (CVPR). 13535–13545.
  83. PointAvatar: Deformable point-based head avatars from videos. In Conference on Computer Vision and Pattern Recognition (CVPR). 21057–21067.
  84. Instant volumetric head avatars. In Conference on Computer Vision and Pattern Recognition (CVPR). 4574–4584.

Summary

  • The paper introduces a novel 3D Gaussian representation combined with end-to-end learnable radiance transfer to render highly detailed, photorealistic avatars.
  • The paper leverages single-phone scans and multi-view data to achieve consistent relighting and accurate geometric tracking under diverse lighting conditions.
  • The paper demonstrates significant improvements in rendering fidelity, outperforming prior methods as evidenced by metrics such as Mean Absolute Error and LPIPS.

Overview of "URAvatar: Universal Relightable Gaussian Codec Avatars"

The paper "URAvatar: Universal Relightable Gaussian Codec Avatars" addresses the creation of photorealistic, relightable head avatars from a single phone scan, aiming to render avatars that remain consistent across different lighting conditions, identities, and expressions. The approach advances 3D graphics and neural rendering by handling complex lighting dynamics while reducing reliance on extensive capture systems.

The primary innovation of this research lies in combining 3D Gaussians for geometric representation with learnable radiance transfer for appearance modeling. Rather than decomposing lighting into diffuse and specular components via parametric reflectance, as traditional inverse-rendering methods do, the approach learns radiance transfer directly from a model trained on hundreds of multi-view human scans. This narrows the gap between conventional studio-quality avatars and those generated from minimal input data, such as a cellphone scan, while achieving notable improvements in real-time rendering fidelity.
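To make the contrast with parametric reflectance concrete: at its core, a radiance-transfer formulation replaces per-material BRDF parameters with a per-primitive linear map from environment-light coefficients to outgoing color. The sketch below is illustrative only, not the paper's implementation; the spherical-harmonic basis size and all names here are assumptions.

```python
import numpy as np

def relight_gaussians(transfer, env_sh):
    """Relight per-Gaussian colors via linear radiance transfer.

    transfer: (N, 3, K) learned transfer coefficients for N Gaussians,
              3 color channels, and K spherical-harmonic basis functions.
    env_sh:   (K,) SH coefficients of the target environment light.

    Returns an (N, 3) array of linear RGB radiance per Gaussian.
    """
    # Linear in the lighting: each color channel is a dot product of
    # the learned transfer vector with the environment coefficients.
    return np.einsum('nck,k->nc', transfer, env_sh)

# Toy usage: 4 Gaussians under 2nd-order SH lighting (9 coefficients).
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 3, 9)) * 0.1
L = np.zeros(9)
L[0] = 1.0  # constant (ambient-only) environment
colors = relight_gaussians(T, L)
```

Because the map is linear in the lighting, global light transport (including interreflections baked into the learned coefficients) can be evaluated in real time as a single matrix product per environment.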

Key Contributions and Methodology

  1. 3D Gaussian Representation:
    • The paper demonstrates the use of 3D Gaussians to handle the intricate geometry of human heads efficiently. This method facilitates a high degree of detail without resorting to computationally expensive operations typical of traditional mesh or voxel-based models.
  2. Learnable Radiance Transfer:
    • A critical aspect of the proposed framework is the radiance transfer function, which is directly learned in an end-to-end manner to account for global light transport. This makes the model adept at handling multiple light bounces and complex materials, such as skin and hair, which exhibit significant scattering and reflectance properties.
  3. Universal Relightable Prior:
    • The research introduces a universal avatar model that generalizes across identities by capturing shared characteristics through multi-identity training. This model enhances personalization by combining large-scale data with personalized finetuning for new identities using inverse rendering techniques.
  4. High-Quality Tracking and Albedo Estimation:
    • High-resolution geometric tracking and sophisticated albedo estimation support the model in maintaining visual accuracy and detail when repurposed for personalized avatars. The inclusion of such detailed preprocessing steps plays a significant role in the final relighting and rendering quality.
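The personalization step described in point 3 can be pictured as gradient descent on a photometric loss, initialized from the universal prior's parameters. The following is a deliberately tiny stand-in under stated assumptions: the linear "renderer", the parameter vector, and all names are illustrative, not the paper's architecture.

```python
import numpy as np

def render(params, basis):
    """Stand-in differentiable renderer: predicted pixels = basis @ params."""
    return basis @ params

def finetune(params, basis, target, lr=0.2, steps=500):
    """Fit avatar parameters to observed pixels by minimizing MSE."""
    for _ in range(steps):
        resid = render(params, basis) - target
        grad = basis.T @ resid / len(target)  # gradient of the MSE loss
        params = params - lr * grad
    return params

rng = np.random.default_rng(1)
basis = rng.standard_normal((64, 8))        # 64 "pixels", 8 parameters
true_params = rng.standard_normal(8)
target = render(true_params, basis)          # observed phone-scan pixels
init = true_params + 0.5 * rng.standard_normal(8)  # pretrained-prior init
fitted = finetune(init, basis, target)
```

The pretrained prior matters because it supplies the initialization and regularizes what a single-environment scan cannot constrain; here that is mimicked only by starting near the solution.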

Experimental Evaluations and Results

The experimental setup includes carefully captured studio and phone-scan datasets, enabling quantitative and qualitative evaluation of relighting accuracy and rendering performance. URAvatar significantly outperforms prior approaches such as FLARE on metrics including Mean Absolute Error and LPIPS, underscoring the efficacy of its learned radiance-transfer components.
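For reference, the Mean Absolute Error cited above is simply the average per-pixel absolute difference between a relit rendering and the ground-truth capture; LPIPS, by contrast, requires a pretrained perceptual network and is not reproduced here.

```python
import numpy as np

def mean_absolute_error(pred, gt):
    """Average absolute per-pixel difference between two images.

    Both inputs must share a shape and value range
    (e.g. linear RGB in [0, 1])."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return float(np.mean(np.abs(diff)))

# Toy check: images differing by a constant 0.1 everywhere.
gt = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)
mae = mean_absolute_error(pred, gt)
```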

Additionally, ablation studies highlight the improvements brought by specific architectural choices, such as the unified specular visibility decoder for authentic eye reflections and the use of identity-conditioned biases for detailed expression modeling.

Implications and Future Work

The practical implications of this research pave the way for more accessible virtual communication technologies, where users can create realistic avatars from readily available hardware such as smartphones. The theoretical contributions also suggest further gains in modeling light transport and appearance in neural avatars, opening possibilities for future extensions in real-time 3D graphics.

Potential future directions could explore reducing the computational burden of personalization, further generalizing the lighting model to include dynamic environmental changes, or extending avatar dynamics to cover full-body renditions with similar relightable characteristics.

In conclusion, the paper presents meaningful advancements in photorealistic avatar creation, offering a promising vision for integrating realistic digital selves into immersive communication platforms.
