InstantAvatar: Efficient 3D Head Reconstruction via Surface Rendering (2308.04868v3)
Abstract: Recent advances in full-head reconstruction have been obtained by optimizing a neural field through differentiable surface or volume rendering to represent a single scene. While these techniques achieve unprecedented accuracy, they take several minutes, or even hours, because of the expensive optimization process they require. In this work, we introduce InstantAvatar, a method that recovers full-head avatars from few images (down to just one) in a few seconds on commodity hardware. To speed up the reconstruction process, we propose a system that combines, for the first time, a voxel-grid neural field representation with a surface renderer. Notably, a naive combination of these two techniques leads to unstable optimizations that do not converge to valid solutions. To overcome this limitation, we present a novel statistical model that learns a prior distribution over 3D head signed distance functions using a voxel-grid-based architecture. The use of this prior model, in combination with other design choices, results in a system that achieves 3D head reconstructions with accuracy comparable to the state of the art at a 100x speed-up.
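To make the central idea concrete, the following is a minimal sketch of a signed distance function stored on a voxel grid and rendered with a surface renderer, here plain sphere tracing. It is an illustration under stated assumptions, not the authors' implementation: the grid holds an analytic sphere rather than a learned head prior, the lookup is a simple trilinear interpolation, and all function names and parameters (`make_sphere_sdf_grid`, `query_sdf`, `sphere_trace`, `res`, `eps`, `t_max`) are hypothetical.

```python
import numpy as np

def make_sphere_sdf_grid(res=32, radius=0.5):
    """Sample a sphere SDF on a res^3 grid covering [-1, 1]^3 (toy stand-in for a head SDF)."""
    axis = np.linspace(-1.0, 1.0, res)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.sqrt(x**2 + y**2 + z**2) - radius

def query_sdf(grid, p):
    """Trilinearly interpolate the grid at point p; points outside the cube are clamped."""
    res = grid.shape[0]
    # Map world coordinates in [-1, 1] to continuous voxel indices in [0, res - 1].
    u = (np.clip(p, -1.0, 1.0) + 1.0) * 0.5 * (res - 1)
    i0 = np.minimum(np.floor(u).astype(int), res - 2)
    t = u - i0
    value = 0.0
    # Blend the eight corner values of the enclosing voxel.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1.0 - t[0]) *
                     (t[1] if dy else 1.0 - t[1]) *
                     (t[2] if dz else 1.0 - t[2]))
                value += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value

def sphere_trace(grid, origin, direction, max_steps=64, eps=1e-3, t_max=4.0):
    """Step along the ray by the current SDF value until the surface is reached."""
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = query_sdf(grid, p)
        if d < eps:
            return p  # surface point found
        t += d
        if t > t_max:
            break
    return None  # the ray missed the surface

grid = make_sphere_sdf_grid()
hit = sphere_trace(grid, np.array([0.0, 0.0, -0.95]), np.array([0.0, 0.0, 1.0]))
miss = sphere_trace(grid, np.array([0.9, 0.9, -0.95]), np.array([0.0, 0.0, 1.0]))
```

With these toy settings, the first ray should stop close to z = -0.5 (up to grid discretization error) and the second should return `None`. In an optimization setting, the grid values would be free parameters updated through a differentiable version of this lookup, which is where, according to the abstract, a learned prior is needed to keep the optimization stable.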