
Instant Multi-View Head Capture through Learnable Registration (2306.07437v1)

Published 12 Jun 2023 in cs.CV

Abstract: Existing methods for capturing datasets of 3D heads in dense semantic correspondence are slow, and commonly address the problem in two separate steps: multi-view stereo (MVS) reconstruction followed by non-rigid registration. To simplify this process, we introduce TEMPEH (Towards Estimation of 3D Meshes from Performances of Expressive Heads) to directly infer 3D heads in dense correspondence from calibrated multi-view images. Registering datasets of 3D scans typically requires manual parameter tuning to find the right balance between accurately fitting the scan surfaces and being robust to scanning noise and outliers. Instead, we propose to jointly register a 3D head dataset while training TEMPEH. Specifically, during training we minimize a geometric loss commonly used for surface registration, effectively leveraging TEMPEH as a regularizer. Our multi-view head inference builds on a volumetric feature representation that samples and fuses features from each view using camera calibration information. To account for partial occlusions and a large capture volume that enables head movements, we use view- and surface-aware feature fusion, and a spatial transformer-based head localization module, respectively. We use raw MVS scans as supervision during training, but, once trained, TEMPEH directly predicts 3D heads in dense correspondence without requiring scans. Predicting one head takes about 0.3 seconds with a median reconstruction error of 0.26 mm, 64% lower than the current state-of-the-art. This enables the efficient capture of large datasets containing multiple people and diverse facial motions. Code, model, and data are publicly available at https://tempeh.is.tue.mpg.de.
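
To make the idea of sampling and fusing per-view features with camera calibration more concrete, the following is a minimal sketch and not the TEMPEH implementation. It assumes a simple pinhole camera (intrinsics K, rotation R, translation t), nearest-neighbour feature lookup, and plain mean and variance fusion as a stand-in for the view- and surface-aware fusion described in the abstract. All function names (project, sample_feature, fuse_views) and the toy feature maps are hypothetical.

```python
from statistics import fmean, pvariance

def project(point, K, R, t):
    """Project a 3D world point to pixel coordinates (u, v) with a pinhole camera."""
    # Camera-space coordinates: X_cam = R * X_world + t
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        return None  # point lies behind the camera
    # Apply the intrinsics and divide by depth
    u = (K[0][0] * cam[0] + K[0][1] * cam[1] + K[0][2] * cam[2]) / cam[2]
    v = (K[1][1] * cam[1] + K[1][2] * cam[2]) / cam[2]
    return u, v

def sample_feature(feature_map, uv):
    """Nearest-neighbour lookup in feature_map[row][col]; None if not visible."""
    if uv is None:
        return None
    col, row = int(round(uv[0])), int(round(uv[1]))
    if 0 <= row < len(feature_map) and 0 <= col < len(feature_map[0]):
        return feature_map[row][col]
    return None

def fuse_views(point, cameras, feature_maps):
    """Fuse the features one 3D grid point receives from all views that see it.

    Assumption: fusion is a per-channel mean and variance over the visible views.
    """
    samples = [sample_feature(fm, project(point, *cam))
               for cam, fm in zip(cameras, feature_maps)]
    visible = [s for s in samples if s is not None]
    if not visible:
        return None
    channels = list(zip(*visible))  # regroup the samples by feature channel
    mean = [fmean(c) for c in channels]
    var = [pvariance(c) for c in channels]
    return mean, var

# Toy setup: two cameras sharing intrinsics, the second shifted along x.
K = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(K, R, [0.0, 0.0, 0.0]), (K, R, [2.0, 0.0, 0.0])]

# 3x3 feature maps with two channels per pixel.
map_a = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
map_b = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
map_a[1][1] = [1.0, 0.0]  # pixel the grid point hits in view A
map_b[1][2] = [3.0, 0.0]  # pixel the grid point hits in view B

fused = fuse_views([0.0, 0.0, 2.0], cameras, [map_a, map_b])
print(fused)  # ([2.0, 0.0], [1.0, 0.0])
```

Running fuse_views over every point of a regular 3D grid would yield a feature volume of the kind the abstract refers to. Views that do not see a point are simply skipped here, which is only a crude approximation of occlusion handling.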
