Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry (2110.09772v3)
Abstract: This work studies learning from a synergy process between 3D Morphable Models (3DMM) and 3D facial landmarks to predict complete 3D facial geometry, including 3D alignment, face orientation, and 3D face modeling. Our synergy process leverages a representation cycle between 3DMM parameters and 3D landmarks. 3D landmarks can be extracted and refined from face meshes built from 3DMM parameters. We then reverse the representation direction and show that predicting 3DMM parameters from sparse 3D landmarks improves the information flow. Together, these form a synergy process that exploits the relation between 3D landmarks and 3DMM parameters, in which both representations collaboratively contribute to better performance. We extensively validate our contribution on the full set of facial geometry prediction tasks and show superior and robust performance across various scenarios. In particular, we adopt only simple and widely used network operations to attain fast and accurate facial geometry prediction. Code and data: https://choyingw.github.io/works/SynergyNet/
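The representation cycle described in the abstract (3DMM parameters to mesh to 3D landmarks, and landmarks back to 3DMM parameters) can be illustrated with a toy sketch. It assumes a standard linear 3DMM (mean shape plus identity and expression bases, followed by a similarity transform with a known pose). It uses random placeholder bases instead of a real face model and hypothetical landmark vertex indices (`LMK_IDX`). The forward direction reads landmarks off fixed mesh vertices. For the reverse direction it swaps in plain linear least squares with the pose given; this is a stand-in and not the paper's learned landmark-to-3DMM prediction, and landmark refinement is not modeled. All names (`build_mesh`, `extract_landmarks`, `params_from_landmarks`, `yaw_rotation`) are illustrative and are not taken from the SynergyNet code.

```python
import numpy as np

# Toy sizes (assumptions); a real 3DMM is much larger.
N_VERTS, N_ID, N_EXP = 50, 8, 4
LMK_IDX = np.arange(0, N_VERTS, 3)  # hypothetical landmark vertex indices (17 of them)

rng = np.random.default_rng(0)
# Random placeholder bases; vertex i occupies flat entries 3i, 3i+1, 3i+2.
mean_shape = rng.normal(size=3 * N_VERTS)
A_id = rng.normal(size=(3 * N_VERTS, N_ID))
A_exp = rng.normal(size=(3 * N_VERTS, N_EXP))


def yaw_rotation(theta):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def build_mesh(alpha_id, alpha_exp, R, t, scale=1.0):
    """Forward 3DMM: linear shape model, then similarity transform to camera space."""
    shape = mean_shape + A_id @ alpha_id + A_exp @ alpha_exp
    verts = shape.reshape(N_VERTS, 3)
    return scale * verts @ R.T + t


def extract_landmarks(verts):
    """3DMM -> landmarks: read off a fixed subset of mesh vertices."""
    return verts[LMK_IDX]


def params_from_landmarks(lmk, R, t, scale=1.0):
    """Landmarks -> 3DMM: undo the known pose, then solve linear least squares
    on the landmark rows of the bases (a stand-in for a learned regressor)."""
    canon = ((lmk - t) / scale) @ R  # R is orthonormal, so R inverts R.T
    rows = (3 * LMK_IDX[:, None] + np.arange(3)).ravel()
    B = np.hstack([A_id[rows], A_exp[rows]])
    coeffs, *_ = np.linalg.lstsq(B, canon.ravel() - mean_shape[rows], rcond=None)
    return coeffs[:N_ID], coeffs[N_ID:]


# Round trip: parameters -> mesh -> landmarks -> parameters.
alpha_id = rng.normal(size=N_ID)
alpha_exp = rng.normal(size=N_EXP)
R, t = yaw_rotation(0.3), np.array([1.0, -2.0, 5.0])
verts = build_mesh(alpha_id, alpha_exp, R, t)
lmk = extract_landmarks(verts)
rec_id, rec_exp = params_from_landmarks(lmk, R, t)
print(np.allclose(rec_id, alpha_id), np.allclose(rec_exp, alpha_exp))  # True True
```

With the pose fixed, landmarks are a linear function of the 3DMM coefficients, so in this noise-free toy 17 landmarks (51 equations) over-determine 12 coefficients and the round trip is exact. With noisy landmarks and an unknown pose, this simple inverse no longer suffices.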