
Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry (2110.09772v3)

Published 19 Oct 2021 in cs.CV and cs.GR

Abstract: This work studies learning from a synergy process of 3D Morphable Models (3DMM) and 3D facial landmarks to predict complete 3D facial geometry, including 3D alignment, face orientation, and 3D face modeling. Our synergy process leverages a representation cycle for 3DMM parameters and 3D landmarks. 3D landmarks can be extracted and refined from face meshes built by 3DMM parameters. We next reverse the representation direction and show that predicting 3DMM parameters from sparse 3D landmarks improves the information flow. Together we create a synergy process that utilizes the relation between 3D landmarks and 3DMM parameters, and they collaboratively contribute to better performance. We extensively validate our contribution on full tasks of facial geometry prediction and show our superior and robust performance on these tasks for various scenarios. Particularly, we adopt only simple and widely-used network operations to attain fast and accurate facial geometry prediction. Codes and data: https://choyingw.github.io/works/SynergyNet/
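The abstract describes a representation cycle: 3DMM parameters are predicted from the image, the face mesh built from those parameters yields sparse 3D landmarks that are then refined, and the refined landmarks are regressed back into 3DMM parameters. The sketch below illustrates that cycle in PyTorch under heavy simplification: the backbone, the linear stand-in for a real 3DMM mesh decoder, and all dimensions and module names are assumptions made for this sketch, not the authors' SynergyNet implementation.

```python
import torch
import torch.nn as nn


class SynergyCycleSketch(nn.Module):
    """Illustrative sketch of the 3DMM <-> 3D-landmark synergy cycle.

    All module names, dimensions, and the linear placeholder standing in
    for a real 3DMM mesh decoder are assumptions; this is not the
    authors' SynergyNet implementation.
    """

    def __init__(self, feat_dim=512, n_params=62, n_landmarks=68):
        super().__init__()
        self.n_landmarks = n_landmarks
        # Image backbone -> 3DMM parameters (pose, shape, expression).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.to_3dmm = nn.Linear(feat_dim, n_params)
        # Placeholder "3DMM decoder": parameters -> sparse 3D landmarks.
        # A real pipeline would assemble a dense mesh from basis shapes
        # and sample the landmark vertices from it.
        self.params_to_landmarks = nn.Linear(n_params, n_landmarks * 3)
        # Landmark refinement and the reverse direction
        # (sparse 3D landmarks -> 3DMM parameters).
        self.refine = nn.Linear(n_landmarks * 3, n_landmarks * 3)
        self.landmarks_to_params = nn.Linear(n_landmarks * 3, n_params)

    def forward(self, image):
        feat = self.backbone(image)
        params = self.to_3dmm(feat)                          # image -> 3DMM
        lmk = self.params_to_landmarks(params)               # 3DMM -> landmarks
        lmk_refined = lmk + self.refine(lmk)                 # residual refinement
        params_back = self.landmarks_to_params(lmk_refined)  # landmarks -> 3DMM
        return params, lmk_refined.view(-1, self.n_landmarks, 3), params_back


if __name__ == "__main__":
    model = SynergyCycleSketch()
    params, landmarks, params_back = model(torch.randn(2, 3, 120, 120))
    # Training would supervise both directions so the two representations
    # reinforce each other, e.g. with a cycle-consistency term between the
    # forward prediction and the parameters regressed from landmarks.
    cycle_loss = nn.functional.mse_loss(params_back, params.detach())
    print(params.shape, landmarks.shape, float(cycle_loss))
```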


Summary

  • The paper outlines essential author guidelines for submitting manuscripts to 3DV conference proceedings, covering critical requirements for acceptance.
  • Strict rules mandate original work, prohibit concurrent submissions elsewhere, and enforce a maximum eight-page limit for manuscripts.
  • Authors must maintain anonymity during the blind review process and adhere to detailed formatting instructions for consistency.

Overview of 3DV Proceedings Author Guidelines

The paper provides detailed guidelines for authors preparing manuscripts for submission to the 3DV conference proceedings. The focus is on ensuring adherence to formatting and submission protocols to facilitate the review process and maintain consistency across published works.

The instructions begin with the language requirement: all submissions must be written in English. A section on dual submission states that manuscripts must not have been published elsewhere in substantially similar form and must not be under concurrent review at venues that overlap significantly with the 3DV submission. Violating the dual-submission rule results in outright rejection, underscoring the requirement that the presented work be original and exclusive. Authors are advised to cite any potentially overlapping works while arguing the novelty of their submission.

A critical aspect of the guidelines is paper length. Manuscripts are limited to eight pages, excluding references, to ensure uniformity and a feasible review workload. Overlength papers are not considered for review at all, making the page limit a non-negotiable requirement.

The paper explains the blind review process, advising authors on how to maintain anonymity without compromising the integrity of citations. Self-citations are allowed, but the text must avoid first-person pronouns such as "my" or "our" so that authorship is not revealed. Technical reports related to the submission should be included as supplementary material, but reviewers are not expected to rely on them to understand the core content of the submission.

Formatting instructions are extensive, covering type style, font usage, margins, and page layout. Rulers printed in the margins of the review copy help reviewers reference specific lines during review; they are removed from the final camera-ready version. Special attention is given to figures and illustrations, which must remain clear and legible when printed in grayscale.

The conclusion of the paper reiterates the necessity of including a signed IEEE copyright release form with the final submission, ensuring compliance with publication requirements.

Implications

The meticulous guidelines for 3DV proceedings underline the conference's commitment to high-quality publications and a disciplined review process. The instructions ensure that submissions can be seamlessly integrated into a standardized format, promoting consistency and accessibility in the archival process. For researchers, these guidelines mandate an awareness of not just the technical content but also an adherence to strict formatting and submission protocols, emphasizing the importance of presentation alongside innovation.

As conferences increasingly adopt rigorous submission and formatting protocols, these practices could influence broader academic publishing standards, potentially leading to more uniformity and ease in cross-referencing and reviewing scholarly work. The attention to detail and emphasis on clean presentation can serve as a benchmark for other conferences, fostering an environment where the format complements the quality of research. Future developments in AI could streamline such processes further, perhaps offering automatic formatting checks or real-time collaboration tools that integrate these guidelines directly into manuscript preparation platforms. Such advancements would promote efficiency, allowing researchers to focus more on innovative content creation.
