3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow

Published 15 Apr 2024 in cs.CV and cs.AI | (2404.09819v1)

Abstract: When working with 3D facial data, improving fidelity and avoiding the uncanny valley effect is critically dependent on accurate 3D facial performance capture. Because such methods are expensive and due to the widespread availability of 2D videos, recent methods have focused on how to perform monocular 3D face tracking. However, these methods often fall short in capturing precise facial movements due to limitations in their network architecture, training, and evaluation processes. Addressing these challenges, we propose a novel face tracker, FlowFace, that introduces an innovative 2D alignment network for dense per-vertex alignment. Unlike prior work, FlowFace is trained on high-quality 3D scan annotations rather than weak supervision or synthetic data. Our 3D model fitting module jointly fits a 3D face model from one or many observations, integrating existing neutral shape priors for enhanced identity and expression disentanglement and per-vertex deformations for detailed facial feature reconstruction. Additionally, we propose a novel metric and benchmark for assessing tracking accuracy. Our method exhibits superior performance on both custom and publicly available benchmarks. We further validate the effectiveness of our tracker by generating high-quality 3D data from 2D videos, which leads to performance gains on downstream tasks.

Summary

The paper introduces FlowFace, a two-stage framework combining a 2D alignment network with 3D model fitting for enhanced 3D face tracking.
It leverages a modified RAFT update module and vision-transformer backbone to predict dense UV-to-image flow with iterative refinement.
The method significantly improves benchmark performance and temporal consistency, showcasing robust tracking on in-the-wild datasets.

3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow Overview

The paper presents FlowFace, a framework for 3D face tracking from 2D video input. It tackles monocular 3D face tracking with a dense, per-vertex alignment network trained on high-quality 3D scan annotations. The paper highlights shortcomings of prior approaches, such as reliance on sparse landmarks and photometric similarity, and proposes a new 2D alignment network architecture together with a new evaluation metric and benchmark.

Methodology

FlowFace: Novel 3D Face Tracker

FlowFace is a two-stage pipeline composed of a 2D alignment network and a 3D model fitting module. The 2D alignment network predicts dense UV-to-image flow, avoiding computational constraints common in inverse rendering methods. It uses a vision-transformer backbone for feature extraction and is trained on high-quality 3D scan annotations rather than weak supervision or synthetic data. The 3D model fitting module integrates existing neutral shape priors to better disentangle identity from expression, and per-vertex deformations to reconstruct detailed facial features.

Figure 1: An overview of the proposed 2D alignment network architecture.

The 2D Alignment Network

The 2D alignment network predicts a probabilistic location for each vertex of the face model. It combines an image feature encoder with a UV positional encoding and refines its predictions iteratively through a modified RAFT update module.

Figure 2: An overview of our modified RAFT update module.
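
The iterative refinement described above can be illustrated with a minimal sketch. It is not the paper's network: `refine_positions`, `toy_update`, and `target` are hypothetical names, and the learned update module is replaced by a hand-written function that moves each estimate halfway toward a known target. Only the RAFT-style pattern is shown, in which a residual is predicted from the current estimate and added to it at every step.

```python
import numpy as np

def refine_positions(initial_xy, update_fn, num_iters=8):
    """Refine per-vertex 2D positions by repeatedly adding predicted residuals.

    update_fn is a stand-in (assumption) for a learned update module: it maps
    the current (V, 2) estimate to a (V, 2) residual.
    """
    xy = np.asarray(initial_xy, dtype=float).copy()
    history = [xy.copy()]
    for _ in range(num_iters):
        delta = update_fn(xy)   # residual predicted from the current estimate
        xy = xy + delta         # additive update, as in RAFT-style refinement
        history.append(xy.copy())
    return xy, history

# Toy stand-in for the network: step halfway toward fixed target positions.
target = np.array([[10.0, 20.0], [30.0, 5.0]])

def toy_update(xy):
    return 0.5 * (target - xy)

final_xy, history = refine_positions(np.zeros((2, 2)), toy_update, num_iters=8)
print(np.abs(final_xy - target).max())
```

With this toy update the remaining error halves at every iteration, which is why a handful of refinement steps is enough in the example; a learned update module has no such guarantee.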

3D Model Fitting

The 3D model fitting module optimizes 3D head model parameters jointly over one or many observations by minimizing an alignment energy. MICA-derived neutral shape priors improve the disentanglement of identity and expression, while per-vertex deformations recover finer facial detail; together these are reported to improve 3D reconstruction accuracy.
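
As a rough illustration of joint fitting over several observations, the sketch below minimizes a confidence-weighted 2D alignment energy for a toy model. Everything in it is an assumption made for the example: a random linear identity/expression basis, a fixed orthographic camera, and synthetic observations. The shape priors and per-vertex deformations mentioned above are omitted, and the names (`project`, `fit`, `B_id`, `B_exp`) are hypothetical. Because the toy model is linear, the energy can be minimized with ordinary least squares; a real head model with a perspective camera would need a nonlinear solver.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K_ID, K_EXP, F = 50, 3, 2, 4   # vertices, identity dims, expression dims, frames

# Toy linear face model (assumption): vertices = mean + B_id @ id + B_exp @ exp.
mean = rng.normal(size=(V, 3))
B_id = rng.normal(size=(V, 3, K_ID))
B_exp = rng.normal(size=(V, 3, K_EXP))
P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # fixed orthographic camera

def project(identity, expression):
    """Return (V, 2) screen positions for given identity/expression coefficients."""
    verts = mean + B_id @ identity + B_exp @ expression
    return verts @ P.T

def fit(observations, weights):
    """Minimize sum_f sum_v w[f, v] * ||project(id, exp_f)[v] - obs[f, v]||^2.

    Identity coefficients are shared by all frames; expression is per frame.
    """
    n_frames = len(observations)
    A_id = np.einsum("ij,vjk->vik", P, B_id).reshape(V * 2, K_ID)
    A_exp = np.einsum("ij,vjk->vik", P, B_exp).reshape(V * 2, K_EXP)
    base = (mean @ P.T).reshape(V * 2)
    rows, rhs = [], []
    for f in range(n_frames):
        w = np.sqrt(np.repeat(weights[f], 2))  # same weight for x and y residuals
        block = np.zeros((V * 2, K_ID + n_frames * K_EXP))
        block[:, :K_ID] = A_id                 # shared identity columns
        start = K_ID + f * K_EXP
        block[:, start:start + K_EXP] = A_exp  # this frame's expression columns
        rows.append(block * w[:, None])
        rhs.append((observations[f].reshape(V * 2) - base) * w)
    sol, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return sol[:K_ID], sol[K_ID:].reshape(n_frames, K_EXP)

# Synthetic, noise-free observations from known coefficients.
true_identity = rng.normal(size=K_ID)
true_expressions = rng.normal(size=(F, K_EXP))
observations = np.stack([project(true_identity, e) for e in true_expressions])
weights = rng.uniform(0.5, 2.0, size=(F, V))  # e.g. inverse predicted variance

est_identity, est_expressions = fit(observations, weights)
```

Sharing the identity columns across all frames is what makes the fit joint: every observation constrains the same identity coefficients, while each frame keeps its own expression coefficients.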

Screen-Space Motion Error (SSME)

SSME is a new metric that measures dense face motion in screen space across frames. It addresses a gap in prior metrics by accounting for temporal consistency, so that a tracker is evaluated on how accurately it follows motion over time rather than only on per-frame geometry.

Figure 3: SSME_h plotted over frames, indicating temporal stability and tracking consistency.
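
The summary does not give the exact SSME formula, so the following is only one plausible reading, written as a sketch: for a frame horizon `h`, compare the tracked screen-space displacement of each vertex between frames `t` and `t + h` with a reference displacement, and average the Euclidean error. The function name, the array layout, and the use of reference vertex positions in place of ground-truth image flow are all assumptions.

```python
import numpy as np

def screen_space_motion_error(pred_xy, ref_xy, horizon):
    """Mean error between predicted and reference screen-space motion.

    pred_xy, ref_xy: arrays of shape (T, V, 2) with per-frame 2D vertex positions.
    horizon: number of frames h between the two ends of each motion vector.
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    ref_xy = np.asarray(ref_xy, dtype=float)
    if not 1 <= horizon < len(pred_xy):
        raise ValueError("horizon must satisfy 1 <= horizon < number of frames")
    pred_motion = pred_xy[horizon:] - pred_xy[:-horizon]
    ref_motion = ref_xy[horizon:] - ref_xy[:-horizon]
    return float(np.linalg.norm(pred_motion - ref_motion, axis=-1).mean())

rng = np.random.default_rng(1)
ref = rng.normal(size=(10, 5, 2)).cumsum(axis=0)   # reference trajectories

# A tracker that is off by a constant offset still reproduces the motion.
offset_err = screen_space_motion_error(ref + 3.0, ref, horizon=1)

# A tracker that drifts 0.5 px per frame in x accumulates error with the horizon.
drift = np.zeros_like(ref)
drift[..., 0] = 0.5 * np.arange(10)[:, None]
drift_err_h1 = screen_space_motion_error(ref + drift, ref, horizon=1)
drift_err_h4 = screen_space_motion_error(ref + drift, ref, horizon=4)
print(offset_err, drift_err_h1, drift_err_h4)
```

In this reading, a constant positional offset produces no motion error, while slow drift produces an error that grows with the horizon. That is the kind of temporal behaviour a purely per-frame metric can miss.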

Experimental Results

Performance on Benchmarks

On the Multiface and FaceScape benchmarks, FlowFace improves both 3D reconstruction and motion tracking over prior methods, with lower SSME values indicating better temporal stability. Results on the NoW Challenge and additional datasets support its generalization to in-the-wild images.

Figure 4: Visualization of the motion trajectory error illustrating model accuracy.

Applications in Downstream Tasks

FlowFace's tracking also benefits downstream tasks such as 3D head avatar synthesis and speech-driven 3D facial animation. Using FlowFace tracking in INSTA improves the perceptual quality of avatar synthesis, as shown by lower LPIPS scores. Augmenting the training data of facial animation models with FlowFace-generated data likewise improves their reported performance.

Figure 5: Expression transfer leveraging FlowFace-driven tracking data.

Conclusion

FlowFace offers a precise approach to dense alignment and 3D reconstruction from 2D videos, outperforming prior trackers on the benchmarks evaluated. The paper points to end-to-end learnable frameworks and large-scale dataset generation as directions for future work.
