
Automatic Face Reenactment (1602.02651v1)

Published 8 Feb 2016 in cs.CV and cs.GR

Abstract: We propose an image-based, facial reenactment system that replaces the face of an actor in an existing target video with the face of a user from a source video, while preserving the original target performance. Our system is fully automatic and does not require a database of source expressions. Instead, it is able to produce convincing reenactment results from a short source video captured with an off-the-shelf camera, such as a webcam, where the user performs arbitrary facial gestures. Our reenactment pipeline is conceived as part image retrieval and part face transfer: The image retrieval is based on temporal clustering of target frames and a novel image matching metric that combines appearance and motion to select candidate frames from the source video, while the face transfer uses a 2D warping strategy that preserves the user's identity. Our system excels in simplicity as it does not rely on a 3D face model, it is robust under head motion and does not require the source and target performance to be similar. We show convincing reenactment results for videos that we recorded ourselves and for low-quality footage taken from the Internet.

Citations (164)

Summary

  • The paper presents an automatic, image-based facial reenactment system that operates without a source expression database and works on low-end devices.
  • The system employs a three-step process: non-rigid 2D face tracking, a novel appearance and motion-based face matching metric with clustering, and a 2D warping strategy for identity-preserving face transfer.
  • This approach offers simplicity by avoiding 3D models, demonstrates robustness across lighting and poses, and holds potential applications in video dubbing, VR, and privacy-focused face swapping.

Automatic Face Reenactment: A Comprehensive Overview

Face reenactment replaces an actor's facial region in a video with a new face while retaining the nuances of the original performance. The paper "Automatic Face Reenactment" presents an automatic facial reenactment system that relies entirely on image-based strategies. The method is notable for functioning without any database of source expressions and for its ability to reenact performances captured on low-end devices, such as webcams.

Key Contributions and Methodology

The authors introduce an innovative approach that seamlessly integrates image retrieval with face transfer. The system operates in three primary steps:

  1. Face Tracking: A non-rigid 2D shape model tracks facial landmarks in the source and target videos. This step delivers consistent landmark points that stabilize the pipeline for the subsequent stages, even under moderate head motion.
  2. Face Matching: A novel matching metric combines appearance and motion cues, improving temporal consistency. The target sequence is divided into clusters of frames with similar expressions, and candidate source frames are retrieved per cluster. This clustering keeps matching robust even when the source and target performances differ substantially.
  3. Face Transfer: A 2D warping strategy aligns the shape of the user's face to that of the actor, preserving the user's identity without reconstructing a 3D face model. This keeps the pipeline simple while maintaining sufficient accuracy and realism in the reenacted sequence.
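The retrieval step above can be sketched in a few lines. The cost function below is an illustrative stand-in, not the paper's exact metric: it combines a landmark-shape ("appearance") term with a landmark-velocity ("motion") term. The function names, the `alpha` weight, and the use of plain Euclidean distances are all assumptions made for this sketch.

```python
import numpy as np

def match_cost(src_lm, tgt_lm, src_flow, tgt_flow, alpha=0.5):
    """Illustrative combined matching cost (not the paper's exact metric).

    src_lm / tgt_lm:     (N, 2) landmark coordinates for one frame.
    src_flow / tgt_flow: (N, 2) per-landmark frame-to-frame displacements.
    alpha:               assumed weight balancing appearance vs. motion.
    """
    appearance = np.linalg.norm(src_lm - tgt_lm)    # shape/appearance distance
    motion = np.linalg.norm(src_flow - tgt_flow)    # motion distance
    return (1 - alpha) * appearance + alpha * motion

def best_source_frame(target_lm, target_flow, source_lms, source_flows):
    """For one target frame, return the index of the cheapest source frame."""
    costs = [match_cost(s_lm, target_lm, s_fl, target_flow)
             for s_lm, s_fl in zip(source_lms, source_flows)]
    return int(np.argmin(costs))
```

In the actual system this per-frame selection is constrained by the temporal clusters of the target sequence, so that one consistent source candidate serves a run of similar target expressions rather than flickering between matches frame by frame.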

Strong Numerical and Qualitative Results

The approach gains a distinct edge in simplicity by eschewing a 3D face model. Its robustness under different lighting conditions and head poses is demonstrated convincingly across varied test scenarios. The authors showcase reenactments on both footage they recorded themselves and low-quality internet videos, notably including political speeches and popular movie scenes, underscoring the system's flexibility and efficacy in producing realistic output.

Implications and Future Directions

From a practical perspective, this system could have notable applications ranging from entertainment to communications, providing tools for video dubbing, virtual reality, and privacy-focused face swapping. The removal of the need for elaborate databases marks a significant advance in accessibility and deployment efficiency.

Theoretically, this approach opens several pathways for future research. The development of more refined metrics for motion similarity and exploring alternative methods for pre-emptive lighting correction could bolster this technology's capabilities. As face modeling grows increasingly critical in various domains, the paper's insights could inform a new generation of more adaptable facial animation techniques.

Conclusion

The paper serves as a detailed guide on developing efficient, user-friendly, and realistic facial reenactment systems. By integrating innovative clustering techniques with intuitive warping strategies, the research not only elucidates the potential of image-based methods but also suggests a paradigm shift in how facial performances might be recreated and utilized across a spectrum of applications. As AI continues to evolve, such advancements represent pivotal contributions to the field, with broad implications for both the industry and academia.