- The paper introduces an unsupervised framework that reformulates gaze correction as an image inpainting task using dedicated Gaze Correction and Animation Modules.
- It leverages two newly created datasets, CelebGaze and CelebHQGaze, to enable high-resolution portrait analysis without manual gaze annotations.
- A Coarse-to-Fine Module reduces memory and computational costs while preserving generation quality, and the full model achieves an FID score of 56.37 on the CelebGaze dataset, outperforming state-of-the-art methods.
Analysis of Unsupervised High-Resolution Portrait Gaze Correction and Animation
The paper "Unsupervised High-Resolution Portrait Gaze Correction and Animation" presents an approach to the difficult problem of gaze correction and animation in high-resolution portrait images, trained without any gaze or head-pose annotations. The work addresses key limitations of prior gaze-correction methods by combining an unsupervised training paradigm with novel data-processing strategies.
Core Contributions
The paper makes several methodological advances:
- Unsupervised Framework: A comprehensive framework is proposed to handle high-resolution, unconstrained portraits for gaze correction and animation without requiring annotated gaze or head pose data.
- Dataset Creation: Two new datasets, CelebGaze and CelebHQGaze, are introduced. They facilitate unsupervised learning by providing images characterized by varying resolutions and natural, "in-the-wild" conditions. CelebGaze comprises low-resolution 256×256 images, while CelebHQGaze offers high-resolution portraits at 512×512.
- Formulation as Image Inpainting: The paper reformulates gaze correction as an image inpainting problem and introduces two primary modules: the Gaze Correction Module (GCM) and the Gaze Animation Module (GAM). These modules rely on a new unsupervised training strategy termed "Synthesis-As-Training," which links eye-region features to gaze angles so that gaze animation can be performed by interpolating in the learned latent space.
- Coarse-to-Fine Module (CFM): An additional contribution is the Coarse-to-Fine Module, which optimizes memory usage and computational costs during training and inference phases without compromising the generation quality of high-resolution images.
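At a very high level, the inpainting-plus-interpolation idea behind these modules can be sketched as follows. Everything below is illustrative: the function names (`animate_gaze`, `decode_eyes`), the bounding-box convention, and the simple linear interpolation are assumptions standing in for the paper's learned GCM/GAM networks, not the authors' actual code.

```python
import numpy as np

def lerp(z_a, z_b, alpha):
    """Linear interpolation in latent space: alpha=0 gives z_a, alpha=1 gives z_b."""
    return (1.0 - alpha) * z_a + alpha * z_b

def animate_gaze(image, box, z_src, z_tgt, decode_eyes, n_frames=5):
    """Hypothetical gaze-animation loop (illustrative only).

    The eye region given by box=(x0, y0, x1, y1) is treated as the masked
    area of an inpainting problem: for each interpolated latent code, a
    generator stand-in (decode_eyes) produces an eye patch, which is then
    pasted back into a copy of the original image.
    """
    x0, y0, x1, y1 = box
    frames = []
    for alpha in np.linspace(0.0, 1.0, n_frames):
        z = lerp(z_src, z_tgt, alpha)
        patch = decode_eyes(z, (y1 - y0, x1 - x0))  # stand-in for the generator
        out = image.copy()
        out[y0:y1, x0:x1] = patch                   # fill the masked eye region
        frames.append(out)
    return frames
```

Sweeping `alpha` from 0 to 1 yields a sequence of frames whose gaze moves smoothly from the source direction to the target direction, which is the intuition behind animating gaze via latent-space interpolation.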
Results and Implications
The authors conducted extensive experiments validating the superiority of their unsupervised method across multiple benchmarks. Quantitative results such as the Fréchet Inception Distance (FID), where lower scores indicate generated images whose feature statistics are closer to those of real images, emphasize the model's advantage in generating high-quality corrected gaze outputs, significantly outperforming state-of-the-art models. For instance, the model achieves an FID score of 56.37 on the CelebGaze dataset, indicating improved visual quality of the generated eye regions.
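For context, FID compares the Gaussian statistics (mean and covariance) of Inception features extracted from real versus generated images. A compact NumPy-only computation might look like the sketch below; it uses the similarity identity Tr((C_r C_g)^{1/2}) = Tr((S C_r S)^{1/2}) with S = C_g^{1/2} so that only symmetric eigendecompositions are needed. This is a generic illustration of the metric, not the evaluation code used in the paper.

```python
import numpy as np

def psd_sqrt(m):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def fid(feats_real, feats_gen):
    """Fréchet Inception Distance between two feature sets of shape (n, d).

    FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2}); lower is better.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c_r = np.cov(feats_real, rowvar=False)
    c_g = np.cov(feats_gen, rowvar=False)
    s = psd_sqrt(c_g)
    # S @ C_r @ S is symmetric PSD and similar to C_r @ C_g, so its
    # eigenvalues give the trace of the matrix square root safely.
    cross = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(s @ c_r @ s), 0.0, None)))
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(c_r) + np.trace(c_g) - 2.0 * cross)
```

In practice the features would come from a pretrained Inception network; feeding identical feature sets to `fid` returns (numerically) zero, and the score grows as the two distributions drift apart.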
Practical and Theoretical Implications
Practically, the method has far-reaching implications for several domains:
- Photography and Film: Automatic correction for undesired gaze patterns in photos and videos without requiring extensive manual editing.
- Video Conferencing: Enhancing user engagement by ensuring perceived eye contact, critical in remote communication environments.
From a theoretical perspective, the adoption of unsupervised techniques in such meticulous tasks suggests new directions where similar models could be applied to other facial feature transformations or animations. Furthermore, the concept of modeling image transformation tasks within an unsupervised inpainting framework may inspire further studies in related computer vision challenges.
Future Directions
The prospects for future work revolve around several axes:
- Extending the Framework: Incorporating other facial attributes, such as expression or head pose, into a multi-task unsupervised framework could yield a more holistic portrait-editing system.
- Real-Time Applications: Efforts toward optimizing this method for real-time applications will expand its usability into video formats and live webcam environments.
In conclusion, the presented paper offers substantial contributions to unsupervised learning methodologies in computer vision, effectively demonstrating how gaze correction and animation can be achieved with minimal manual labeling. This research may catalyze further exploration of unsupervised techniques in high-fidelity image manipulation, including but not limited to gaze correction.