- The paper introduces a novel implicit-keypoint-based framework that enhances both the efficiency and control of portrait animation.
- It employs a mixed image-video training strategy on roughly 69 million frames and improved network architectures to achieve superior self- and cross-reenactment performance.
- The inclusion of stitching and MLP-based retargeting modules enables precise control of facial features, advancing realism in real-time applications.
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
The paper "LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control" by Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, et al. introduces an innovative framework for animating static portrait images, prioritizing both realism and computational efficiency. The proposed method diverges from mainstream diffusion-based approaches, instead extending the capabilities of the implicit-keypoint-based framework. This paper makes significant strides in enhancing the generalization, controllability, and efficiency of portrait animation systems.
Key Contributions
The core contributions of the paper include:
- Implicit-Keypoint-Based Framework: Leveraging compact implicit keypoints as the motion representation to balance computational efficiency and precise control.
- Scalable Training Data: Utilizing a large-scale dataset of approximately 69 million high-quality frames and adopting a mixed image-video training strategy.
- Network Architecture Improvements: Enhancing the network components and proposing improved motion transformation and optimization objectives.
- Stitching and Retargeting Modules: Introducing low-overhead modules for stitching and precise control of eye and lip movements.
Methodology
The paper's methodology is rooted in several impactful enhancements to the traditional implicit-keypoint-based framework:
- Data Curation and Mixed Training:
- The authors curated a vast and diverse training dataset comprising public video datasets, proprietary 4K resolution portrait clips, and styled portrait images.
- A mixed image-video training strategy lets the model learn from both static images and dynamic videos, improving generalization to varied portrait styles (see the sampling sketch after this list).
- Network Upgrades:
- Unification of the canonical implicit keypoint detector, head pose estimation network, and expression deformation network into a single model built on a ConvNeXt-V2-Tiny backbone.
- Adoption of a SPADE decoder as the generator to improve the quality and resolution of animated images.
- Scalable Motion Transformation:
- Inclusion of a scaling factor in the motion transformation, balancing the flexibility and stability of expression deformations (see the transformation sketch after this list).
- Landmark-Guided Optimization:
- Introduction of a landmark-guided loss to refine the learning of implicit keypoints, focusing particularly on subtle facial movements like eye gaze adjustments.
- Cascaded Loss Terms:
- Implementation of multi-region perceptual and GAN losses, alongside a face-id loss and the landmark-guided loss, to improve both identity preservation and animation quality (a schematic combination follows this list).
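To make the mixed image-video strategy concrete, the sketch below treats a still image as a one-frame clip so the same reconstruction pipeline consumes both data types; the dataset variables and mixing ratio are hypothetical, not taken from the paper.

```python
import random

def sample_training_pair(video_clips, still_images, image_ratio=0.2):
    """Sample a (source, driving) frame pair for one training step.

    Videos yield two frames of the same subject; a still image is
    treated as a one-frame clip, so it serves as both source and
    driving frame. image_ratio is a hypothetical mixing weight.
    """
    if random.random() < image_ratio:
        frame = random.choice(still_images)
        return frame, frame          # static portrait: self-reconstruction
    clip = random.choice(video_clips)
    src, drv = random.sample(range(len(clip)), 2)
    return clip[src], clip[drv]      # two frames from the same clip
```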
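The motion transformation in this line of implicit-keypoint methods composes canonical keypoints with head rotation, expression offsets, a translation, and the scaling factor. Here is a minimal sketch of that composition, assuming a standard s·(x_c R + δ) + t form; the shapes and names are illustrative and may differ from the paper's exact parameterization.

```python
import torch

def transform_keypoints(x_canonical, R, delta, t, s):
    """Drive canonical 3D keypoints with pose and expression.

    x_canonical: (K, 3) canonical implicit keypoints
    R:           (3, 3) head rotation matrix
    delta:       (K, 3) expression deformation offsets
    t:           (3,)   translation
    s:           scalar scaling factor, included so expression
                 offsets stay proportionate across face sizes
    Returns driven keypoints of shape (K, 3).
    """
    return s * (x_canonical @ R + delta) + t
```

In a cross-reenactment setting, keeping the source's scale while borrowing the driving frame's rotation and expression offsets is one way such a scaling factor stabilizes motion transfer.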
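Putting the cascaded terms together, a schematic combination of the landmark-guided, perceptual, GAN, and face-id losses might look as follows; the weights and stand-in loss functions are placeholders rather than the paper's implementation.

```python
import torch.nn.functional as F

def total_loss(pred, target, pred_lmk, gt_lmk, id_pred, id_target,
               disc_logits, w_percep=1.0, w_gan=0.1, w_id=0.5, w_guide=1.0):
    """Schematic combination of the cascaded loss terms (weights are placeholders).

    pred/target:       generated and ground-truth frames (or face/eye/lip
                       crops for the multi-region variants)
    pred_lmk/gt_lmk:   landmarks derived from implicit keypoints vs. an
                       off-the-shelf detector (landmark-guided term)
    id_pred/id_target: face-recognition embeddings for the face-id term
    disc_logits:       discriminator output on the generated frame
    """
    l_percep = F.l1_loss(pred, target)        # stand-in for a perceptual loss
    l_gan = F.softplus(-disc_logits).mean()   # non-saturating generator loss
    l_id = 1 - F.cosine_similarity(id_pred, id_target, dim=-1).mean()
    l_guide = F.l1_loss(pred_lmk, gt_lmk)     # pulls keypoints toward landmarks
    return w_percep * l_percep + w_gan * l_gan + w_id * l_id + w_guide * l_guide
```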
Stitching and Retargeting
The framework includes sophisticated modules for stitching and retargeting that allow for enhanced controllability with minimal computational overhead:
- Stitching Module:
- The stitching module mitigates pixel misalignment, enabling accurate reconstruction of the animated region onto the original image space.
- Eyes and Lip Retargeting:
- Two MLP-based modules independently control the extent of eye and lip movements, enabling realistic, expressive animations (a schematic MLP sketch follows this list).
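Both the stitching and retargeting modules are lightweight MLPs that map a conditioning vector to additive keypoint offsets, which is what keeps their computational overhead negligible. The sketch below is schematic: the layer sizes, keypoint count, and input layout are chosen for illustration rather than taken from the paper.

```python
import torch.nn as nn

class KeypointOffsetMLP(nn.Module):
    """Small MLP of the kind used for stitching / retargeting control.

    Maps a conditioning vector (e.g. flattened source + driving keypoints
    for stitching, or source keypoints plus a scalar openness ratio for
    eye or lip retargeting) to additive keypoint offsets. All dimensions
    below are illustrative.
    """
    def __init__(self, in_dim, num_kp=21, hidden=128):
        super().__init__()
        self.num_kp = num_kp
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_kp * 3),   # one 3D offset per keypoint
        )

    def forward(self, cond):
        return self.net(cond).view(-1, self.num_kp, 3)

# Stitching: offsets nudge the driving keypoints so the animated face
# pastes back onto the source frame without visible seams.
# (in_dim assumes 21 source + 21 driving 3D keypoints, flattened.)
stitcher = KeypointOffsetMLP(in_dim=21 * 3 * 2)
```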
Experimental Results
Self-Reenactment:
- The model exhibits superior performance in self-reenactment tasks, preserving appearance details and effectively transferring facial motions.
Cross-Reenactment:
- In cross-reenactment scenarios, LivePortrait maintains identity well while transferring subtle facial expressions, outperforming existing diffusion-based models in efficiency and, in some cases, on quality metrics.
Quantitative Metrics:
- The paper details extensive quantitative evaluations in which LivePortrait excels across multiple metrics, including PSNR, SSIM, LPIPS, FID, AED, and APD.
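As a point of reference, the pixel-level metrics in such evaluations are simple to reproduce; below is a minimal PSNR computation (SSIM, LPIPS, and FID each require their own reference implementations or pretrained networks).

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between two uint8 images, as used when
    comparing an animated frame against its ground-truth frame in
    self-reenactment evaluation."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```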
Implications and Future Work
This work has broad practical implications, with potential applications in video conferencing, social media, and entertainment. By achieving real-time performance on a high-end GPU, LivePortrait sets the stage for accessible and efficient portrait animation.
However, the paper acknowledges limitations in handling large pose variations and anticipates further research to improve stability under significant motion conditions.
Conclusions
In summary, "LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control" provides a substantial advancement in portrait animation technology. By innovatively combining implicit-keypoint representations, scalable training practices, and advanced control mechanisms, the authors set a new benchmark for efficiency and quality in portrait animation systems. The research opens avenues for real-time, high-fidelity animation in a variety of practical applications.