- The paper introduces a novel synthetic dataset generation pipeline that edits facial attributes of a single portrait via text-conditioned, diffusion-based inpainting and then animates the edited portraits.
- It employs 3D Gaussian Splatting and latent space regularization techniques to ensure smooth, photorealistic transitions in avatar facial features.
- The approach achieves effective attribute disentanglement and fine-tuning via LoRA adaptations, outperforming conventional 3D avatar generation methods.
Overview of PERSE: A Generative Approach to Personalized 3D Avatar Creation from Portraits
The paper presents "PERSE," a method for generating animatable, personalized 3D avatars from a single reference portrait. By enabling continuous, disentangled editing of facial attributes while preserving the subject's core identity, it significantly advances avatar creation for virtual and augmented reality applications, where personalized digital representations are paramount.
PERSE's novelty lies in its pipeline for crafting large-scale synthetic 2D video datasets for avatar training, in which each video varies a particular facial attribute while keeping the remaining features consistent. This synthetic dataset is crucial: it powers the avatar model's ability to manipulate facial attributes intuitively within a continuous latent space. The paper describes generating these high-quality videos from a single portrait image using a combination of a pretrained 2D portrait animation model and a custom image-to-video generation model.
Key Contributions and Methodological Insights
- Synthetic Dataset Generation: PERSE introduces a two-stage processing pipeline to produce synthetic datasets. Initially, portrait images are edited to highlight various facial attributes using text-conditioned inpainting techniques. The edited images are then animated with controlled changes in facial expressions and head poses, thereby compiling a dataset characterized by significant attribute diversity. This stage leverages diffusion-based models and particularly emphasizes generating realistic variations from limited original data.
- 3D Gaussian Splatting and Latent Space Regularization: The model's backbone integrates 3D Gaussian Splatting, an approach that enhances the photorealistic output by emulating finer details in the avatar structures. The authors propose a latent space regularization method to smooth continuous changes in facial attributes. This is achieved by enforcing constraints using interpolated facial images as pseudo-supervision, which facilitates seamless attribute transitions in the generated avatars.
- Attribute Disentanglement: By imposing a meaningful structure on the latent space, the model achieves attribute disentanglement: each subpart latent vector independently controls its corresponding facial features. This architecture enables users to manipulate specific attributes such as hairstyles or facial hair without altering unrelated identity-defining features.
- Attribute Transfer and Fine-Tuning: PERSE supports integrating new and unseen attributes into the avatar model through fine-tuning mechanisms, specifically employing Low-Rank Adaptation (LoRA) strategies. This capability permits the system to adapt to dynamic user preferences and emerging trends, facilitating ongoing personalization.
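The two-stage dataset pipeline described in the first bullet above can be sketched as plain orchestration. The `inpaint` and `animate` callables below are hypothetical placeholders standing in for the pretrained text-conditioned inpainting model and the portrait-animation model; they are not the authors' actual components.

```python
from typing import Any, Callable, Sequence

def build_synthetic_dataset(
    portrait: Any,
    prompts: Sequence[str],
    motions: Sequence[Any],
    inpaint: Callable[[Any, str], Any],
    animate: Callable[[Any, Any], Any],
) -> list:
    """Stage 1: edit the portrait once per attribute prompt (text-conditioned
    inpainting). Stage 2: animate every edited portrait with each driving
    motion, yielding one synthetic video per (prompt, motion) pair."""
    videos = []
    for prompt in prompts:
        edited = inpaint(portrait, prompt)          # attribute edit
        for motion in motions:
            videos.append(animate(edited, motion))  # expression/pose animation
    return videos
```

The dataset size grows multiplicatively (attribute prompts × motion sequences), which is how significant attribute diversity is obtained from a single source image.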
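The disentanglement and interpolation ideas in the bullets above can be illustrated with a toy latent layout: one sub-vector per facial region, where an edit interpolates only the addressed region. The region names and dimensions are illustrative assumptions, not the paper's actual parameterization.

```python
import numpy as np

def edit_region(latent: dict, region: str,
                target: np.ndarray, t: float) -> dict:
    """Interpolate one region's sub-vector toward a target attribute latent,
    leaving every other region untouched (a disentangled edit)."""
    out = {r: v.copy() for r, v in latent.items()}
    out[region] = (1.0 - t) * latent[region] + t * target
    return out

# Toy latent with one sub-vector per facial region (names are illustrative).
rng = np.random.default_rng(0)
latent = {r: rng.standard_normal(8) for r in ("hair", "beard", "eyes", "mouth")}
edited = edit_region(latent, "hair", np.ones(8), t=0.5)  # hair changes only
```

Sweeping `t` from 0 to 1 traces the smooth transition path that the paper's regularization supervises with interpolated facial images as pseudo-ground-truth.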
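The LoRA mechanism named in the last bullet has a compact general form worth making concrete: a frozen weight matrix W is augmented with a trainable low-rank product B @ A. The sketch below shows that form in isolation, not the authors' implementation.

```python
import numpy as np

def lora_forward(x: np.ndarray, W: np.ndarray,
                 A: np.ndarray, B: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Linear layer with a LoRA update: y = x W^T + (alpha / r) x A^T B^T.
    W (out, in) stays frozen; only A (r, in) and B (out, r) are trained.
    With B initialized to zeros, the adapted layer starts identical to W."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```

Because only A and B are updated, with rank r far smaller than the layer size, a new, unseen attribute can be merged into the avatar model cheaply, which is the fine-tuning role LoRA plays here.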
The authors provide empirical evidence underscoring the advancement of PERSE over existing approaches, particularly in interpolation quality and identity preservation. The evaluation also highlights the distinctiveness of the CLIP-guided latent configuration, which further refines attribute representation. Through quantitative analyses (e.g., FID and KID scores) and qualitative assessments, including a comprehensive user study, PERSE demonstrates superior performance in generating realistic and controllable avatars vis-à-vis baseline models built on diverse 3D representations.
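As context for the FID metric mentioned above: it is the Fréchet distance between Gaussians fit to real and generated image features. The sketch below simplifies to diagonal covariances (the full metric uses a matrix square root of the covariance product), so it is illustrative only.

```python
import numpy as np

def fid_diagonal(real: np.ndarray, fake: np.ndarray) -> float:
    """Frechet distance between Gaussians fit to two (n, d) feature sets,
    simplified by assuming diagonal covariances:
    ||mu1 - mu2||^2 + sum(v1 + v2 - 2 * sqrt(v1 * v2))."""
    mu1, mu2 = real.mean(axis=0), fake.mean(axis=0)
    v1, v2 = real.var(axis=0), fake.var(axis=0)
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.sum(v1 + v2 - 2.0 * np.sqrt(v1 * v2)))
```

Identical feature sets score 0, and lower is better; in practice the features come from a pretrained image encoder rather than raw pixels.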
Implications and Future Directions
The PERSE framework holds significant implications for the development of AI-driven personalization technologies within immersive environments. Its ability to create lifelike avatars from minimal initial input (a single portrait) could revolutionize user interaction in VR/AR applications, gaming, and digital fashion. Moreover, the modular nature of its latent space and the adaptability of its framework suggest potential applications in dynamic, user-driven content creation platforms.
Looking forward, the integration of PERSE into real-time avatar systems necessitates considerations of computational efficiency and scalability. Future work might explore optimizing the generation process or devising strategies to incorporate additional environmental or dynamic context features to enhance realism further. Additionally, extending the methodology to accommodate full-body avatars could pave the way for more comprehensive applications across various digital domains.
In conclusion, PERSE represents a seminal step towards more personalized and expressive digital personas, embodying a harmonious blend of sophisticated AI techniques aimed at enriching virtual human representation.