MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation (2405.05806v3)

Published 9 May 2024 in cs.CV

Abstract: Text-to-image (T2I) diffusion models have shown significant success in personalized text-to-image generation, which aims to generate novel images with human identities indicated by the reference images. Although several tuning-free methods have achieved promising identity fidelity, they usually suffer from overfitting. The learned identity tends to become entangled with irrelevant information, resulting in unsatisfactory text controllability, especially on faces. In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity fidelity and flexible editability. Specifically, MasterWeaver adopts an encoder to extract identity features and steers the image generation through additionally introduced cross attention. To improve editability while maintaining identity fidelity, we propose an editing direction loss for training, which aligns the editing directions of MasterWeaver with those of the original T2I model. Additionally, a face-augmented dataset is constructed to facilitate disentangled identity learning and further improve editability. Extensive experiments demonstrate that MasterWeaver can not only generate personalized images with faithful identity but also exhibit superior text controllability. Our code can be found at https://github.com/csyxwei/MasterWeaver.

Overview of MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation

The paper introduces MasterWeaver, an approach to personalized text-to-image (T2I) generation that targets identity fidelity and editability together. Built on a T2I diffusion model, MasterWeaver is tuning-free at test time: it needs no per-identity optimization, yet aims to reproduce the reference identity faithfully while keeping the output controllable through the text prompt.

The authors observe that existing tuning-free methods tend to overfit: the learned identity becomes entangled with irrelevant information from the reference images, which weakens text controllability, especially on faces. To counteract this, MasterWeaver uses an encoder to extract identity features and injects them into the diffusion model through additional cross-attention layers. An editing direction loss and a face-augmented dataset further help the model disentangle identity from other attributes.
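The identity-injection mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the text branch and the identity branch each run ordinary scaled dot-product attention and that their outputs are summed, with a hypothetical `id_scale` weight on the identity branch. Learned query/key/value projections are omitted, so each token serves as both key and value; all function names here are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists of row vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def dual_cross_attention(queries, text_tokens, id_tokens, id_scale=1.0):
    """Sum a text cross-attention branch and an identity cross-attention branch.

    Assumption: the two branches are combined by a weighted sum. The summary
    only says that identity features enter through additional cross attention.
    """
    text_out = attention(queries, text_tokens, text_tokens)
    id_out = attention(queries, id_tokens, id_tokens)
    return [
        [t + id_scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, id_out)
    ]


if __name__ == "__main__":
    image_queries = [[1.0, 0.0], [0.0, 1.0]]   # stand-ins for image latents
    text_tokens = [[1.0, 0.0], [0.0, 1.0]]     # stand-ins for prompt embeddings
    id_tokens = [[0.5, -0.5]]                  # stand-in for encoder identity features
    print(dual_cross_attention(image_queries, text_tokens, id_tokens, id_scale=0.8))
```

Setting `id_scale` to zero in this sketch recovers the text-only branch, which is one way to see why a separate identity branch leaves the original text pathway intact.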

Main Contributions

Identity Preservation and Editability Improvement: MasterWeaver aims to improve editability while maintaining identity fidelity. Two components drive this:
- Dual Cross Attention: Identity features are injected through cross attention added alongside the existing text cross attention, so the reference identity steers generation together with the prompt.
- Editing Direction Loss: This training loss aligns MasterWeaver's editing directions with those of the original T2I model. The alignment is computed in the feature space of the diffusion model, so that semantic edits are driven by the text prompt and stay independent of the identity features.
Face-Augmented Dataset: The dataset is constructed with techniques from face editing research, which produce variations in attributes such as hair color or style for the same identity. This controlled augmentation encourages the model to learn identity features that are independent of such attributes, directly addressing the entanglement that hampers editability.
Efficient Tuning-Free Framework: Unlike personalization methods that require per-identity optimization, MasterWeaver needs no tuning at test time, which lowers the computational cost and makes the model more practical for real-world use.
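To make the editing direction loss more concrete, here is a hedged sketch of one plausible form. The summary does not give the exact formula, so this assumes that an editing direction is the difference between features obtained with an edited prompt and with a source prompt, and that alignment is measured as one minus cosine similarity. The function names and the flat feature vectors are hypothetical.

```python
import math


def direction(source_feat, edited_feat):
    """Editing direction: feature change caused by editing the prompt."""
    return [e - s for s, e in zip(source_feat, edited_feat)]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity with a small floor on the denominator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def editing_direction_loss(base_src, base_edit, pers_src, pers_edit):
    """1 - cos(direction of original T2I model, direction of personalized model).

    Assumption: cosine alignment of feature differences. The loss is 0 when the
    personalized model moves its features in the same direction as the original
    model for the same prompt edit, regardless of any identity-related offset.
    """
    d_base = direction(base_src, base_edit)
    d_pers = direction(pers_src, pers_edit)
    return 1.0 - cosine_similarity(d_base, d_pers)


if __name__ == "__main__":
    # Toy features: the personalized model carries an identity offset of (5, 5)
    # but responds to the prompt edit along the same axis as the original model.
    print(editing_direction_loss([0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [7.0, 5.0]))
```

Because only the difference between edited and source features enters the loss in this sketch, a constant identity-dependent offset cancels out, which matches the stated goal of keeping text-driven edits independent of identity features.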

Results and Implications

Extensive empirical evaluations support these claims. Compared with state-of-the-art techniques, in both single and multiple reference image scenarios, MasterWeaver achieves competitive identity fidelity scores while allowing more varied and complex text-driven attribute modifications.

MasterWeaver is relevant to practical applications that call for personalized content creation, such as custom avatars, creative industry outputs, and digital identity protection. By better decoupling identity features from contextual modifications, it also offers a reference point for future work on balancing fidelity and flexibility in generative models.

Future Directions

While MasterWeaver presents a significant advance, the authors acknowledge areas for further investigation. Chief among these are multi-identity personalization and finer attribute-specific text control, both of which remain difficult because textual descriptors are coarse-grained. Extending the model beyond static image generation, for example to video or 3D model generation, is another promising avenue for richer personalized outputs.

Overall, MasterWeaver advances personalized T2I generation and opens the way to new applications and to further study of identity representation within generative frameworks.

Authors (6)

Yuxiang Wei (40 papers)
Zhilong Ji (31 papers)
Jinfeng Bai (31 papers)
Hongzhi Zhang (33 papers)
Lei Zhang (1689 papers)
Wangmeng Zuo (279 papers)

GitHub

GitHub - csyxwei/MasterWeaver: MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation (Arxiv 2024) (124 stars)

Tweets

https://twitter.com/CSVisionPapers/status/1789042104292847897
