
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Published 12 Aug 2024 in cs.CV (arXiv:2408.05939v2)

Abstract: This paper presents UniPortrait, an innovative human image personalization framework that unifies single- and multi-ID customization with high face fidelity, extensive facial editability, free-form input description, and diverse layout generation. UniPortrait consists of only two plug-and-play modules: an ID embedding module and an ID routing module. The ID embedding module extracts versatile editable facial features with a decoupling strategy for each ID and embeds them into the context space of diffusion models. The ID routing module then combines and distributes these embeddings adaptively to their respective regions within the synthesized image, achieving the customization of single and multiple IDs. With a carefully designed two-stage training scheme, UniPortrait achieves superior performance in both single- and multi-ID customization. Quantitative and qualitative experiments demonstrate the advantages of our method over existing approaches as well as its good scalability, e.g., the universal compatibility with existing generative control tools. The project page is at https://aigcdesigngroup.github.io/UniPortrait-Page/ .


Summary

  • The paper introduces a unified framework that preserves identity in both single- and multi-human scenarios by leveraging specialized ID embedding and routing modules.
  • It employs a diffusion model augmented with penultimate-layer face-recognition features and CLIP local features to achieve high-fidelity, editable facial representations.
  • The framework demonstrates superior performance with lower FID scores and enhanced face similarity metrics compared to existing methods.

Overview of "UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization"

This essay examines UniPortrait, a framework for identity-preserving human image personalization that unifies single-ID and multi-ID customization in a single architecture. It introduces two pivotal modules, an ID embedding module and an ID routing module, coupled with a diffusion model to deliver high face fidelity and editability while supporting free-form text descriptions and diverse scene layouts.

Methodology

ID Embedding Module

The purpose of the ID embedding module is to accurately capture high-fidelity, editable facial ID embeddings. Traditional methods typically extract features from the final global layer of face recognition models. In contrast, UniPortrait uses features from the penultimate layer, preserving crucial spatial information. This enhances the robustness of the ID features to variations not integral to identity, such as expressions or poses.
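The distinction between last-layer and penultimate-layer features can be illustrated with a minimal numpy sketch. This is not UniPortrait's actual backbone; the shapes and variable names are hypothetical, chosen only to show how the penultimate layer retains a spatial grid of local ID features that global pooling would otherwise collapse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations of a face-recognition backbone for one aligned face crop.
# The penultimate layer keeps a spatial grid of local features (H x W x C), while
# the final layer global-pools it into a single identity vector (C,).
penultimate = rng.standard_normal((7, 7, 512))   # spatial ID features (what UniPortrait keeps)
final_global = penultimate.mean(axis=(0, 1))     # what "last-layer" methods would use

# Flatten the spatial grid into a token sequence suitable for a diffusion
# model's cross-attention context space.
id_tokens = penultimate.reshape(-1, 512)         # 49 tokens, one per spatial location

print(id_tokens.shape)     # (49, 512): spatial structure preserved
print(final_global.shape)  # (512,): spatial structure collapsed
```

Keeping one token per spatial location is what lets downstream attention layers modulate local facial structure (expression, pose) independently of global identity.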

To further enrich the ID representation, UniPortrait utilizes shallow features from the face backbone and merges them with CLIP local features. An MLP processes these comprehensive features into a form compatible with diffusion models. The Q-Former establishes a balance between intrinsic ID and facial structural features, leveraging techniques like DropToken and DropPath to encourage a robust, disentangled embedding suitable for various editing requirements.
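A DropToken-style regularizer can be sketched as follows. This is a generic illustration under assumed shapes, not UniPortrait's implementation: whole structure tokens are randomly removed during training so the model cannot over-rely on any single local feature:

```python
import numpy as np

def drop_token(tokens, drop_prob, rng):
    """Randomly remove whole tokens during training (DropToken-style regularization).

    tokens: (N, C) array of facial-structure tokens; drop_prob: per-token drop rate.
    Returns the surviving subset of tokens.
    """
    keep = rng.random(tokens.shape[0]) >= drop_prob
    if not keep.any():                        # always keep at least one token
        keep[rng.integers(tokens.shape[0])] = True
    return tokens[keep]

rng = np.random.default_rng(42)
structure_tokens = rng.standard_normal((16, 64))  # hypothetical Q-Former input tokens
kept = drop_token(structure_tokens, drop_prob=0.5, rng=rng)
print(kept.shape)  # (k, 64) with 1 <= k <= 16, varying per draw
```

At inference time the drop would be disabled (all tokens kept), mirroring how dropout-family regularizers behave.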

ID Routing Module

To handle multi-ID customization, the ID routing module routes each ID embedding to its respective facial region, mitigating the risk of identity blending. This position-agnostic mechanism uses a network that assigns each spatial location a discrete probability distribution over the IDs, applying the Gumbel-softmax trick to work around the non-differentiability of hard assignment. A routing regularization loss encourages a sensible allocation of IDs across locations, which is crucial for preserving distinct identities within synthesized multi-ID images.
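The per-location Gumbel-softmax assignment can be sketched in numpy. This is a minimal illustration of the trick itself, with hypothetical shapes, not the paper's routing network: each spatial location's logits over IDs are perturbed with Gumbel noise and softened by a temperature, and the argmax gives the hard routing decision:

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Sample a relaxed categorical distribution via the Gumbel-softmax trick.

    logits: (..., num_ids) unnormalized scores; tau: temperature (lower = harder).
    """
    gumbel = -np.log(-np.log(rng.random(logits.shape) + 1e-9) + 1e-9)
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))   # numerically stable softmax
    return y / y.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
H, W, num_ids = 4, 4, 2
logits = rng.standard_normal((H, W, num_ids))  # hypothetical routing-network output

probs = gumbel_softmax(logits, tau=0.5, rng=rng)  # soft, differentiable assignment
route = probs.argmax(axis=-1)                     # hard assignment: one ID per location

print(route.shape)  # (4, 4): each spatial location is routed to exactly one ID
```

In a real training loop the soft `probs` would carry gradients (straight-through estimation), while the hard `route` determines which ID embedding conditions each region.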

Training and Implementation

UniPortrait is trained in two stages. The first, single-ID stage establishes the foundational ID embedding and its parameters; the second introduces the ID routing module to refine multi-ID personalization. Training uses curated datasets, including LAION and CelebA, and applies LoRA within the U-Net architecture to fine-tune the model efficiently.
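The LoRA update mentioned above can be sketched in a few lines of numpy. This is the generic low-rank adaptation idea under illustrative shapes, not UniPortrait's actual U-Net layers: a frozen weight `W` is augmented with a trainable low-rank product `B @ A`, where `B` starts at zero so the adapted layer initially matches the pretrained one:

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 64, 64, 4                   # rank r << d is the low-rank bottleneck

W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-initialized

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                      # LoRA forward: base output + low-rank update

# Before any training step, B is all zeros, so the adapted layer is exactly
# the frozen pretrained layer.
print(np.allclose(y, W @ x))                 # True
```

Only `A` and `B` (2 * r * d parameters per layer) would be updated during fine-tuning, which is why LoRA keeps the adaptation cost small relative to the full U-Net.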

Results and Evaluation

Single-ID Personalization

Quantitative evaluations against competing methods show that UniPortrait achieves superior identity preservation while maintaining prompt consistency and producing high-quality, diverse outputs, reflected in the lowest FID scores and strong face-similarity metrics among the compared frameworks.

Multi-ID Personalization

UniPortrait also excels in multi-ID scenarios, showing stronger ID preservation and consistency metrics than models such as FastComposer. Its ability to keep identities distinct within a single image demonstrates the efficacy of the ID routing module.

Application Potential

UniPortrait's design is conducive to myriad applications beyond basic personalization tasks. Its architecture supports seamless interaction and integration with established textual and generative control systems like ControlNet and IP-Adapter. Additional functionalities include face attribute modifications, identity interpolations, and generating consistent characters across narrative sequences, broadening the potential use cases within digital workflows.

Conclusion

UniPortrait provides a comprehensive framework for unified single-ID and multi-ID image personalization, combining strong identity fidelity with substantial editability. Through careful design of its ID embedding and routing modules, it enables advanced applications in synthetic imagery. Extending the routing mechanism beyond facial attributes remains an open direction that could broaden the framework's applicability to a wider range of creative tasks.
