UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization (2408.05939v2)

Published 12 Aug 2024 in cs.CV

Abstract: This paper presents UniPortrait, an innovative human image personalization framework that unifies single- and multi-ID customization with high face fidelity, extensive facial editability, free-form input description, and diverse layout generation. UniPortrait consists of only two plug-and-play modules: an ID embedding module and an ID routing module. The ID embedding module extracts versatile editable facial features with a decoupling strategy for each ID and embeds them into the context space of diffusion models. The ID routing module then combines and distributes these embeddings adaptively to their respective regions within the synthesized image, achieving the customization of single and multiple IDs. With a carefully designed two-stage training scheme, UniPortrait achieves superior performance in both single- and multi-ID customization. Quantitative and qualitative experiments demonstrate the advantages of our method over existing approaches as well as its good scalability, e.g., the universal compatibility with existing generative control tools. The project page is at https://aigcdesigngroup.github.io/UniPortrait-Page/ .

The paper UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization, by Junjie He, Yifeng Geng, and Liefeng Bo, introduces an approach to customizing human images while maintaining high fidelity to facial identities. The method unifies single- and multiple-identity (ID) customization within one framework, addressing major limitations of current human image personalization techniques.

Key Contributions

UniPortrait incorporates two principal modules: the ID embedding module and the ID routing module.

ID Embedding Module:
- Intrinsic ID Embedding: This module extracts editable facial features using a decoupling strategy for each ID, embedding them into the diffusion model's context space. Unlike previous techniques that depend on the final global features of face recognition backbones, UniPortrait utilizes features from the penultimate layer, preserving more spatial information pertinent to facial identities.
- Face Structure Features: By combining shallow features from the face backbone with CLIP local features, the module captures detailed facial shapes and textures. Strong dropout regularization on this branch helps balance high ID similarity against facial editability.
- Scalability: The framework can efficiently manage multiple reference images for a single ID and interpolate between different identities or states of a single identity by performing linear interpolation on the ID embeddings.
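
The linear interpolation mentioned under Scalability can be sketched in a few lines. This is a minimal illustration, not the authors' code: the name `lerp_embeddings` is hypothetical, and the flat list of floats is a stand-in for the real ID embeddings, whose shape is not given here.

```python
from typing import List, Sequence


def lerp_embeddings(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linear interpolation between two ID embeddings.

    t = 0 returns a, t = 1 returns b, values in between blend the two.
    The flat-vector representation is an assumption made for illustration.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


# Toy embeddings for two identities (made-up values).
id_a = [1.0, 0.0, 2.0]
id_b = [0.0, 1.0, 4.0]

# Sweep from identity A to identity B in four equal steps.
sweep = [lerp_embeddings(id_a, id_b, step / 4) for step in range(5)]
halfway = sweep[2]
```

Each interpolated vector would then be passed to the diffusion model in place of a single ID embedding; the same function applies to two states of one identity.
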
ID Routing Module:
- This module combines ID embeddings and distributes them adaptively to their respective regions within the synthesized image. It avoids the identity blending that can occur in multi-ID images by using a routing network, which assigns a single ID to each potential face area according to a discrete probability distribution over the IDs.
- Routing Regularization Loss: During training, a routing regularization loss ensures that all IDs are routed accurately to their corresponding face areas, reinforcing the distinct representation of each identity.
- Adaptiveness: Unlike previous methods that use fixed layout masks or impose prompt format restrictions, the proposed ID routing module requires neither predetermined layouts nor restrictive prompt formats, which allows diverse layouts and free-form descriptions.
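
The routing idea above can be illustrated with a toy example: each region gets a discrete probability distribution over the IDs, and the most probable ID is assigned to it. This is only a sketch under stated assumptions. Scoring by dot product, the hard argmax selection, and the cross-entropy form of the regularization term are all illustrative choices, and every function name is hypothetical; the paper's router and loss may differ.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def route(region_features, id_embeddings) -> List[Tuple[int, List[float]]]:
    """For each region, return (index of the chosen ID, probabilities over IDs).

    Assumption: the routing score is a plain dot product between a region
    feature and an ID embedding.
    """
    routed = []
    for feat in region_features:
        probs = softmax([dot(feat, emb) for emb in id_embeddings])
        chosen = max(range(len(probs)), key=probs.__getitem__)
        routed.append((chosen, probs))
    return routed


def routing_regularization(routed, target_ids: Sequence[int]) -> float:
    """Assumed loss form: mean negative log-probability of the ID that
    should own each face region."""
    losses = [-math.log(probs[t]) for (_, probs), t in zip(routed, target_ids)]
    return sum(losses) / len(losses)


# Two toy face regions and two toy ID embeddings (made-up values).
region_features = [[2.0, 0.0], [0.0, 3.0]]
id_embeddings = [[1.0, 0.0], [0.0, 1.0]]

routed = route(region_features, id_embeddings)
loss_correct = routing_regularization(routed, [0, 1])  # targets match routing
loss_swapped = routing_regularization(routed, [1, 0])  # targets are swapped
```

In this toy setup the loss is small when every region is routed to its intended ID and grows when the assignment is swapped, which is the behavior a routing regularizer is meant to encourage.
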

Methodology

The training process is divided into two stages:

Single-ID Training Stage: First, the ID embedding module alone is trained on single-ID images. This stage applies dropping regularizations to enforce the disentanglement of intrinsic ID features from face structure features, which keeps the face editable.
Multi-ID Fine-Tuning Stage: After single-ID training, the ID routing module is introduced. This stage fine-tunes the router and the LoRA (Low-Rank Adaptation) parameters while keeping the embedding module fixed, optimizing the model for multi-ID image generation.
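
The two-stage schedule can be written down as a small table of which parts are updated in each stage. This is a sketch for orientation only: the module keys (`id_embedding`, `id_router`, `lora`) and the `StagePlan` type are hypothetical names, and `False` covers both "frozen" and "not yet introduced".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StagePlan:
    """One training stage and the parts that receive gradient updates."""
    name: str
    trainable: Dict[str, bool]


STAGES = [
    # Stage 1: only the ID embedding module is trained.
    StagePlan("single_id", {"id_embedding": True, "id_router": False, "lora": False}),
    # Stage 2: the router and LoRA parameters are tuned; the embedding module is fixed.
    StagePlan("multi_id_finetune", {"id_embedding": False, "id_router": True, "lora": True}),
]


def trainable_modules(stage: StagePlan) -> List[str]:
    """Names of the parts updated in the given stage, sorted for stable output."""
    return sorted(name for name, enabled in stage.trainable.items() if enabled)


for stage in STAGES:
    print(stage.name, "->", trainable_modules(stage))
```

In a real training script, a table like this would typically drive which parameter groups are handed to the optimizer in each stage.
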

Experimental Analysis

Extensive experiments validate the efficacy of UniPortrait in both single-ID and multi-ID customization scenarios:

Single-ID Customization: The framework demonstrates superiority in balancing identity preservation and prompt consistency. It achieves competitive performance across metrics like face similarity, FID (Fréchet Inception Distance), CLIP textual alignment, and LAION-Aes (aesthetic) scores.
Multi-ID Customization: UniPortrait notably outperforms existing methods at handling multiple identities within a single image. Qualitative results show high fidelity to each identity's attributes and to the text prompt, with robust editability and no need for predefined layouts or restrictive prompt formats.
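
For orientation, face similarity is commonly measured as the cosine similarity between face-recognition embeddings of the reference face and the generated face. The sketch below assumes that definition; the summary does not state the exact protocol, and the vectors are made-up stand-ins for real face embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Toy embeddings: same direction scores near 1, orthogonal directions score 0.
reference = [3.0, 4.0]
same_person = [6.0, 8.0]
other_person = [-4.0, 3.0]

score_same = cosine_similarity(reference, same_person)
score_other = cosine_similarity(reference, other_person)
```

Higher scores indicate that the generated face stays closer to the reference identity; the score ignores vector length, so only the direction of the embedding matters.
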

Implications and Future Directions

UniPortrait's scalable, plug-and-play architecture lowers the barrier to integrating human image personalization into applications such as AI-driven portrait creation, virtual try-on, and image animation. By covering both single- and multi-ID settings, the framework paves the way for more sophisticated, user-specific artwork generation and content creation.

Future research can extend this work by incorporating attribute-specific routing mechanisms, optimizing the model for customizing non-ID-related features (like clothing or actions), and refining the balance between high-fidelity identity preservation and user-defined customization.

In summary, UniPortrait marks a significant step in the evolution of human image personalization techniques, providing a unified framework for high-fidelity, editable, and diverse image generation, adaptable to a wide range of applications in AI and creative domains.