PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding (2312.04461v1)

Published 7 Dec 2023 in cs.CV, cs.AI, cs.LG, and cs.MM

Abstract: Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. Besides, to drive the training of our PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Under the nourishment of the dataset constructed through the proposed pipeline, our PhotoMaker demonstrates better ID preservation ability than test-time fine-tuning based methods, yet provides significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/

Overview of Personalized Text-to-Image Generation

Text-to-image generation has advanced significantly in recent years, to the point where models can synthesize realistic human photos that match a given text description. PhotoMaker targets the personalized variant of this task: it carries a specific person's identity (ID) into the generated image while still following the text prompt. According to the authors, existing personalized methods cannot simultaneously deliver high efficiency, strong ID fidelity, and flexible text controllability; PhotoMaker is designed to provide all three.

Methodology and Approach

The PhotoMaker methodology centers on what the authors call a "stacked ID embedding." An arbitrary number of input ID images is encoded into a single, unified ID representation. This representation captures the characteristics of one identity comprehensively when all inputs show the same person, and it can also hold the characteristics of different identities so that they can be integrated later. Because the identity is encoded directly rather than learned per subject, PhotoMaker avoids the test-time fine-tuning that earlier methods such as DreamBooth rely on, which demands substantial compute and time for each customization. The second key component is an ID-oriented data construction pipeline, which assembles the dataset used to train the model.
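
To make the idea of stacking concrete, here is a minimal Python sketch using toy vectors. It is not the authors' implementation. It assumes each ID image has already been turned into a fixed-size embedding by some encoder (not shown), and the function names `stack_id_embeddings` and `splice_into_prompt` are hypothetical. Replacing a single prompt token with the stacked rows is also an illustrative assumption about how such an embedding might condition the generator.

```python
from typing import List, Sequence

Vector = List[float]


def stack_id_embeddings(per_image: Sequence[Vector]) -> List[Vector]:
    """Stack N per-image embeddings into one (N x dim) ID representation."""
    if not per_image:
        raise ValueError("need at least one ID image embedding")
    dim = len(per_image[0])
    if any(len(v) != dim for v in per_image):
        raise ValueError("all embeddings must have the same dimension")
    # Each image keeps its own row, so no per-image detail is averaged away.
    return [list(v) for v in per_image]


def splice_into_prompt(
    prompt_tokens: Sequence[Vector], stacked_id: List[Vector], position: int
) -> List[Vector]:
    """Replace the token at `position` with the stacked ID rows.

    Hypothetical conditioning step, used here only for illustration.
    """
    if not 0 <= position < len(prompt_tokens):
        raise IndexError("position is outside the prompt")
    before = [list(t) for t in prompt_tokens[:position]]
    after = [list(t) for t in prompt_tokens[position + 1:]]
    return before + stacked_id + after


# Toy data: three ID images and a three-token prompt, all 2-dimensional.
images = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
prompt = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

stacked = stack_id_embeddings(images)
conditioned = splice_into_prompt(prompt, stacked, position=1)
print(len(stacked), len(conditioned))  # 3 5
```

The point of the sketch is that the representation grows with the number of input images rather than collapsing them into one vector, which is what lets the same structure hold either one identity or several.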

Capabilities and Applications

PhotoMaker supports several applications beyond plain personalization. It can change a subject's attributes, bring characters from artworks into realistic photos, and merge multiple identities into one. In identity mixing, the generated photo realistically retains aspects of each input identity. Users can adjust the merge ratio of the different IDs in two ways: by controlling each ID's share of images in the input sample pool, or by using prompt weighting.
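
The first of these controls, the share of images per ID in the input pool, can be illustrated with a short Python sketch. It only shows the bookkeeping: the helper names `build_input_pool` and `pool_shares` and the file names are hypothetical, and how the share translates into visual similarity is determined by the model, not by this code.

```python
from typing import Dict, List


def build_input_pool(
    images_by_id: Dict[str, List[str]], counts: Dict[str, int]
) -> List[str]:
    """Pick `counts[id]` images for each identity and pool them together."""
    pool: List[str] = []
    for identity, n in counts.items():
        available = images_by_id[identity]
        if n < 0 or n > len(available):
            raise ValueError(f"cannot take {n} images for {identity}")
        pool.extend(available[:n])
    return pool


def pool_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the pool contributed by each identity."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the pool is empty")
    return {identity: n / total for identity, n in counts.items()}


# Hypothetical image files for two identities.
images_by_id = {
    "id_a": ["a1.png", "a2.png", "a3.png"],
    "id_b": ["b1.png", "b2.png"],
}
counts = {"id_a": 3, "id_b": 1}  # weight the mix toward id_a

pool = build_input_pool(images_by_id, counts)
shares = pool_shares(counts)
print(pool)
print(shares)
```

With three images of one identity and one of the other, the first identity makes up 75% of the pool, so under the behavior described above the result would be expected to lean toward that identity.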

Conclusion and Implications

In summary, PhotoMaker is an efficient method for generating personalized, realistic human images that preserve ID fidelity. Because it produces diverse, prompt-driven images without per-subject fine-tuning, it is considerably faster than test-time fine-tuning approaches. Potential applications range from entertainment to virtual reality. A technology this capable of reproducing a person's likeness also raises ethical concerns, so PhotoMaker and similar methods should be used responsibly and with potential misuse in mind.

Authors (6)
  1. Zhen Li (334 papers)
  2. Mingdeng Cao (22 papers)
  3. Xintao Wang (132 papers)
  4. Zhongang Qi (40 papers)
  5. Ming-Ming Cheng (185 papers)
  6. Ying Shan (252 papers)
Citations (112)