
Zero-shot Generative Model Adaptation via Image-specific Prompt Learning (2304.03119v1)

Published 6 Apr 2023 in cs.CV

Abstract: Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.

Overview of "Zero-shot Generative Model Adaptation via Image-specific Prompt Learning"

The paper "Zero-shot Generative Model Adaptation via Image-specific Prompt Learning" addresses the mode collapse issue encountered in zero-shot generative model adaptation. The work builds on CLIP-guided synthesis to adapt a pre-trained source-domain generator to various target domains without any target-domain samples. Because the adaptation relies only on textual domain labels, training is highly efficient, but existing methods fall short in image quality and diversity because they apply a single fixed adaptation direction to all cross-domain image pairs.
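To make the shared-direction limitation concrete, the following is a minimal sketch of the fixed CLIP directional objective used by prior work such as StyleGAN-NADA. It assumes PyTorch and OpenAI's `clip` package; function and variable names are illustrative and not taken from the authors' code.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)

def text_direction(source_label: str, target_label: str) -> torch.Tensor:
    """One global adaptation direction shared by every cross-domain image pair."""
    tokens = clip.tokenize([source_label, target_label]).to(device)
    with torch.no_grad():
        feats = F.normalize(clip_model.encode_text(tokens).float(), dim=-1)
    return F.normalize(feats[1] - feats[0], dim=-1)

def fixed_directional_loss(src_imgs, tgt_imgs, direction):
    """Align each image-space shift with the single shared text direction.
    `src_imgs` / `tgt_imgs` are CLIP-preprocessed batches of shape (B, 3, 224, 224)."""
    src_feat = F.normalize(clip_model.encode_image(src_imgs).float(), dim=-1)
    tgt_feat = F.normalize(clip_model.encode_image(tgt_imgs).float(), dim=-1)
    img_dir = F.normalize(tgt_feat - src_feat, dim=-1)
    return (1.0 - (img_dir * direction).sum(dim=-1)).mean()

# Example: direction = text_direction("photo", "sketch") -- every pair gets the same target.
```

Because `direction` is computed once from the domain labels, every image pair receives identical supervision, which is precisely the source of the mode collapse the paper identifies.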

The authors introduce an Image-specific Prompt Learning (IPL) method to address these limitations. IPL learns a specific prompt vector for each source-domain image, yielding an individualized adaptation direction for every cross-domain image pair. This lets the target-domain generator synthesize images with greater flexibility, quality, and diversity, alleviating mode collapse. Notably, IPL is independent of the generative model's architecture and proves effective for both GANs and diffusion models.
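The sketch below illustrates the image-specific idea at a high level. For simplicity it learns a per-image offset directly in CLIP's joint embedding space rather than prompt token vectors passed through CLIP's text encoder, so it is a conceptual approximation of IPL rather than the released implementation; `PromptNet`, `image_specific_loss`, and the way the offset is combined with the target text feature are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptNet(nn.Module):
    """Maps a source-image CLIP embedding to an image-specific prompt offset."""
    def __init__(self, dim: int = 512, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, dim)
        )

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        # Image-conditioned "prompt": a learned shift of the target description.
        return self.mlp(img_feat)

def image_specific_loss(prompt_net, src_feat, tgt_feat, src_text, tgt_text):
    """Each cross-domain image pair gets its own adaptation direction.
    All inputs are L2-normalized CLIP embeddings of shape (B, dim) or (dim,)."""
    per_image_tgt = F.normalize(tgt_text + prompt_net(src_feat), dim=-1)
    direction = F.normalize(per_image_tgt - src_text, dim=-1)  # one direction per image
    img_dir = F.normalize(tgt_feat - src_feat, dim=-1)
    return (1.0 - (img_dir * direction).sum(dim=-1)).mean()
```

In contrast to the fixed-direction loss above, the supervision signal here varies with each source image, which is the mechanism the paper credits for the improved diversity.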

Numerical Results and Claims

The paper's results indicate superior performance of IPL over established approaches such as NADA. In extensive qualitative and quantitative evaluations on datasets including FFHQ and AFHQ, IPL consistently generates images of higher quality and diversity: it achieves higher Inception Score (IS) and lower Single Image Fréchet Inception Distance (SIFID), indicating closer adherence to the target-domain style, and better identity-similarity (ID) scores, demonstrating an enhanced ability to preserve source-domain content.
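As a concrete example of one of these metrics, the snippet below sketches how an identity-similarity (ID) score could be computed as the cosine similarity between source images and their adapted counterparts under a pretrained identity encoder. `face_encoder` is a placeholder assumption (e.g., an ArcFace-style network), not the paper's evaluation code.

```python
import torch
import torch.nn.functional as F

def identity_similarity(face_encoder: torch.nn.Module,
                        src_imgs: torch.Tensor,
                        adapted_imgs: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity between identity embeddings of paired images.
    Higher values mean the adapted images better preserve the source identities."""
    with torch.no_grad():
        src_emb = F.normalize(face_encoder(src_imgs), dim=-1)
        tgt_emb = F.normalize(face_encoder(adapted_imgs), dim=-1)
    return (src_emb * tgt_emb).sum(dim=-1).mean()
```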

Implications and Future Directions

The introduction of adaptive prompts marks a shift towards more precise and individualized image synthesis, addressing inherent issues in generative model adaptation. This advancement extends the potential applications of zero-shot generative models to fields that require realistic and diverse image creation without abundant data, notably artistic image domains.

The theoretical implications are significant, suggesting avenues for applying adaptive prompt learning to broader vision-and-language tasks. Although the learned prompt vectors still require further visualization and analysis to be fully interpretable, the foundational improvements in generative-model flexibility may inspire prompt-learning applications in other domains.

Future developments may focus on improving the interpretation of adaptive prompts, enabling better disentanglement and visualization of the learned semantics for greater transparency in model training. Additionally, exploring IPL's robustness under large domain shifts remains a challenging yet promising direction.

Authors (9)
  1. Jiayi Guo (24 papers)
  2. Chaofei Wang (11 papers)
  3. You Wu (60 papers)
  4. Eric Zhang (12 papers)
  5. Kai Wang (624 papers)
  6. Xingqian Xu (23 papers)
  7. Shiji Song (103 papers)
  8. Humphrey Shi (97 papers)
  9. Gao Huang (178 papers)
Citations (25)