
Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots (2305.11540v1)

Published 19 May 2023 in cs.CV and cs.CL

Abstract: Diffusion models have made impressive progress in text-to-image synthesis. However, training such large-scale models (e.g., Stable Diffusion) from scratch requires high computational costs and massive numbers of high-quality text-image pairs, which makes it unaffordable for other languages. To handle this challenge, we propose IAP, a simple but effective method to transfer English Stable Diffusion into Chinese. IAP optimizes only a separate Chinese text encoder, with all other parameters fixed, to align the Chinese semantic space with the English one in CLIP. To achieve this, we treat images as pivots and minimize the distance between the attentive features produced by cross-attention between images and each language, respectively. In this way, IAP efficiently establishes connections among Chinese, English, and visual semantics in CLIP's embedding space, improving the quality of images generated directly from Chinese prompts. Experimental results show that our method outperforms several strong Chinese diffusion models with only 5% to 10% of the training data.
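
The image-as-pivot objective described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the attention form or the distance metric, so the choice of image features as queries, scaled dot-product attention, and mean squared error are all assumptions, and the names `attentive_features` and `pivot_loss` are hypothetical. In the method as described, only the Chinese text encoder would receive gradients from such a loss; the sketch below computes the loss value only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_features(image_feats, text_feats):
    """Cross-attention with image features as queries (an assumption)
    and text token features as both keys and values."""
    dim = len(image_feats[0])
    attended = []
    for query in image_feats:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in text_feats
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * tok[j] for w, tok in zip(weights, text_feats)) for j in range(dim)]
        )
    return attended


def pivot_loss(image_feats, en_feats, zh_feats):
    """Mean squared distance (an assumed metric) between the attentive
    features obtained from English and from Chinese text tokens."""
    en_att = attentive_features(image_feats, en_feats)
    zh_att = attentive_features(image_feats, zh_feats)
    squared = [
        (a - b) ** 2
        for en_row, zh_row in zip(en_att, zh_att)
        for a, b in zip(en_row, zh_row)
    ]
    return sum(squared) / len(squared)


# Toy features: 2 image patches, 3 English tokens, 2 Chinese tokens, dimension 2.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
en_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
zh_misaligned = [[-1.0, 0.0], [0.0, -1.0]]

loss_aligned = pivot_loss(image_feats, en_feats, en_feats)
loss_misaligned = pivot_loss(image_feats, en_feats, zh_misaligned)
```

Because the image features act as the shared queries, the two captions may contain different numbers of tokens and still yield attentive features of the same shape, which is what lets the two languages be compared directly.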

Authors (7)
  1. Jinyi Hu (19 papers)
  2. Xu Han (270 papers)
  3. Xiaoyuan Yi (42 papers)
  4. Yutong Chen (30 papers)
  5. Wenhao Li (136 papers)
  6. Zhiyuan Liu (433 papers)
  7. Maosong Sun (337 papers)
Citations (4)