PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation (2403.04692v2)
Abstract: In this paper, we introduce PixArt-Σ, a Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. PixArt-Σ represents a significant advancement over its predecessor, PixArt-α, offering images of markedly higher fidelity and improved alignment with text prompts. A key feature of PixArt-Σ is its training efficiency. Leveraging the foundational pre-training of PixArt-α, it evolves from the 'weaker' baseline to a 'stronger' model by incorporating higher-quality data, a process we term "weak-to-strong training". The advancements in PixArt-Σ are twofold: (1) High-Quality Training Data: PixArt-Σ incorporates superior-quality image data, paired with more precise and detailed image captions. (2) Efficient Token Compression: we propose a novel attention module within the DiT framework that compresses both keys and values, significantly improving efficiency and facilitating ultra-high-resolution image generation. Thanks to these improvements, PixArt-Σ achieves superior image quality and user prompt adherence with a significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters). Moreover, PixArt-Σ's capability to generate 4K images supports the creation of high-resolution posters and wallpapers, efficiently bolstering the production of high-quality visual content in industries such as film and gaming.
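The token-compression idea in point (2) can be illustrated with a small sketch. The abstract says only that keys and values are compressed; it does not name the compression operator. The code below therefore assumes average pooling over `ratio x ratio` patches of the token grid, uses a single attention head, and omits the learned projections. The function names `pool_tokens` and `kv_compressed_attention` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_tokens(x, h, w, ratio):
    """Average-pool an (h*w, d) row-major token sequence over
    ratio x ratio patches of the h x w grid (assumed operator)."""
    d = x.shape[-1]
    grid = x.reshape(h // ratio, ratio, w // ratio, ratio, d)
    return grid.mean(axis=(1, 3)).reshape(-1, d)

def kv_compressed_attention(q, k, v, h, w, ratio=2):
    """Single-head attention in which only keys and values are compressed.
    Queries keep full resolution, so the output still has h*w tokens."""
    k_c = pool_tokens(k, h, w, ratio)          # (h*w / ratio^2, d)
    v_c = pool_tokens(v, h, w, ratio)          # (h*w / ratio^2, d)
    scores = q @ k_c.T / np.sqrt(q.shape[-1])  # (h*w, h*w / ratio^2)
    attn = softmax(scores, axis=-1)
    return attn @ v_c, attn

# Example: an 8 x 8 token grid with 8-dimensional features.
rng = np.random.default_rng(0)
h = w = 8
q, k, v = (rng.normal(size=(h * w, 8)) for _ in range(3))
out, attn = kv_compressed_attention(q, k, v, h, w, ratio=2)
print(out.shape, attn.shape)
```

Under these assumptions, the attention matrix shrinks from N x N to N x N/ratio², where N is the number of tokens. That reduction is where the savings at high resolution would come from, while the number of output tokens is unchanged.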
- Junsong Chen
- Chongjian Ge
- Enze Xie
- Yue Wu
- Lewei Yao
- Xiaozhe Ren
- Zhongdao Wang
- Ping Luo
- Huchuan Lu
- Zhenguo Li