CCM: Adding Conditional Controls to Text-to-Image Consistency Models (2312.06971v1)
Abstract: Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, ways to add new conditional controls to pretrained CMs have not been explored. In this technical report, we consider alternative strategies for adding ControlNet-like conditional control to CMs and present three significant findings. 1) A ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic controls but struggles with low-level detail and realism control. 2) CMs serve as an independent class of generative models, based on which a ControlNet can be trained from scratch using the Consistency Training proposed by Song et al. 3) A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training, allowing for the swift transfer of DM-based ControlNets to CMs. We study these three solutions across various conditional controls, including edge, depth, human pose, low-resolution image, and masked image, with text-to-image latent consistency models.
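Of the three strategies, the second (training a ControlNet from scratch via Consistency Training) is the most self-contained, so a small sketch helps make the recipe concrete. The PyTorch code below is illustrative only: `ToyUNet`, `ToyControlNet`, the noise-level sampling, and the hyperparameters (`SIGMA_DATA = 0.5`, t in [0.002, 80]) are assumptions following the boundary-condition parameterization of Song et al., not the paper's actual architecture or schedule. The key property of Consistency Training is visible in `ct_loss`: no teacher diffusion model and no ODE solver are needed, only two corruptions of the same clean latent that share the same Gaussian noise.

```python
# Hedged sketch of strategy (2): a ControlNet-style branch trained with
# Consistency Training on top of a FROZEN consistency-model backbone.
# All module names and hyperparameters here are illustrative assumptions.
import copy

import torch
import torch.nn as nn


class ToyUNet(nn.Module):
    """Stand-in for the frozen text-to-image consistency-model backbone."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, 4, 3, padding=1),
        )

    def forward(self, x, t, control=None):
        h = self.net(x)  # toy backbone; ignores t for brevity
        return h + control if control is not None else h


class ToyControlNet(nn.Module):
    """Trainable branch mapping a condition image (edge, depth, ...) to residuals."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, 4, 3, padding=1),
        )

    def forward(self, cond):
        return self.net(cond)


SIGMA_DATA = 0.5  # default data std from the CM parameterization


def c_skip(t):
    return SIGMA_DATA**2 / (t**2 + SIGMA_DATA**2)


def c_out(t):
    return SIGMA_DATA * t / (t**2 + SIGMA_DATA**2).sqrt()


def consistency_fn(backbone, controlnet, x_t, t, cond):
    """f(x_t, t, c) = c_skip(t) * x_t + c_out(t) * F(x_t, t, c),
    which enforces the CM boundary condition f(x, t_min, c) ~= x."""
    s = c_skip(t)[:, None, None, None]
    o = c_out(t)[:, None, None, None]
    return s * x_t + o * backbone(x_t, t, control=controlnet(cond))


def ct_loss(backbone, controlnet, ema_controlnet, x0, cond,
            t_min=0.002, t_max=80.0):
    """One Consistency Training step: corrupt x0 at two adjacent noise
    levels with the SAME noise z (no ODE solver, no teacher DM), then
    pull the online prediction toward the EMA-branch prediction."""
    z = torch.randn_like(x0)
    t2 = torch.rand(x0.shape[0]) * (t_max - t_min) + t_min
    t1 = (0.9 * t2).clamp(min=t_min)  # crude stand-in for a discretized schedule
    pred = consistency_fn(backbone, controlnet,
                          x0 + t2[:, None, None, None] * z, t2, cond)
    with torch.no_grad():
        target = consistency_fn(backbone, ema_controlnet,
                                x0 + t1[:, None, None, None] * z, t1, cond)
    return (pred - target).pow(2).mean()


# Usage: the backbone stays frozen; only the ControlNet branch learns.
backbone = ToyUNet().eval().requires_grad_(False)
controlnet = ToyControlNet()
ema_controlnet = copy.deepcopy(controlnet).requires_grad_(False)
opt = torch.optim.Adam(controlnet.parameters(), lr=1e-4)

x0 = torch.randn(2, 4, 32, 32)    # clean latents
cond = torch.randn(2, 3, 32, 32)  # condition image, e.g. a Canny edge map

opt.zero_grad()
loss = ct_loss(backbone, controlnet, ema_controlnet, x0, cond)
loss.backward()
opt.step()
# (An EMA update of ema_controlnet from controlnet would follow each step.)
```

The same loss shape carries over to strategy (3): replace the full ControlNet branch with a lightweight adapter and optimize it jointly across several condition types.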
- John Canny. A computational approach to edge detection. IEEE TPAMI, 1986.
- Realtime multi-person 2D pose estimation using part affinity fields. In CVPR, 2017.
- AnyDoor: Zero-shot object-level image customization. arXiv preprint arXiv:2307.09481, 2023.
- Composer: Creative and controllable image synthesis with composable conditions. arXiv preprint arXiv:2302.09778, 2023.
- Consistency trajectory models: Learning probability flow ODE trajectory of diffusion. arXiv preprint arXiv:2310.02279, 2023.
- Multi-concept customization of text-to-image diffusion. arXiv preprint arXiv:2212.04488, 2022.
- WebVision database: Visual learning and understanding from web data. arXiv preprint arXiv:1708.02862, 2017.
- Common diffusion noise schedules and sample steps are flawed. arXiv preprint arXiv:2305.08891, 2023.
- InstaFlow: One step is enough for high-quality diffusion-based text-to-image generation. arXiv preprint arXiv:2309.06380, 2023a.
- Cones: Concept neurons in diffusion models for customized generation. arXiv preprint arXiv:2303.05125, 2023b.
- Cones 2: Customizable image synthesis with multiple subjects. arXiv preprint arXiv:2305.19327, 2023c.
- Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023a.
- LCM-LoRA: A universal stable-diffusion acceleration module. arXiv preprint arXiv:2311.05556, 2023b.
- On distillation of guided diffusion models. In CVPR, 2023.
- UniControl: A unified diffusion model for controllable visual generation in the wild. arXiv preprint arXiv:2305.11147, 2023.
- Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE TPAMI, 2020.
- High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
- DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. arXiv preprint arXiv:2208.12242, 2022.
- ImageNet large scale visual recognition challenge. IJCV, 2015.
- Progressive distillation for fast sampling of diffusion models. In ICLR, 2022.
- Adversarial diffusion distillation. arXiv preprint arXiv:2311.17042, 2023.
- LAION-5B: An open large-scale dataset for training next generation image-text models. In NeurIPS, 2022.
- Improved techniques for training consistency models. arXiv preprint arXiv:2310.14189, 2023.
- Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
- Consistency models. arXiv preprint arXiv:2303.01469, 2023.
- Pixel difference networks for efficient edge detection. In ICCV, 2021.
- Holistically-nested edge detection. In ICCV, 2015.
- UFOGen: You forward once large scale text-to-image generation via diffusion GANs. arXiv preprint arXiv:2311.09257, 2023.
- Adding conditional control to text-to-image diffusion models. In ICCV, 2023.
- Uni-ControlNet: All-in-one control to text-to-image diffusion models. arXiv preprint arXiv:2305.16322, 2023.
Authors: Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha