Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior (2404.16678v1)
Abstract: Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to produce satisfactory results because they assign semantically incorrect or unsaturated colors. In this work, we propose an automatic colorization pipeline that overcomes these challenges. We leverage the strong generative ability of a diffusion prior to synthesize colors with plausible semantics. To suppress the artifacts introduced by the diffusion prior, we apply luminance-conditional guidance. Moreover, we adopt multimodal high-level semantic priors that help the model understand the image content and deliver saturated colors. In addition, we design a luminance-aware decoder to restore details and enhance overall visual quality. The proposed pipeline synthesizes saturated colors while maintaining plausible semantics. Experiments indicate that our method accounts for both diversity and fidelity, surpasses previous methods in perceptual realism, and is the most preferred in human evaluation.
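The abstract frames "unsaturated colors" as a core failure of earlier methods but does not say how saturation is measured. One widely used way to quantify it is the colorfulness metric of Hasler and Süsstrunk (2003), which combines the spread and mean of two opponent channels, rg = R − G and yb = 0.5(R + G) − B, as M = σ_rgyb + 0.3·μ_rgyb. The sketch below is a minimal pure-Python version of that metric, assuming RGB pixels in the 0 to 255 range; it is not taken from the paper, and the `desaturate` helper (which blends each pixel toward its BT.601 luma) is a hypothetical illustration of what an "unsaturated" colorization looks like to the metric.

```python
import math
from statistics import fmean, pstdev
from typing import Iterable, List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), assumed in 0..255


def colorfulness(pixels: Iterable[Pixel]) -> float:
    """Hasler-Suesstrunk colorfulness: M = sigma_rgyb + 0.3 * mu_rgyb."""
    rg: List[float] = []
    yb: List[float] = []
    for r, g, b in pixels:
        rg.append(r - g)              # red-green opponent channel
        yb.append(0.5 * (r + g) - b)  # yellow-blue opponent channel
    if not rg:
        raise ValueError("image has no pixels")
    # Combined standard deviation and combined mean of the two opponent channels.
    sigma_rgyb = math.hypot(pstdev(rg), pstdev(yb))
    mu_rgyb = math.hypot(fmean(rg), fmean(yb))
    return sigma_rgyb + 0.3 * mu_rgyb


def desaturate(pixels: Iterable[Pixel], amount: float) -> List[Pixel]:
    """Hypothetical helper: move each pixel toward its BT.601 luma.

    amount = 0 keeps the colors; amount = 1 yields a grayscale image.
    """
    out: List[Pixel] = []
    for r, g, b in pixels:
        y = 0.299 * r + 0.587 * g + 0.114 * b
        r2, g2, b2 = (y + (1.0 - amount) * (c - y) for c in (r, g, b))
        out.append((r2, g2, b2))
    return out


if __name__ == "__main__":
    img = [(200.0, 40.0, 40.0), (40.0, 180.0, 60.0),
           (30.0, 60.0, 210.0), (220.0, 200.0, 30.0)]
    for a in (0.0, 0.5, 1.0):
        print(f"desaturation {a:.1f}: colorfulness {colorfulness(desaturate(img, a)):.2f}")
```

Because both opponent channels are linear in the chroma offset (c − y), halving the offset halves M exactly, and a fully gray image scores zero; a colorization that "plays it safe" with washed-out hues therefore shows up directly as a lower score.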