ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional Latent Diffusion Models (2306.09330v2)
Abstract: Arbitrary Style Transfer (AST) aims to transform images by adopting the style of any selected artwork. Accommodating diverse and subjective user preferences, however, poses a significant challenge: some users wish to preserve distinct content structures, while others favor more pronounced stylization. Despite advances in feed-forward AST methods, their limited customizability hinders practical application. We propose ArtFusion, a new approach that provides a flexible balance between content and style. In contrast to traditional methods that rely on biased similarity losses, ArtFusion utilizes our Dual Conditional Latent Diffusion Probabilistic Models (Dual-cLDM), which mitigate repetitive patterns and enhance subtle artistic aspects such as brush strokes and genre-specific features. Although conditional diffusion probabilistic models (cDM) have shown promising results in various generative tasks, applying them to style transfer is challenging because they require paired training data. ArtFusion circumvents this issue, offering more practical and controllable stylization. A key element of our approach is using a single image as both the content and style condition during training, while still achieving effective stylization at inference. ArtFusion outperforms existing approaches in controllability and in the faithful rendering of artistic details, demonstrating superior style transfer capability. Furthermore, the Dual-cLDM used in ArtFusion has potential for a variety of complex multi-condition generative tasks, greatly broadening the impact of our research.
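To make the dual-conditioning idea described in the abstract concrete, below is a minimal, self-contained sketch, not the authors' implementation. It assumes a toy noise predictor and noise schedule; all names (`DualCondDenoiser`, `content_enc`, `style_enc`, `s_content`, `s_style`) and the classifier-free-guidance-style combination of the two conditions are illustrative assumptions. The key points it illustrates are that training uses the same image as both the content and style condition (so no paired data is needed), and that inference supplies two different images plus two guidance scales to trade off structure preservation against stylization strength.

```python
# Hedged sketch of dual-conditional diffusion training/inference (NOT the ArtFusion code).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualCondDenoiser(nn.Module):
    """Toy epsilon-predictor conditioned on content and style (timestep embedding omitted)."""
    def __init__(self, dim=64):
        super().__init__()
        self.content_enc = nn.Conv2d(3, dim, 3, padding=1)   # stand-in for a content encoder
        self.style_enc = nn.Linear(3, dim)                    # stand-in for a pooled style encoder
        self.net = nn.Conv2d(2 * dim + 3, 3, 3, padding=1)    # predicts noise from x_t + conditions

    def forward(self, x_t, t, content, style):
        c = self.content_enc(content)
        s = self.style_enc(style.mean(dim=(2, 3)))[:, :, None, None]
        s = s.expand(-1, -1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, c, s], dim=1))

def training_step(model, x0, timesteps=1000):
    """Single-image dual conditioning: content = style = x0, each randomly dropped for CFG."""
    b = x0.size(0)
    t = torch.randint(0, timesteps, (b,))
    noise = torch.randn_like(x0)
    # Toy cosine schedule for the cumulative signal level alpha_bar(t).
    alpha_bar = torch.cos(t.float() / timesteps * math.pi / 2)[:, None, None, None] ** 2
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
    # Randomly null out each condition so classifier-free guidance is possible at inference.
    content = torch.where(torch.rand(b, 1, 1, 1) < 0.1, torch.zeros_like(x0), x0)
    style = torch.where(torch.rand(b, 1, 1, 1) < 0.1, torch.zeros_like(x0), x0)
    return F.mse_loss(model(x_t, t, content, style), noise)

@torch.no_grad()
def guided_eps(model, x_t, t, content_img, style_img, s_content=0.6, s_style=3.0):
    """At inference the conditions come from DIFFERENT images; s_content and s_style
    are illustrative guidance scales controlling the content-style balance."""
    null = torch.zeros_like(x_t)
    eps_uncond = model(x_t, t, null, null)
    eps_content = model(x_t, t, content_img, null)
    eps_style = model(x_t, t, null, style_img)
    return (eps_uncond
            + s_content * (eps_content - eps_uncond)
            + s_style * (eps_style - eps_uncond))
```

As a usage check, `training_step(DualCondDenoiser(), torch.randn(2, 3, 32, 32))` returns a scalar loss on random data; the actual model instead operates in a latent space with much richer content and style encoders.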
Author: Dar-Yen Chen