DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer (2307.04157v2)
Abstract: Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based edits that affect mostly low-level information while leaving image structure largely unchanged. However, style-driven deformation of the content is desirable for some styles, especially when the style is abstract or when its primary concept lies in its deformed rendition of the content. With the recent introduction of diffusion models such as Stable Diffusion, we have access to far more powerful image generation techniques, enabling new possibilities. In our work, we propose using this new class of models to perform style transfer that also supports deformable style transfer, a capability elusive in previous models. We show how leveraging the priors of these models exposes new artistic controls at inference time, and we document our findings in exploring this new direction for the field of style transfer.