DiffStyler: Diffusion-based Localized Image Style Transfer (2403.18461v2)
Abstract: Image style transfer aims to imbue digital imagery with the distinctive attributes of a style target, such as its colors, brushstrokes, and shapes, while preserving the semantic integrity of the content. Despite advances in arbitrary style transfer methods, striking a delicate balance between content semantics and style attributes remains a prevalent challenge. Recent large-scale text-to-image diffusion models offer unprecedented synthesis capabilities, but rely on extensive and often imprecise textual descriptions to specify artistic styles. Addressing these limitations, this paper introduces DiffStyler, a novel approach to efficient and precise arbitrary image style transfer. At the core of DiffStyler is a LoRA trained on a text-to-image Stable Diffusion model to encapsulate the essence of the style target; coupled with strategic cross-LoRA feature and attention injection, it guides the style transfer process. Our methodology is rooted in the observation that LoRA preserves the spatial feature consistency of the UNet, a discovery that further inspired a mask-wise style transfer technique. This technique employs masks extracted with a pre-trained FastSAM model, using mask prompts to drive feature fusion during the denoising process, thereby enabling localized style transfer that leaves the unaffected regions of the original image intact. Moreover, our approach accommodates multiple style targets through corresponding masks. Through extensive experimentation, we demonstrate that DiffStyler surpasses previous methods in achieving a more harmonious balance between content preservation and style integration.
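The mask-wise fusion idea described above can be illustrated with a minimal NumPy sketch: at each denoising step, the stylized features are blended with the original features so that style applies only inside the masked region. This is a hypothetical, simplified illustration under assumed shapes (the function name `maskwise_fuse` and the toy arrays are illustrative, not from the paper, which operates on UNet latents/features within Stable Diffusion):

```python
import numpy as np

def maskwise_fuse(content_feat: np.ndarray,
                  styled_feat: np.ndarray,
                  mask: np.ndarray) -> np.ndarray:
    """Blend features so the style affects only the masked region.

    mask == 1 keeps the stylized feature; mask == 0 keeps the
    original content feature, preserving unaffected regions.
    """
    mask = mask.astype(content_feat.dtype)
    return mask * styled_feat + (1.0 - mask) * content_feat

# Toy single-channel 4x4 "feature maps": content is all zeros,
# the stylized result is all ones.
content = np.zeros((4, 4))
styled = np.ones((4, 4))

# Style only the top half of the image.
mask = np.zeros((4, 4))
mask[:2, :] = 1.0

fused = maskwise_fuse(content, styled, mask)
```

Multiple style targets fit the same pattern: each style gets its own mask, and the fused result accumulates one blend per (style, mask) pair.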