
Measuring Style Similarity in Diffusion Models (2404.01292v1)

Published 1 Apr 2024 in cs.CV and cs.LG

Abstract: Generative models are now widely used by graphic designers and artists. Prior works have shown that these models remember and often replicate content from their training data during generation. Hence as their proliferation increases, it has become important to perform a database search to determine whether the properties of the image are attributable to specific training data, every time before a generated image is used for professional purposes. Existing tools for this purpose focus on retrieving images of similar semantic content. Meanwhile, many artists are concerned with style replication in text-to-image models. We present a framework for understanding and extracting style descriptors from images. Our framework comprises a new dataset curated using the insight that style is a subjective property of an image that captures complex yet meaningful interactions of factors including but not limited to colors, textures, shapes, etc. We also propose a method to extract style descriptors that can be used to attribute style of a generated image to the images used in the training dataset of a text-to-image model. We showcase promising results in various style retrieval tasks. We also quantitatively and qualitatively analyze style attribution and matching in the Stable Diffusion model. Code and artifacts are available at https://github.com/learn2phoenix/CSD.

Authors (8)
  1. Gowthami Somepalli (20 papers)
  2. Anubhav Gupta (12 papers)
  3. Kamal Gupta (22 papers)
  4. Shramay Palta (5 papers)
  5. Micah Goldblum (96 papers)
  6. Jonas Geiping (73 papers)
  7. Abhinav Shrivastava (120 papers)
  8. Tom Goldstein (226 papers)
Citations (23)

Summary

Measuring Style Similarity in Diffusion Models with Contrastive Style Descriptors

Introduction

Diffusion models now play a central role in image generation, where understanding and replicating artistic styles poses a complex yet consequential challenge. The paper "Measuring Style Similarity in Diffusion Models" addresses the task of quantifying and extracting style from images, particularly in the context of text-to-image models such as Stable Diffusion. The authors propose a framework comprising a curated dataset, LAION-Styles, together with a method for learning what they term Contrastive Style Descriptors (CSD), aimed at attributing and matching styles effectively.

Dataset Curated for Style Attribution

A notable contribution of the paper is the introduction of LAION-Styles, a dataset built to support the learning of style descriptors. Drawn from the much larger LAION dataset, this subset pairs 511,921 images with 3,840 style tags. The authors detail the curation process, highlighting the challenge of managing the imbalance inherent in such broad collections and the care taken over deduplication and tag accuracy.
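The curation step described above can be illustrated with a small sketch. This is a hypothetical simplification, not the authors' pipeline: it keeps only images whose captions mention a known style tag and then drops tags matched by too few images, a crude form of imbalance control. The function name and thresholds are illustrative assumptions.

```python
def filter_by_style_tags(captions, style_tags, min_count=2):
    """Keep images whose caption mentions at least one known style tag,
    then drop tags matched by fewer than `min_count` images.

    captions: list of caption strings.
    style_tags: list of style-tag strings.
    Returns a dict mapping each retained tag to matching caption indices.
    """
    matches = {
        tag: [i for i, c in enumerate(captions) if tag.lower() in c.lower()]
        for tag in style_tags
    }
    # Discard rare tags to reduce label imbalance.
    return {tag: idx for tag, idx in matches.items() if len(idx) >= min_count}


captions = [
    "oil painting of a dog",
    "ukiyo-e landscape",
    "oil painting portrait",
    "photo of a cat",
]
kept = filter_by_style_tags(captions, ["oil painting", "ukiyo-e"])
```

Here "ukiyo-e" matches only one caption and is dropped, while "oil painting" survives with its two matching image indices.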

Contrastive Style Descriptors (CSD)

Central to the paper is the design of Contrastive Style Descriptors (CSD), which combine self-supervised learning (SSL) with a multi-label contrastive objective. Whereas standard SSL augmentations often discard style as a nuisance variable, the proposed method deliberately preserves stylistic attributes during training. Blending the SSL objective with supervision from LAION-Styles ties the learned descriptors to human judgments of style. Quantitative evaluations on benchmark datasets such as DomainNet and WikiArt show CSD outperforming widely used pre-trained models and prior style retrieval methods.
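The multi-label contrastive idea can be sketched in a few lines of NumPy. This is a minimal toy version, not the authors' training code: it assumes images sharing at least one style tag are positives for each other and applies an InfoNCE-style loss over pairwise similarities; the function name and temperature are illustrative.

```python
import numpy as np


def multilabel_contrastive_loss(embeddings, tag_matrix, temperature=0.1):
    """Toy multi-label contrastive loss: images sharing at least one
    style tag are treated as positives for each other.

    embeddings: (N, D) L2-normalized style descriptors.
    tag_matrix: (N, T) binary matrix of style-tag assignments.
    """
    n = embeddings.shape[0]
    sims = embeddings @ embeddings.T / temperature       # pairwise similarities
    positives = (tag_matrix @ tag_matrix.T) > 0          # share >= 1 tag
    np.fill_diagonal(positives, False)                   # exclude self-pairs
    mask = ~np.eye(n, dtype=bool)

    losses = []
    for i in range(n):
        if not positives[i].any():
            continue                                     # anchor with no positives
        log_denom = np.log(np.exp(sims[i][mask[i]]).sum())
        pos_logits = sims[i][positives[i]]
        losses.append(np.mean(log_denom - pos_logits))   # InfoNCE-style term
    return float(np.mean(losses))
```

Pulling embeddings of same-tagged images together while pushing apart differently-tagged ones is what lets the descriptor encode style rather than semantic content.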

Analysis of Style Replication in Stable Diffusion

Beyond dataset creation and model training, the paper applies CSD to probe style replication within Stable Diffusion. Through a series of experiments, it examines how the styles of different artists are replicated or omitted in generated images. A case study reporting "General Style Similarity" scores across artists offers insight into the model's capabilities and biases when rendering styles. This section underscores the utility of CSD for attributing styles to artists and raises questions about the role of generative models in artistic content creation.
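At inference time, style attribution of this kind reduces to nearest-neighbor retrieval in descriptor space. The sketch below, an assumption about the retrieval step rather than the paper's exact code, ranks gallery images by cosine similarity to a query descriptor; the function name is hypothetical.

```python
import numpy as np


def top_k_style_matches(query, gallery, k=3):
    """Rank gallery images by style similarity to a query descriptor.

    query: (D,) style descriptor of a generated image.
    gallery: (N, D) descriptors of candidate training images.
    Returns indices of the k most stylistically similar gallery images.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q                          # cosine similarities
    return np.argsort(scores)[::-1][:k]     # highest similarity first
```

Averaging such similarity scores over many generations for a given artist prompt would yield an aggregate measure in the spirit of the paper's "General Style Similarity" analysis.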

Implications and Future Developments

The work opens both practical and theoretical avenues for research at the intersection of generative AI and art. Practically, the framework enables deeper insight into the provenance of styles in generated images, serving artists, designers, and model users alike. Theoretically, it raises questions about the nature of style as an aesthetic concept when examined through machine learning. Copyright, originality, and artistic attribution remain ripe areas for further exploration.

Conclusion

"Measuring Style Similarity in Diffusion Models" presents a rigorous examination of how style can be characterized and attributed in diffusion models. The creation of LAION-Styles, together with the development of Contrastive Style Descriptors, marks a significant advance: notable not only for its technical contributions but also for its broader implications for understanding and leveraging artistic style in generative AI.
