
Measuring Style Similarity in Diffusion Models

(2404.01292)
Published Apr 1, 2024 in cs.CV and cs.LG

Abstract

Generative models are now widely used by graphic designers and artists. Prior works have shown that these models remember and often replicate content from their training data during generation. Hence, as these models proliferate, it has become important, before a generated image is used for professional purposes, to perform a database search to determine whether its properties are attributable to specific training data. Existing tools for this purpose focus on retrieving images of similar semantic content, whereas many artists are concerned with style replication in text-to-image models. We present a framework for understanding and extracting style descriptors from images. Our framework comprises a new dataset curated using the insight that style is a subjective property of an image, capturing complex yet meaningful interactions of factors including, but not limited to, colors, textures, and shapes. We also propose a method to extract style descriptors that can be used to attribute the style of a generated image to the images in the training dataset of a text-to-image model. We showcase promising results in various style retrieval tasks, and we quantitatively and qualitatively analyze style attribution and matching in the Stable Diffusion model. Code and artifacts are available at https://github.com/learn2phoenix/CSD.

Key finding: in prompts that invoke two artists' styles at once, the diffusion model favors the style it reproduces better, a bias tied to how prevalent each style is in the training data.

Overview

  • The paper introduces a framework for quantifying and extracting style from images, particularly in the context of text-to-image models, built on a curated dataset named LAION-Styles and a method named Contrastive Style Descriptors (CSD).

  • LAION-Styles, a subset of LAION curated for style attribution, pairs images with style tags, providing the supervision needed to learn style descriptors at scale.

  • Contrastive Style Descriptors combine self-supervised learning with multi-label contrastive learning to preserve stylistic elements, outperforming widely used pre-trained models and prior methods in style retrieval.

  • Applying CSD to the Stable Diffusion model yields insights into the model's capacity and biases when rendering artistic styles, and motivates a discussion of the implications for generative AI in art.

Measuring Style Similarity in Diffusion Models with Contrastive Style Descriptors

Introduction

Diffusion models now play a significant role in generative image creation, where understanding and replicating artistic styles is a complex yet fascinating challenge. The paper "Measuring Style Similarity in Diffusion Models" takes on the task of quantifying and extracting style from images, especially in the context of text-to-image models like Stable Diffusion. It proposes a framework comprising a curated dataset, LAION-Styles, alongside a method for deriving what the authors term Contrastive Style Descriptors (CSD), aimed at attributing and matching styles effectively.

Dataset Curated for Style Attribution

A notable contribution of the paper is LAION-Styles, a dataset designed to support the extraction of style descriptors. This subset, drawn from the vast LAION dataset, pairs images with style tags, comprising 511,921 images spanning 3,840 style tags. The authors detail the curation process, highlighting the challenge of managing the imbalance inherent in such broad collections and the care taken over deduplication and tag accuracy.
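
To make the multi-label setup concrete, the sketch below shows one way such image-to-style-tag annotations can be encoded as multi-hot vectors, with two images counting as a positive pair when they share at least one tag. The tag names and helper functions here are illustrative stand-ins, not the paper's actual pipeline.

```python
import numpy as np

# Illustrative subset of style tags (the real vocabulary has 3,840 entries).
TAGS = ["impressionism", "ukiyo-e", "pixel art", "watercolor", "art nouveau"]
TAG_INDEX = {t: i for i, t in enumerate(TAGS)}

def multi_hot(image_tags: list[str]) -> np.ndarray:
    """Encode an image's style tags as a multi-hot label vector."""
    y = np.zeros(len(TAGS), dtype=np.float32)
    for tag in image_tags:
        y[TAG_INDEX[tag]] = 1.0
    return y

# An image can carry several style tags at once:
print(multi_hot(["impressionism", "watercolor"]))  # [1. 0. 0. 1. 0.]

def is_positive_pair(y_a: np.ndarray, y_b: np.ndarray) -> bool:
    """Two images form a positive pair if they share at least one style tag."""
    return bool((y_a * y_b).sum() > 0)
```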

Contrastive Style Descriptors (CSD)

Central to the paper is the development of Contrastive Style Descriptors (CSD), which combine self-supervised learning (SSL) with a multi-label contrastive learning scheme. Whereas standard SSL training often treats style as a nuisance variable to be made invariant, the presented method preserves stylistic signal through the learning process. Blending the SSL objective with supervision from LAION-Styles ensures that human perceptions of style are encoded in the descriptors. Quantitative evaluations on benchmark datasets such as DomainNet and WikiArt show CSD outperforming widely used pre-trained models and prior style retrieval methods.
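
As a rough illustration of the kind of objective described here, the following is a minimal SupCon-style sketch of a multi-label contrastive loss: within a batch, images sharing at least one style tag are positives, and augmented views (carrying their source image's labels) supply the self-supervised positives. This is a generic formulation under those assumptions, not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def multilabel_contrastive_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """SupCon-style loss where positives share at least one style tag.

    z:      (n, d) batch embeddings; include augmented views with their
            source image's labels to add the self-supervised positives
    labels: (n, t) multi-hot style-tag matrix
    """
    z = F.normalize(z, dim=-1)
    sim = z @ z.T / tau                                # (n, n) scaled cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)

    # Positives: pairs sharing at least one tag, excluding self-pairs.
    pos = (labels @ labels.T > 0) & ~self_mask

    # Softmax denominator runs over all other samples in the batch.
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)

    # Average log-probability over each anchor's positives (skip anchors with none).
    pos_counts = pos.sum(dim=1)
    valid = pos_counts > 0
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(dim=1)
    return (per_anchor[valid] / pos_counts[valid]).mean()

# Toy usage: 8 samples, 16-dim embeddings, 5 style tags.
torch.manual_seed(0)
z = torch.randn(8, 16)
y = (torch.rand(8, 5) < 0.3).float()
print(multilabel_contrastive_loss(z, y).item())
```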

Analysis of Style Replication in Stable Diffusion

The application of CSD extends beyond dataset creation and model training to an exploration of style replication within the Stable Diffusion model. Through a series of experiments, the paper investigates how the styles of different artists are replicated or omitted in generated images. A case study of "General Style Similarity" scores across artists offers observations on the model's capability and biases when rendering styles. This analysis both demonstrates the utility of CSD for attributing styles to artists and raises questions about the role of generative models in artistic content creation.
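
A per-artist score of this kind can be sketched as follows: embed images generated with the artist's name in the prompt, embed that artist's images from the training set, and aggregate pairwise cosine similarities between the two sets. The aggregation below (mean of per-image maxima) and all names are illustrative assumptions, not the paper's exact metric.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def artist_style_similarity(gen_feats: torch.Tensor, artist_feats: torch.Tensor) -> float:
    """Aggregate style similarity between generated images and one artist's works.

    gen_feats:    (m, d) descriptors of images generated with the artist's name
    artist_feats: (n, d) descriptors of that artist's images in the training set
    """
    g = F.normalize(gen_feats, dim=-1)
    a = F.normalize(artist_feats, dim=-1)
    sims = g @ a.T                          # (m, n) pairwise cosine similarities
    # Match each generated image to its closest training image, then average.
    return sims.max(dim=1).values.mean().item()

# Toy usage with random stand-in descriptors (d = 768, as in a ViT-B backbone):
score = artist_style_similarity(torch.randn(32, 768), torch.randn(500, 768))
print(f"style similarity: {score:.3f}")
```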

Implications and Future Developments

The research delineates both practical and theoretical avenues for future work at the intersection of generative AI and art. Practically, the framework enables deeper insight into the provenance of styles within generated images, serving artists, designers, and model users alike. Theoretically, it raises compelling questions about the nature of style as an aesthetic concept, especially when examined through machine learning methodologies. Looking ahead, the implications for copyright, originality, and artistic tribute are ripe areas for further exploration.

Conclusion

"Measuring Style Similarity in Diffusion Models" presents a robust examination into the characterization and attribution of style within the context of diffusion models. The creation of LAION-Styles, alongside the development of Contrastive Style Descriptors, marks a significant advance in the field—pioneering not just in its technical achievements but also in its broader implications for understanding and leveraging artistic styles in generative AI.
