Measuring Style Similarity in Diffusion Models (2404.01292v1)
Abstract: Generative models are now widely used by graphic designers and artists. Prior work has shown that these models remember and often replicate content from their training data during generation. As their use grows, it has therefore become important to run a database search before a generated image is used for professional purposes, to determine whether its properties are attributable to specific training data. Existing tools for this purpose focus on retrieving images with similar semantic content. Meanwhile, many artists are concerned with style replication in text-to-image models. We present a framework for understanding and extracting style descriptors from images. Our framework comprises a new dataset curated using the insight that style is a subjective property of an image that captures complex yet meaningful interactions of factors including, but not limited to, colors, textures, and shapes. We also propose a method to extract style descriptors that can be used to attribute the style of a generated image to images in the training dataset of a text-to-image model. We showcase promising results on various style retrieval tasks. We also quantitatively and qualitatively analyze style attribution and matching in the Stable Diffusion model. Code and artifacts are available at https://github.com/learn2phoenix/CSD.
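To make the attribution idea in the abstract concrete, here is a minimal sketch of how style descriptors could be used for retrieval once they exist. It assumes each image has already been mapped to a fixed-length descriptor vector and ranks training images by cosine similarity to the descriptor of a generated image. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions, not the authors' implementation or the API of the linked repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero descriptor as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_style(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k database entries most similar to the query descriptor."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy descriptors; real ones would come from a style encoder.
toy_database = {
    "train_img_a": [0.9, 0.1, 0.0],
    "train_img_b": [0.1, 0.9, 0.2],
    "train_img_c": [0.8, 0.2, 0.1],
}
generated = [1.0, 0.1, 0.0]  # descriptor of a generated image (assumed)

matches = rank_by_style(generated, toy_database, top_k=2)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

At the scale of a real training set, the exhaustive loop would likely be replaced by an approximate nearest-neighbor index, but the ranking logic stays the same.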
Authors:
- Gowthami Somepalli
- Anubhav Gupta
- Kamal Gupta
- Shramay Palta
- Micah Goldblum
- Jonas Geiping
- Abhinav Shrivastava
- Tom Goldstein