Generative models are now widely used by graphic designers and artists. Prior work has shown that these models remember and often replicate content from their training data during generation. As their use proliferates, it has become important, before a generated image is used professionally, to search the training database and determine whether the image's properties are attributable to specific training data. Existing tools for this purpose focus on retrieving images with similar semantic content, whereas many artists are chiefly concerned with style replication in text-to-image models. We present a framework for understanding and extracting style descriptors from images. Our framework comprises a new dataset curated using the insight that style is a subjective property of an image, capturing complex yet meaningful interactions of factors including colors, textures, and shapes. We also propose a method to extract style descriptors that can attribute the style of a generated image to the images in the training dataset of a text-to-image model. We showcase promising results on various style retrieval tasks and quantitatively and qualitatively analyze style attribution and matching in the Stable Diffusion model. Code and artifacts are available at https://github.com/learn2phoenix/CSD.
The paper introduces a novel framework for quantifying and extracting style from images, particularly in the context of text-to-image models, using a curated dataset named LAION-Styles and a method named Contrastive Style Descriptors (CSD).
LAION-Styles, a dataset engineered for style attribution, focuses on images paired with style tags, drawn from a vast collection to facilitate the extraction of style descriptors.
Contrastive Style Descriptors leverage self-supervised learning and multi-label contrastive learning to preserve stylistic elements, demonstrating superiority over prevalent models and methodologies in style retrieval.
The application of CSD in analyzing style replication in the Stable Diffusion model reveals insights into the model’s capacity and biases in rendering artistic styles, and discusses future implications for generative AI in art.
Diffusion models now play a significant role in generative image creation, where understanding and replicating artistic styles is a complex yet fascinating challenge. The paper, "Measuring Style Similarity in Diffusion Models," takes on the intricate task of quantifying and extracting style from images, especially in the context of text-to-image models like Stable Diffusion. It proposes a novel framework comprising a curated dataset, LAION-Styles, alongside a methodological approach to derive what the paper terms Contrastive Style Descriptors (CSD), aimed at attributing and matching styles effectively.
A notable contribution of the paper is the introduction of LAION-Styles, a dataset engineered to support the extraction of style descriptors. This subset of the vast LAION dataset focuses on images paired with style tags, comprising 511,921 images spanning 3,840 style tags. The authors detail the curation process, highlighting the challenge of managing the imbalance inherent in such broad collections and underscoring the care given to deduplication and tag accuracy.
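The curation steps described above (deduplication, pruning rare tags to fight the long-tail imbalance, discarding untagged images) can be sketched as follows. This is a minimal illustration of the idea, not the paper's pipeline; the record format, tag values, and the `min_tag_count` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical metadata records: (image_id, caption-derived style tags).
records = [
    ("img_001", ["impressionism", "oil painting"]),
    ("img_002", ["impressionism"]),
    ("img_002", ["impressionism"]),  # duplicate entry, removed in step 1
    ("img_003", ["pixel art"]),
    ("img_004", []),                 # no style tags, discarded
]

def curate(records, min_tag_count=2):
    """Keep deduplicated, tagged images whose tags are frequent enough."""
    # 1. Deduplicate by image id and drop untagged images.
    seen, unique = set(), []
    for img_id, tags in records:
        if img_id not in seen and tags:
            seen.add(img_id)
            unique.append((img_id, tags))
    # 2. Drop rare tags to reduce long-tail imbalance.
    counts = Counter(t for _, tags in unique for t in tags)
    kept_tags = {t for t, c in counts.items() if c >= min_tag_count}
    # 3. Keep images that retain at least one surviving tag.
    return [
        (img_id, [t for t in tags if t in kept_tags])
        for img_id, tags in unique
        if any(t in kept_tags for t in tags)
    ]

curated = curate(records)
```

With these toy records, only "impressionism" survives the frequency filter, so `img_003` and `img_004` are dropped and the duplicate `img_002` entry appears once.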
Central to the paper is the conceptualization and development of Contrastive Style Descriptors (CSD), which leverage both self-supervised learning (SSL) and a multi-label contrastive learning scheme. In contrast to traditional SSL approaches, which often neglect style as a variable, the presented method deliberately preserves stylistic elements through the learning process. The dual learning objective, blending SSL with supervision from LAION-Styles, ensures that human perceptions of style are encoded in the descriptors. Quantitative evaluations on benchmark datasets such as DomainNet and WikiArt demonstrate the superiority of CSD over prevalent pre-trained models and style retrieval methodologies.
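The multi-label contrastive idea can be sketched as a toy objective in which two images count as positives whenever they share at least one style tag. This is an assumption-laden NumPy sketch of that general scheme, not the paper's exact loss; the embeddings, multi-hot tag matrix, and temperature are placeholders.

```python
import numpy as np

def multilabel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Toy multi-label contrastive objective: samples sharing at least
    one tag are treated as positives for each other."""
    # L2-normalize and compute temperature-scaled pairwise similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(z)
    not_self = ~np.eye(n, dtype=bool)
    # Positive mask: i and j share a tag (labels is a multi-hot matrix).
    pos = (labels @ labels.T > 0) & not_self
    # Log-softmax over all other samples for each anchor.
    exp = np.exp(sim) * not_self
    log_prob = sim - np.log(exp.sum(axis=1, keepdims=True))
    # Average negative log-probability of positives, per anchor.
    losses = [-log_prob[i][pos[i]].mean() for i in range(n) if pos[i].any()]
    return float(np.mean(losses))

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(4, 8))          # placeholder embeddings
tags = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
loss = multilabel_contrastive_loss(descriptors, tags)
```

Minimizing such a loss pulls together descriptors of images that share a style tag while pushing apart those that do not, which is the intuition behind supervising descriptors with LAION-Styles tags.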
The application of CSD extends beyond dataset creation and model training into a probing exploration of style replication within the Stable Diffusion model. Through a series of experiments and analyses, the paper investigates how styles of different artists are replicated or omitted in generated images. A case study detailing the "General Style Similarity" scores across various artists provides insightful observations on the model's capability and biases when rendering styles. Remarkably, this section not only underscores the utility of CSD in attributing styles to artists but also sparks discussions on the implications of generative models in artistic content creation.
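The attribution step described above reduces to a nearest-neighbor search in descriptor space: score each training image's style descriptor against the generated image's descriptor by cosine similarity and return the top matches. A minimal sketch, with random placeholders standing in for real CSD embeddings:

```python
import numpy as np

def style_matches(query_desc, gallery_descs, k=3):
    """Rank gallery descriptors by cosine similarity to the query;
    returns top-k indices and their scores."""
    q = query_desc / np.linalg.norm(query_desc)
    g = gallery_descs / np.linalg.norm(gallery_descs, axis=1, keepdims=True)
    scores = g @ q                       # cosine similarity to each gallery item
    top = np.argsort(-scores)[:k]        # highest-similarity indices first
    return list(top), scores[top]

# Toy gallery of three orthogonal "styles" and a query closest to style 1.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([0.1, 0.9, 0.05])
top, top_scores = style_matches(query, gallery, k=2)
```

An aggregate score over an artist's images (rather than a single nearest neighbor) would correspond to the per-artist "General Style Similarity" analysis discussed above.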
The research delineates both practical and theoretical avenues for the continuation of work in the realm of generative AI and art. Practically, the framework enables deeper insights into the provenance of styles within generated images, serving artists, designers, and model users alike. Theoretically, it raises compelling questions about the nature of style as an aesthetic concept, especially when intersected with machine learning methodologies. Looking ahead, the implications for copyright, originality, and artistic tribute are ripe areas for further exploration.
"Measuring Style Similarity in Diffusion Models" presents a robust examination into the characterization and attribution of style within the context of diffusion models. The creation of LAION-Styles, alongside the development of Contrastive Style Descriptors, marks a significant advance in the field—pioneering not just in its technical achievements but also in its broader implications for understanding and leveraging artistic styles in generative AI.