Large-scale Text-to-Image Generation Models and Their Role in Visual Artists' Works
The paper by Ko et al., titled "Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works," examines the potential applications and implications of Large-scale Text-to-Image Generation Models (LTGMs) in the domain of visual arts. LTGMs, such as DALL-E, have demonstrated the capability to generate high-quality images from textual prompts and multi-modal inputs. The paper primarily aims to investigate how visual artists might leverage these models to enhance their creative processes.
Summary
The authors conducted an interview study with 28 visual artists spanning 35 unique visual art domains and performed a systematic literature review of 72 system/application papers. The study was structured around understanding the ways in which visual artists could integrate LTGMs into their workflows and how these models might alter the creative landscape.
Key Findings
The paper reveals that visual artists perceive LTGMs as versatile tools capable of fulfilling various roles:
- Automation: LTGMs can automate repetitive tasks, thereby allowing artists more time to focus on core creative activities.
- Exploration: LTGMs can serve as a tool for expanding creative ideas by generating a wide range of novel imagery based on diverse inputs, acting as inspiration for artists.
- Mediation: These models can facilitate better communication between artists and clients or collaborators by providing visual representations that help convey ideas more effectively.
The paper further identifies several limitations of LTGMs, particularly their inability to generate artworks that require deep contextual or philosophical interpretation, and their lack of personalization to reflect an artist's unique style or domain-specific knowledge.
Implications
The implications of this research are manifold. Practically, LTGMs can serve as a new reference tool, enabling artists to retrieve and visualize concepts with unprecedented speed and diversity. Moreover, they can play a crucial role in prototyping and ideation: by producing low-fidelity prototypes quickly, they are especially beneficial to novice artists, educators, and industry practitioners seeking to overcome technical or skill-based barriers in traditional art-making processes.
Theoretically, the research suggests that LTGMs can redefine art creation paradigms by enabling a new form of collaboration between AI and human creativity. Future developments in AI, particularly in foundation models, will likely continue this trajectory, providing new functionalities and becoming more deeply integrated into the artist's toolbox.
Design Guidelines
Based on these findings, the authors propose four design guidelines for developing intelligent user interfaces that leverage LTGMs:
- Support variability level specification for different art types to cater to varying creative needs.
- Enable model customization to reflect domain-specific knowledge and artist identity.
- Increase controllability using multi-modal inputs to improve interactive collaboration.
- Develop prompt engineering tools to aid artists in crafting effective textual inputs.
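To make the fourth guideline concrete, a prompt-engineering tool might assemble a structured prompt from an artist's separate choices of subject, medium, style, and modifiers, sparing them from hand-crafting prompt syntax. The sketch below is a hypothetical illustration, not a system described in the paper; the function name, fields, and phrasing template are all assumptions.

```python
# Hypothetical sketch of a prompt-engineering aid for LTGMs
# (illustrative only; not from Ko et al.'s paper).

def build_prompt(subject, style=None, medium=None, modifiers=()):
    """Combine a subject with optional medium/style cues and extra
    modifiers into a single comma-separated text prompt."""
    parts = [subject]
    if medium:
        parts.append(f"rendered as {medium}")
    if style:
        parts.append(f"in the style of {style}")
    parts.extend(modifiers)  # e.g. lighting or composition cues
    return ", ".join(parts)

prompt = build_prompt(
    "a lighthouse on a rocky coast",
    style="Impressionism",
    medium="oil painting",
    modifiers=("warm evening light", "wide shot"),
)
print(prompt)
```

A tool built on this idea could expose each field as a UI control (dropdowns for style and medium, free text for the subject), letting artists iterate on one dimension of the prompt at a time rather than rewriting the whole string.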
Future Research Directions
Future research could address the limitations of LTGMs, focusing on better personalization and integration of domain-specific knowledge. The ethical considerations concerning the use of LTGMs, particularly around intellectual property and biases in model outputs, need more exploration to ensure responsible usage. The societal impact, including the potential displacement of traditional art forms and artists, will be crucial to understand as we advance toward a more AI-integrated art world.
In summary, while LTGMs present substantial opportunities to transform the visual arts landscape, careful examination of their limitations and ethical implications is essential to realize their full potential responsibly.