- The paper analyzes the limitations of linguistic prompts in capturing the full emotional and metaphorical essence of traditional art.
- It evaluates how generative models rely on vast, often copyrighted datasets, questioning artistic originality and diversity.
- It discusses the detachment from material embodiment in TTI systems, challenging the notion of prompt-based outputs as genuine art.
Is Writing Prompts Really Making Art?
The paper explores the role and implications of generative machine learning systems in art creation through text prompts, fundamentally questioning whether this process constitutes true artistic creation. The authors examine several aspects of Text-to-Image (TTI) systems, such as DALL-E 2, Midjourney, and Stable Diffusion, and consider the cultural, ethical, and creative ramifications of these technologies.
Limitations of Linguistic Description
One of the primary limitations of using text prompts for art creation is the inherent difficulty in fully capturing complex artistic visions through linguistic means. The authors argue that many forms of visual art convey meanings and emotions that cannot be easily expressed in words. Text prompts reduce the richness of artistic expression to mere symbolic representations that generative systems might convert into images, missing deeper metaphorical or abstract meanings.
Figure 1: Image generated by Stable Diffusion from the text prompt: ``still life with human skulls of different sizes, a rose, the most beautiful image ever seen, trending on art station, hyperrealistic, 8k, studio lighting, shallow focus, unreal engine''.
Data Implications
The paper discusses the parasitic nature of generative models that draw on vast datasets of human-created art, deriving artistic value while potentially diminishing human art. Large-scale datasets used in training these models often include copyrighted content, raising ethical questions about authorship and originality. The reinforcement of statistical norms in datasets may marginalize less popular or culturally diverse artistic expressions, reducing the heterogeneity in outputs produced by TTI systems.

Figure 2: Two maps: a) Broadway generated by Stable Diffusion from the text prompt: ``An abstract map of Broadway in yellow, blue and red''. b) Broadway Boogie Woogie by Piet Mondrian (image credit: Wally Gobetz).
Materiality and Embodiment
The discussion highlights the detachment of TTI systems from physical embodiment and material agency, which are central to many traditional art forms. Human creativity is deeply intertwined with physical interactions and sensory experiences that are absent in generative models. The black-box nature of TTI systems further obscures their creative process, prioritizing outputs without revealing the underlying decision-making framework.

Figure 3: Two images generated using DALL-E 2: a) Text prompt: ``an astronaut riding a horse''. b) Text prompt: ``a horse riding an astronaut''.
A New Artistic Medium?
The authors consider the potential for TTI systems to become a novel artistic medium, akin to past technological innovations in art. Despite the current limitations, there is potential for artists to creatively engage with these systems, highlighting their unique properties or exploiting peculiarities in the generated imagery.
Figure 4: Cosmopolitan cover created by digital artist Karen X. Cheng using DALL-E 2 and the text prompt ``wide-angle shot from below of a female astronaut with an athletic feminine body walking with swagger toward camera on Mars in an infinite universe, synthwave digital art''.
Conclusion
Ultimately, although TTI systems offer a new avenue for producing visual outputs, the question of whether writing prompts genuinely makes art raises complex ethical and creative issues. The authors assert that prompt-based systems can generate aesthetically pleasing images but lack human-like intentionality and authenticity. As with previous technological advances, the artistic value of these systems may lie in their unorthodox use by artists who can transcend mere imitation. Further exploration of the capabilities and limits of these systems is necessary to fully understand their place in the broader art world.
Figure 5: Image generated by DALL-E 2 from the text prompt: ``a man covered in tattoos of English words, long hair, rings on fingers, cinematic lighting, 8k''.