
We're Different, We're the Same: Creative Homogeneity Across LLMs (2501.19361v1)

Published 31 Jan 2025 in cs.CY, cs.AI, cs.CL, and cs.LG

Abstract: Numerous powerful LLMs are now available for use as writing support tools, idea generators, and beyond. Although these LLMs are marketed as helpful creative assistants, several works have shown that using an LLM as a creative partner results in a narrower set of creative outputs. However, these studies only consider the effects of interacting with a single LLM, begging the question of whether such narrowed creativity stems from using a particular LLM -- which arguably has a limited range of outputs -- or from using LLMs in general as creative assistants. To study this question, we elicit creative responses from humans and a broad set of LLMs using standardized creativity tests and compare the population-level diversity of responses. We find that LLM responses are much more similar to other LLM responses than human responses are to each other, even after controlling for response structure and other key variables. This finding of significant homogeneity in creative outputs across the LLMs we evaluate adds a new dimension to the ongoing conversation about creativity and LLMs. If today's LLMs behave similarly, using them as creative partners -- regardless of the model used -- may drive all users towards a limited set of "creative" outputs.


Summary

  • The paper reveals that LLMs, while scoring high on individual creativity tasks, exhibit markedly more homogeneous outputs than human responses.
  • The authors apply standardized creativity tests (AUT, FF, DAT) to 22 models and find lower semantic variability in LLM outputs than in human responses.
  • Findings indicate that integrating LLMs in creative tasks may constrain diversity due to convergent output clustering even with optimized prompts.

Creative Homogeneity Across LLMs

The paper "We're Different, We're the Same: Creative Homogeneity Across LLMs" compares the variability of creative outputs from human subjects with that of a range of LLMs. It asks whether using LLMs as creative partners leads to more homogeneous outputs regardless of which model is used.

Introduction and Motivation

LLMs have become prominent tools for augmenting writing, generating ideas, and supporting other creative processes. Despite their advertised potential to enhance creativity, several studies suggest that LLM-assisted creative work tends to produce a narrower, more similar set of outputs. Those studies each examined a single LLM, which leaves open whether the convergence stems from the limitations of one particular model or is common to LLMs in general.

Methodology

The methodology involves eliciting creative outputs from both human participants and a diverse set of LLMs using standardized creativity tests: Guilford's Alternative Uses Task (AUT), Forward Flow (FF), and the Divergent Association Task (DAT). The paper tests 22 models spanning different LLM architectures, with special analyses conducted on a subset to control for potential confounding variables such as model family.

Population-Level Variability: The analysis focuses on semantic similarity metrics to evaluate response diversity across populations.

Figure 1: LLM responses exhibit far less variability than human responses.
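One way to make "population-level variability" concrete is the mean pairwise cosine distance between response embeddings. The following is a minimal sketch under that assumption; the summary does not state the paper's exact metric or embedding model, so the function name `population_variability` and the synthetic vectors below are purely illustrative.

```python
import numpy as np


def population_variability(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance across a population of response embeddings.

    Assumption: each row is the embedding of one response. The embedding
    model itself is out of scope for this sketch.
    """
    X = np.asarray(embeddings, dtype=float)
    if X.ndim != 2 or X.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two responses")
    # Normalize rows so that the dot product equals cosine similarity.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    # Use each unordered pair once (upper triangle, excluding the diagonal).
    upper = np.triu_indices(X.shape[0], k=1)
    return float(np.mean(1.0 - sims[upper]))


# Synthetic illustration only: a tightly clustered population versus a
# dispersed one. These are NOT data from the paper.
rng = np.random.default_rng(0)
clustered = rng.normal(loc=1.0, scale=0.05, size=(20, 8))
dispersed = rng.normal(loc=0.0, scale=1.0, size=(20, 8))
```

Under this measure, a lower value means the responses in a population sit closer together in embedding space, which is the sense in which the summary describes LLM responses as more homogeneous than human ones.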

Key Findings

Population-Level Response Variability

The empirical results show marked homogeneity in LLM outputs compared with those of humans. Despite comparably high individual creativity scores, LLM responses are much more similar to each other than human responses are to each other, and they cluster closely in feature space.

Figure 2: LLM responses cluster together in feature space more than do human responses.

Response Structure Influence

The homogeneity persisted even after prompts were varied to minimize structural differences between LLM and human responses. With the optimal prompt versions, and when comparing only one-word responses, LLM outputs remained consistently more homogeneous than human outputs.

Figure 3: Even when considering only one-word responses to control for response structure, LLM AUT responses have lower population-level variability (left plot) and are closer in feature space (right plot) than human responses.
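The one-word control can be sketched as a simple filter applied to both populations before any similarity comparison. This is an illustrative assumption about how such a control might be implemented, not the authors' code; the helper name `one_word_only` and the sample responses are hypothetical.

```python
def one_word_only(responses: list[str]) -> list[str]:
    """Keep only responses that consist of exactly one whitespace-delimited word.

    Restricting both human and LLM responses this way removes differences in
    length and phrasing, so any remaining gap in variability is not explained
    by response structure.
    """
    return [r.strip() for r in responses if len(r.split()) == 1]


# Hypothetical AUT-style responses for illustration only.
sample = ["doorstop", "use it as a paperweight", " hammer ", ""]
filtered = one_word_only(sample)
```

The filtered lists would then be embedded and compared with whatever variability measure the analysis uses.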

LLM Family Effects

The paper also examines models from the same "family," specifically the Llama family, to test whether architectural or systemic similarities exacerbate output homogeneity. Models within the same family were slightly more homogeneous than sets of models drawn from different families.

Figure 4: Models from the same family (Llama) exhibit slightly lower population-level variability than models from different families.

Analysis and Implications

The practical implications of using LLMs as creativity support tools are substantial. The paper suggests that integrating LLMs into creative workflows might inadvertently promote a narrow band of creativity, potentially constraining the breadth of creative exploration and expression available to users.

Furthermore, the homogeneity observed across model architectures and prompt designs points to a limitation of current generative AI: while each model can score well on creativity individually, their collective outputs lack the diversity found in human responses.

Conclusion

This research underscores the need for further work on increasing LLM output diversity, potentially through new training paradigms or prompt engineering strategies. While LLMs perform well on individual instances of creative problem-solving, the broader challenge for future development lies in fostering diversity comparable to that of human creativity. Meeting that challenge will be central to ensuring that LLMs enrich rather than constrain creative processes across domains.
