Is Temperature the Creativity Parameter of Large Language Models? (2405.00492v1)

Published 1 May 2024 in cs.CL and cs.AI

Abstract: LLMs are applied to all sorts of creative tasks, and their outputs vary from beautiful, to peculiar, to pastiche, into plain plagiarism. The temperature parameter of an LLM regulates the amount of randomness, leading to more diverse outputs; therefore, it is often claimed to be the creativity parameter. Here, we investigate this claim using a narrative generation task with a predetermined fixed context, model and prompt. Specifically, we present an empirical analysis of the LLM output for different temperature values using four necessary conditions for creativity in narrative generation: novelty, typicality, cohesion, and coherence. We find that temperature is weakly correlated with novelty, and unsurprisingly, moderately correlated with incoherence, but there is no relationship with either cohesion or typicality. However, the influence of temperature on creativity is far more nuanced and weak than suggested by the "creativity parameter" claim; overall results suggest that the LLM generates slightly more novel outputs as temperatures get higher. Finally, we discuss ideas to allow more controlled LLM creativity, rather than relying on chance via changing the temperature parameter.

Analyzing Temperature's Effect on Creativity in LLMs

The paper "Is Temperature the Creativity Parameter of LLMs?" investigates the widely held notion that the temperature parameter in LLMs controls their creativity. The authors, Max Peeperkorn, Tom Kouwenhoven, Dan Brown, and Anna Jordanous, engage in an empirical examination of this claim by evaluating LLM-generated narratives across different temperature settings, specifically focusing on four creativity conditions: novelty, typicality, cohesion, and coherence. This research is particularly relevant as LLMs like ChatGPT have become increasingly integrated into creative domains, sparking a need for a deeper understanding of their generative capabilities.

Temperature and Creativity in LLMs

Temperature is a sampling hyperparameter that governs randomness in an LLM's output by rescaling the probability distribution over candidate tokens before one is drawn. Higher temperatures flatten the distribution, increasing randomness and diversity and ostensibly enhancing creativity, while lower temperatures concentrate probability on the most likely tokens, producing more deterministic outputs. However, this paper challenges the oversimplification of temperature as the "creativity parameter."
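For concreteness, the sketch below shows how temperature rescales a model's logits before sampling; the logits and token counts are made up for illustration and are not taken from the paper.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Sample one token index from logits rescaled by temperature."""
    # Dividing logits by the temperature sharpens (T < 1) or flattens (T > 1)
    # the resulting softmax distribution.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
logits = [4.0, 3.5, 1.0, -2.0]  # four hypothetical candidate tokens
for t in (0.1, 0.7, 1.5):
    draws = [sample_with_temperature(logits, t, rng) for _ in range(1000)]
    print(f"T={t}:", np.bincount(draws, minlength=len(logits)) / 1000)
```

At T=0.1 nearly all samples land on the highest-logit token, while at T=1.5 lower-ranked tokens are drawn noticeably more often, which is the diversity effect the "creativity parameter" claim rests on.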

Methodology

To measure the influence of temperature on creativity, the researchers employed the Llama 2-Chat 70B model to generate narratives from a fixed prompt across varying temperature settings. They established a baseline—termed the "exemplar object"—by setting the temperature to near zero, resulting in a deterministic output serving as a reference point. The authors assessed the stories using computational metrics like semantic similarity and edit distance and conducted a human evaluation to provide insights into perceived creativity.
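To make the computational comparison concrete, here is a minimal sketch of scoring generated stories against the exemplar using sentence embeddings for semantic similarity and a character-level matching ratio as a stand-in for edit distance; the example texts and tool choices below are illustrative assumptions, not the paper's exact pipeline.

```python
from difflib import SequenceMatcher

from sentence_transformers import SentenceTransformer, util

# Illustrative texts; in the study these would be Llama 2-Chat 70B generations.
exemplar = "Once upon a time, a lighthouse keeper found a bottle on the shore."
stories = {
    0.7: "Once upon a time, a lighthouse keeper discovered a bottle washed ashore.",
    1.5: "The gulls argued about jazz while the lighthouse dreamed of becoming a whale.",
}

embedder = SentenceTransformer("all-MiniLM-L6-v2")
exemplar_vec = embedder.encode(exemplar, convert_to_tensor=True)

for temperature, story in stories.items():
    # Semantic similarity to the exemplar (1.0 = same meaning); lower values
    # suggest a more novel story relative to the deterministic baseline.
    semantic = util.cos_sim(embedder.encode(story, convert_to_tensor=True), exemplar_vec).item()
    # Surface-level similarity via a character-level matching ratio.
    surface = SequenceMatcher(None, exemplar, story).ratio()
    print(f"T={temperature}: semantic={semantic:.2f}, surface={surface:.2f}")
```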

Key Findings

  1. Weak Correlation with Novelty: Temperature showed a weak positive correlation with narrative novelty, suggesting that higher temperatures can facilitate some degree of novel output. This indicates limited exploratory potential within the LLM's probabilistic output space.
  2. Negative Impact on Coherence: A moderate negative correlation was observed between temperature and coherence, highlighting a trade-off where increased novelty at higher temperatures leads to decreased coherence.
  3. Lack of Relationship with Typicality and Cohesion: Notably, temperature exhibited no significant relationship with the typicality or cohesion of the generated content, undermining its designation as a straightforward creativity parameter.

These findings underscore the nuanced role of temperature in modulating creativity-related attributes in LLM outputs. While high temperature might diversify outputs, contributing to novelty, it compromises coherence—a pivotal aspect of storytelling quality.
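For readers unfamiliar with the terminology, the sketch below shows the kind of rank correlation such statements refer to, run on made-up per-temperature scores; the paper's own statistical analysis is more elaborate than this.

```python
from scipy.stats import spearmanr

# Hypothetical aggregate scores per temperature setting (invented for illustration).
temperatures = [0.1, 0.4, 0.7, 1.0, 1.3, 1.6, 1.9]
novelty      = [0.12, 0.15, 0.14, 0.18, 0.20, 0.19, 0.23]
coherence    = [0.92, 0.90, 0.88, 0.85, 0.80, 0.71, 0.60]

rho_nov, p_nov = spearmanr(temperatures, novelty)
rho_coh, p_coh = spearmanr(temperatures, coherence)
print(f"temperature vs novelty:   rho={rho_nov:.2f} (p={p_nov:.3f})")
print(f"temperature vs coherence: rho={rho_coh:.2f} (p={p_coh:.3f})")
```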

Implications and Future Directions

The research offers several practical implications and pathways for future exploration:

  • Advanced Decoding Strategies: Designing more sophisticated decoding strategies, rather than merely adjusting temperature, might yield higher-quality creative outputs (a brief sketch of such sampling controls follows this list).
  • Creativity Benchmarks: Developing standardized benchmarks to evaluate creativity in LLMs rigorously is crucial for drawing more substantial conclusions.
  • Prompt Engineering: Investigating how implicit knowledge within LLMs can be leveraged through advanced prompt engineering could offer greater control over creative outputs.
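As an example of what "more sophisticated decoding" can mean in practice, the sketch below shows how top-k and nucleus (top-p) sampling are exposed in the Hugging Face transformers generate API; the small GPT-2 checkpoint is used purely so the snippet runs cheaply and is not the model studied in the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative; the paper uses Llama 2-Chat 70B
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time, a lighthouse keeper", return_tensors="pt")

# Nucleus (top-p) sampling: sample only from the smallest token set whose
# cumulative probability exceeds top_p.
nucleus = model.generate(
    **inputs, do_sample=True, top_p=0.9, top_k=0, temperature=1.0, max_new_tokens=60
)

# Top-k sampling: sample only among the k most likely tokens at each step.
topk = model.generate(
    **inputs, do_sample=True, top_k=50, temperature=0.8, max_new_tokens=60
)

for output in (nucleus, topk):
    print(tokenizer.decode(output[0], skip_special_tokens=True), "\n---")
```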

Conclusion

The paper contributes to the understanding of LLMs' creative potential by dissecting the influence of temperature on several dimensions of creativity. It invites a reevaluation of conventional beliefs and encourages the development of refined methodologies and tools to harness the creative capabilities of LLMs more deliberately. The research underscores the complexity inherent in computational creativity and advocates a more holistic approach to eliciting it in LLMs.
