
Assessing and Understanding Creativity in Large Language Models (2401.12491v1)

Published 23 Jan 2024 in cs.CL and cs.AI

Abstract: In the field of natural language processing, the rapid development of LLMs has attracted increasing attention. LLMs have shown a high level of creativity in various tasks, but methods for assessing that creativity remain inadequate. Assessing LLM creativity needs to account for differences from humans, requiring multi-dimensional measurement while balancing accuracy and efficiency. This paper aims to establish an efficient framework for assessing the level of creativity in LLMs. By adapting the modified Torrance Tests of Creative Thinking, the research evaluates the creative performance of various LLMs across 7 tasks, emphasizing 4 criteria: Fluency, Flexibility, Originality, and Elaboration. In this context, we develop a comprehensive dataset of 700 questions for testing and an LLM-based evaluation method. In addition, this study presents a novel analysis of LLMs' responses to diverse prompts and role-play situations. We find that the creativity of LLMs falls short primarily in originality, while excelling in elaboration. Moreover, the choice of prompts and the role-play settings of the model significantly influence creativity. The experimental results also indicate that collaboration among multiple LLMs can enhance originality. Notably, our findings reveal a consensus between human evaluations and LLMs regarding the personality traits that influence creativity. These findings underscore the significant impact of LLM design on creativity, bridge artificial intelligence and human creativity, and offer insights into LLMs' creativity and potential applications.
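The abstract describes an LLM-as-judge evaluation over four TTCT-style criteria. Below is a minimal sketch of how such a scoring loop could be structured; the judge prompt wording, the 1–5 scale, and the `call_llm` helper are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch: LLM-based scoring of one answer on the four creativity
# criteria named in the abstract (Fluency, Flexibility, Originality,
# Elaboration). The prompt template, scale, and `call_llm` wrapper are
# assumptions for illustration, not the authors' exact method.

from dataclasses import dataclass
from typing import Callable, Dict

CRITERIA = ("Fluency", "Flexibility", "Originality", "Elaboration")

JUDGE_PROMPT = (
    "You are rating the creativity of an answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate the answer on {criterion} from 1 (poor) to 5 (excellent). "
    "Reply with a single integer."
)

@dataclass
class CreativityScore:
    scores: Dict[str, float]

    @property
    def mean(self) -> float:
        # Simple average across the four criteria.
        return sum(self.scores.values()) / len(self.scores)

def score_response(
    question: str,
    answer: str,
    call_llm: Callable[[str], str],  # any chat-completion wrapper: prompt in, text out
) -> CreativityScore:
    """Ask an LLM judge to rate one answer on each criterion."""
    scores: Dict[str, float] = {}
    for criterion in CRITERIA:
        prompt = JUDGE_PROMPT.format(
            question=question, answer=answer, criterion=criterion
        )
        reply = call_llm(prompt).strip()
        try:
            scores[criterion] = float(reply)
        except ValueError:
            scores[criterion] = float("nan")  # judge did not return a number
    return CreativityScore(scores)

# Usage: score_response("Unusual uses for a brick?", "Build a tiny bookshelf...",
#                       call_llm=my_model_wrapper).mean
```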

Authors (13)
  1. Yunpu Zhao (4 papers)
  2. Rui Zhang (1138 papers)
  3. Wenyi Li (11 papers)
  4. Di Huang (203 papers)
  5. Jiaming Guo (37 papers)
  6. Shaohui Peng (20 papers)
  7. Yifan Hao (28 papers)
  8. Yuanbo Wen (19 papers)
  9. Xing Hu (122 papers)
  10. Zidong Du (41 papers)
  11. Qi Guo (237 papers)
  12. Ling Li (112 papers)
  13. Yunji Chen (51 papers)
Citations (8)