LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play (2405.06373v4)

Published 10 May 2024 in cs.CL and cs.AI

Abstract: LLMs have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions. To enhance LLM creativity, our key insight is to emulate the human process of inducing collective creativity through engaging discussions with participants from diverse backgrounds and perspectives. To this end, we propose LLM Discussion, a three-phase discussion framework that facilitates vigorous and diverging idea exchanges and ensures convergence to creative answers. Moreover, we adopt a role-playing technique by assigning distinct roles to LLMs to combat the homogeneity of LLMs. We evaluate the efficacy of the proposed framework with the Alternative Uses Test, Similarities Test, Instances Test, and Scientific Creativity Test through both LLM evaluation and human study. The results show that our proposed framework outperforms single-LLM approaches and existing multi-LLM frameworks across various creativity metrics. The code is available at https://github.com/lawraa/LLM-Discussion.

Enhancing Creativity in LLMs via Collaborative Discussion and Role-Playing

The paper, "LLM Discussion: Enhancing the Creativity of LLMs via Discussion Framework and Role-Play," introduces an innovative approach to augment the creativity of LLMs by simulating multi-agent interaction akin to human brainstorming sessions. The authors propose a structured discussion framework that harnesses role-playing to diversify perspectives, thus fostering creative idea generation.

Framework Description

The proposed framework involves a three-phase discussion process designed to encourage dynamic idea exchanges among LLMs. The phases—initiation, discussion, and convergence—are tailored with specific prompts to maintain vigorous interaction and guide the LLMs toward producing creative outputs. By assigning distinct roles to each LLM, the approach aims to mitigate the limitations of homogeneity, which hinders creativity in standard collaborative LLM settings.
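The three-phase loop described above can be sketched as a simple orchestration over role-conditioned agents. This is a minimal illustration, not the authors' implementation: the role names are taken from the paper's examples, but the prompt wording, the number of discussion rounds, and the `stub_llm` function (a stand-in for a real chat-completion call) are assumptions.

```python
# Minimal sketch of a three-phase, role-conditioned discussion loop.
# stub_llm stands in for a real chat-model API call; prompts are illustrative.

ROLES = ["Environmentalist", "Visionary Millionaire", "Futurist"]

PHASE_PROMPTS = {
    "initiation": "In your assigned role, propose one initial idea for: {q}",
    "discussion": "In role, build on or challenge the ideas so far:\n{history}",
    "convergence": "Summarize the discussion into one final creative answer:\n{history}",
}

def stub_llm(role: str, prompt: str) -> str:
    # Stand-in for a chat-completion call with a role-playing system prompt.
    return f"[{role}] response to: {prompt[:40]}..."

def run_discussion(question: str, rounds: int = 2) -> list[str]:
    history: list[str] = []
    # Phase 1 (initiation): each role states an opening idea.
    for role in ROLES:
        history.append(stub_llm(role, PHASE_PROMPTS["initiation"].format(q=question)))
    # Phase 2 (discussion): repeated rounds of divergent idea exchange.
    for _ in range(rounds):
        for role in ROLES:
            prompt = PHASE_PROMPTS["discussion"].format(history="\n".join(history))
            history.append(stub_llm(role, prompt))
    # Phase 3 (convergence): collapse the transcript into one final answer.
    final_prompt = PHASE_PROMPTS["convergence"].format(history="\n".join(history))
    history.append(stub_llm(ROLES[0], final_prompt))
    return history

transcript = run_discussion("Unusual uses for a paperclip")
```

With three roles and two discussion rounds, the transcript holds three initiation turns, six discussion turns, and one convergence turn.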

Evaluation and Results

The framework's effectiveness is assessed using four established creativity benchmarks: the Alternative Uses Test, Instances Test, Similarities Test, and Scientific Creativity Test. These benchmarks measure various aspects of creativity, including originality, elaboration, fluency, and flexibility. LLM-based evaluation and human studies reveal that the proposed method significantly enhances creative output compared to single-LLM and other multi-LLM frameworks. Notably, the LLM Discussion framework achieved higher scores in originality and elaboration metrics across all tests.
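An LLM-based evaluation of this kind can be sketched as a judge model rating each answer on the four metrics and averaging across answers. The sketch below is hypothetical: `stub_judge` stands in for prompting a judge LLM with a rating rubric, and the 1-to-5 scale is an assumption, not the paper's exact protocol.

```python
# Sketch of LLM-as-judge creativity scoring over the four metrics.
# stub_judge stands in for a rubric-prompted judge model; scale is assumed 1-5.

METRICS = ["originality", "elaboration", "fluency", "flexibility"]

def stub_judge(answer: str, metric: str) -> int:
    # Stand-in for asking a judge LLM: "Rate this answer's {metric} from 1 to 5."
    return 3

def score_answer(answer: str) -> dict[str, float]:
    # Collect one judge rating per metric for a single answer.
    return {m: float(stub_judge(answer, m)) for m in METRICS}

def mean_scores(answers: list[str]) -> dict[str, float]:
    # Average each metric over all answers to compare frameworks.
    per_answer = [score_answer(a) for a in answers]
    return {m: sum(s[m] for s in per_answer) / len(per_answer) for m in METRICS}

scores = mean_scores(["zipper pull", "reset pin for electronics"])
```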

Key Insights and Observations

  • Collaborative Dynamics: The structured interaction promotes divergent thinking by requiring LLMs to build upon each other's suggestions rather than converge prematurely, as observed in traditional debate-type frameworks.
  • Role-Specific Contributions: By assigning roles like Environmentalist, Visionary Millionaire, and Futurist, the framework ensures that LLMs approach problems from diverse angles, enriching the discussion with unique insights.
  • Conceptual Complexity: The enhanced elaboration scores suggest that role-play contributes to more detailed and intricately developed ideas, reflecting a broader range of thought and in-depth exploration.

Implications and Future Directions

This research opens avenues for applying the role-enhanced discussion framework to other facets of computational creativity and beyond. Practically, this could influence the design of creative AI applications, making them better suited for tasks requiring innovative problem-solving. Theoretically, it challenges the community to rethink how interactions between AI agents can be structured to maximize collective intelligence and creativity.

Future work could include integrating human participants into the discussion framework, exploring the synergy between human intuition and LLMs' computational creativity, and further refining evaluation metrics to capture more nuanced aspects of creativity. Additionally, examining the framework's robustness across diverse LLM architectures and domains could clarify how broadly it generalizes.

Authors (6)
  1. Li-Chun Lu
  2. Shou-Jen Chen
  3. Tsung-Min Pai
  4. Chan-Hung Yu
  5. Hung-yi Lee
  6. Shao-Hua Sun
Citations (16)