
Creative Beam Search: LLM-as-a-Judge For Improving Response Generation (2405.00099v4)

Published 30 Apr 2024 in cs.AI, cs.CL, cs.HC, and cs.LG

Abstract: LLMs are revolutionizing several areas, including artificial creativity. However, the process of generation in machines profoundly diverges from that observed in humans. In particular, machine generation is characterized by a lack of intentionality and an underlying creative process. We propose a method called Creative Beam Search that uses Diverse Beam Search and LLM-as-a-Judge to perform response generation and response validation. The results of a qualitative experiment show how our approach can provide better output than standard sampling techniques. We also show that the response validation step is a necessary complement to the response generation step.

Analyzing the Creative Beam Search Methodology

The paper "Creative Beam Search" by Giorgio Franceschelli and Mirco Musolesi introduces an approach aimed at enhancing the creative capacities of LLMs through a novel sampling scheme. The methodology, termed Creative Beam Search (CBS), attempts to bridge the gap between human creativity and machine generation. By integrating Diverse Beam Search (DBS) and the LLM-as-a-Judge approach, the authors propose a two-step process that simulates the stages of creative production: response generation and response validation.

Core Contributions

The CBS methodology is grounded in the idea of replicating key components of the human creative process, as suggested in the componential model of creativity. The process of CBS is divided into two main phases:

  1. Response Generation: Harnessing Diverse Beam Search, CBS generates an array of potential outputs. Unlike standard Beam Search, which often results in similar sequences, DBS introduces diversity by penalizing token selections that overlap across different sequence groups, thus ensuring variability in generated outputs. This phase is critical as it aims to simulate the response generation step in human creativity, leveraging creativity-relevant skills.
  2. Response Validation: The second phase employs an evaluative framework inspired by the LLM-as-a-Judge methodology. Here, the model assesses the quality and creativity of the options generated in the first phase. The process involves the model selecting the best candidate from a set of potential responses based on a self-assessment mechanism. This phase mirrors the domain-relevant skill component of human creativity, emphasizing the refinement and selection of the most appropriate creative output.
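The two phases can be sketched in a self-contained toy. The code below is not the authors' implementation: it runs group beam search over a hypothetical bigram scoring table (standing in for an LLM's next-token scores), applies a Hamming-style penalty so later groups avoid tokens earlier groups chose at the same decoding step, and then "validates" by re-scoring candidates with a separate judge function standing in for the LLM-as-a-Judge call.

```python
# Hypothetical bigram log-scores standing in for an LLM's next-token scores.
LOGPROBS = {
    ("<s>", "the"): -0.5, ("<s>", "a"): -0.9, ("<s>", "one"): -1.5,
    ("the", "sea"): -0.4, ("the", "sky"): -0.8, ("the", "end"): -1.2,
    ("a", "wave"): -0.6, ("a", "gull"): -1.0, ("a", "sea"): -1.4,
    ("one", "wave"): -0.7, ("one", "gull"): -1.1, ("one", "sky"): -1.3,
}
VOCAB = sorted({t for _, t in LOGPROBS})

def step_scores(prev_token):
    """Log-score of every vocabulary token given the previous token."""
    return {t: LOGPROBS.get((prev_token, t), -10.0) for t in VOCAB}

def diverse_beam_search(num_groups=3, beams_per_group=1, steps=2, penalty=2.0):
    """Phase 1: group beam search with a Hamming-style diversity penalty."""
    groups = [[(["<s>"], 0.0)] for _ in range(num_groups)]
    for _ in range(steps):
        chosen_this_step = []  # tokens already picked by earlier groups
        for g in range(num_groups):
            expanded = []
            for seq, score in groups[g]:
                for tok, lp in step_scores(seq[-1]).items():
                    # Penalize tokens that earlier groups chose at this step.
                    pen = penalty * chosen_this_step.count(tok)
                    expanded.append((seq + [tok], score + lp - pen))
            expanded.sort(key=lambda c: c[1], reverse=True)
            groups[g] = expanded[:beams_per_group]
            chosen_this_step.extend(s[-1] for s, _ in groups[g])
    return [seq for grp in groups for seq, _ in grp]

def judge(candidate):
    """Phase 2 stand-in for LLM-as-a-Judge: a toy 'creativity' proxy
    that rewards less likely bigram continuations."""
    return -sum(LOGPROBS.get(pair, -10.0) for pair in zip(candidate, candidate[1:]))

candidates = diverse_beam_search()
best = max(candidates, key=judge)
print(candidates)  # three candidates with distinct final tokens
print(best)        # ['<s>', 'one', 'gull'] under this toy judge
```

With the penalty set to zero, all three groups converge on the most probable continuation; with it enabled, each group lands on a distinct token, which is exactly the variability the validation step then arbitrates over. In practice the judge is a prompted LLM selecting among the candidates, not a fixed scoring function.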

Experimental Insights

The paper presents a qualitative study involving graduate students to evaluate the efficacy of CBS compared to traditional sampling techniques. Notably, CBS was preferred for its perceived creativity in 45% of cases, outscoring standard sampling mechanisms. Interestingly, the self-evaluation step produced a decision pattern that deviated from random selection, reinforcing its contributory value. The response validation process appeared to further improve DBS outputs, demonstrating its utility as a complement to the primary generation phase.
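Deviation from random selection can be checked with an exact binomial test. The sketch below uses hypothetical counts (the paper's raw numbers are not reproduced here): with k candidates per prompt, a judge choosing uniformly at random would pick any fixed candidate with probability 1/k, so an excess of picks for one candidate type can be tested against that null.

```python
from math import comb

def binom_sf(n, successes, p):
    """P(X >= successes) for X ~ Binomial(n, p): the one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(successes, n + 1))

# Hypothetical numbers: 40 prompts, 4 candidates each, and the judge
# picked a particular candidate type 18 times. Under uniform random
# selection we would expect 40 * 1/4 = 10 such picks.
p_value = binom_sf(n=40, successes=18, p=0.25)
print(p_value)  # well below 0.05, so random choice is implausible here
```

The same check is available as `scipy.stats.binomtest` when SciPy is on hand; the pure-stdlib version above is just to make the null hypothesis explicit.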

Implications and Future Directions

The CBS approach provides several insights into the potential for improving creative outputs in machine-generated content. By fostering diversity and leveraging self-assessment, CBS aligns more closely with aspects of the human creative process compared to traditional LLM outputs. However, limitations remain, such as the reliance on Hamming diversity and the inherent lack of genuine intentionality and consciousness in LLMs. These factors underscore the artificial nature of the simulated creativity process.

The paper paves the way for further exploration into combining CBS with more advanced LLM configurations or those trained with creativity-oriented strategies. An avenue for future research might include expanding the set of candidate outputs for validation to potentially yield more creatively diverse results. Additionally, other LLMs could be considered to assess the generalizability and scalability of the CBS framework.

In conclusion, the paper contributes significantly to the discourse on enhancing machine creativity and outlines a feasible path towards refining the capabilities of LLMs in generating creative content. Despite its challenges, the CBS approach holds promise in computational creativity research, offering a structured methodology to replicate aspects of human-like creativity in artificial systems.
