Designing reasoning-enhancing methods to maximize originality and diversity
Develop and evaluate frameworks for reinforcement learning of language models, chain-of-thought prompting, and test-time compute scaling that explicitly optimize for originality relative to the training set and for diversity across multiple responses in open-ended tasks, rather than only improving the quality of individual responses.
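To make the two objectives concrete, here is a minimal sketch of how originality and diversity might be scored over a set of responses. It is an illustration under stated assumptions, not the method of the cited paper: the n-gram overlap measures, the function names (`originality`, `diversity`, `set_reward`), and the mixing weight `alpha` are all hypothetical choices made for this example.

```python
from itertools import combinations


def ngrams(text, n=2):
    """Set of word-level n-grams in a text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def originality(response, training_set, n=2):
    """Fraction of the response's n-grams that never occur in the training set.

    Assumption: n-gram novelty is used as a crude proxy for originality.
    """
    resp = ngrams(response, n)
    if not resp:
        return 0.0
    seen = set()
    for doc in training_set:
        seen |= ngrams(doc, n)
    return len(resp - seen) / len(resp)


def diversity(responses, n=2):
    """Mean pairwise Jaccard distance between the n-gram sets of the responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        ga, gb = ngrams(a, n), ngrams(b, n)
        union = ga | gb
        # Two responses with no n-grams at all are treated as identical.
        similarity = len(ga & gb) / len(union) if union else 1.0
        total += 1.0 - similarity
    return total / len(pairs)


def set_reward(responses, training_set, alpha=0.5, n=2):
    """Hypothetical set-level score: weighted mix of mean originality and diversity."""
    if not responses:
        return 0.0
    mean_orig = sum(originality(r, training_set, n) for r in responses) / len(responses)
    return alpha * mean_orig + (1 - alpha) * diversity(responses, n)


if __name__ == "__main__":
    training = ["the cat sat on the mat"]
    samples = ["a dog ran far away", "a dog ran far away"]
    # Both samples are novel relative to the training set but identical to each other.
    print(set_reward(samples, training))
```

The point of scoring the whole set rather than each response is that a per-response quality reward cannot distinguish two identical novel answers from two different ones; any set-level term of this kind can.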
References
It is unclear how to design them to maximize originality against a training set, and diversity over multiple responses.
— Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
(2504.15266 - Nagarajan et al., 21 Apr 2025) in Section 6 (Discussion), Subsection: Effects of reasoning-enhancing methods