Designing reasoning-enhancing methods to maximize originality and diversity

Develop and evaluate frameworks for reinforcement learning of language models, chain-of-thought prompting, and test-time compute scaling that explicitly optimize for originality relative to the training set and for diversity across multiple responses in open-ended tasks, rather than only improving the quality of a single output.

Background

The authors note that methods such as RL, chain-of-thought prompting, and scaling test-time compute are geared toward improving the quality of a single output, which does not directly address originality and diversity across multiple outputs.

They explicitly state that it is unclear how to design these methods to maximize originality and diversity in the open-ended setting, highlighting a methodological gap.
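
To make the target concrete, the following is a minimal sketch, not taken from the paper, of how a set-level objective could be instantiated: originality is measured as n-gram novelty against a training corpus, diversity as the mean pairwise Jaccard distance among sampled responses, and both are added to a per-response quality score. The names `creative_reward`, `quality_fn`, and the `alpha`/`beta` weights are illustrative assumptions, not the authors' method.

```python
from itertools import combinations

def ngrams(text, n=3):
    """Set of word n-grams in a text."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def originality(response, training_ngrams, n=3):
    """Fraction of the response's n-grams that never occur in the training corpus."""
    grams = ngrams(response, n)
    if not grams:
        return 0.0
    return sum(g not in training_ngrams for g in grams) / len(grams)

def diversity(responses, n=3):
    """Mean pairwise Jaccard distance between the n-gram sets of sampled responses."""
    if len(responses) < 2:
        return 0.0
    sets = [ngrams(r, n) for r in responses]
    dists = []
    for a, b in combinations(sets, 2):
        union = a | b
        jaccard = len(a & b) / len(union) if union else 1.0
        dists.append(1.0 - jaccard)
    return sum(dists) / len(dists)

def creative_reward(responses, training_ngrams, quality_fn, alpha=0.5, beta=0.5):
    """Set-level reward: average quality plus weighted originality and diversity terms.

    quality_fn is a hypothetical per-response scorer in [0, 1]; alpha and beta
    trade off novelty against quality. An RL fine-tuning loop could optimize
    this over groups of sampled responses instead of a single-output reward.
    """
    quality = sum(quality_fn(r) for r in responses) / len(responses)
    orig = sum(originality(r, training_ngrams) for r in responses) / len(responses)
    return quality + alpha * orig + beta * diversity(responses)

# Toy usage with a constant quality scorer.
corpus_ngrams = ngrams("the cat sat on the mat and looked around quietly")
samples = ["a violet cat recites poetry on the mat",
           "the mat dreams of cats who never sat"]
print(creative_reward(samples, corpus_ngrams, quality_fn=lambda r: 1.0))
```

In practice the lexical statistics here would likely be replaced by learned or semantic measures; the point of the sketch is only that the reward is defined over a set of responses rather than a single output.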

References

"It is unclear how to design them to maximize originality against a training set, and diversity over multiple responses."

Nagarajan et al., "Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction" (arXiv:2504.15266, 21 Apr 2025), Section 6 (Discussion), subsection "Effects of reasoning-enhancing methods".