Reward Function Design for Open-Ended Scientific Discovery
Develop a formal, computationally evaluable reward function for training reinforcement learning–based large language model (LLM) scientific agents. The function should quantify novelty, impact, and reproducibility in open-ended scientific discovery tasks, so that reward-guided learning remains effective when definitive final signals are sparse or unavailable.
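To make the shape of such a function concrete, the following is a minimal illustrative sketch, not a proposed solution to the open problem. Everything in it is an assumption introduced for illustration: the function names (`novelty`, `reproducibility`, `composite_reward`), the use of embedding distance as a novelty proxy, the rerun-agreement rate as a reproducibility proxy, the externally supplied impact score in [0, 1], and the weighted geometric mean used to combine them.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def novelty(candidate: Sequence[float], prior: Sequence[Sequence[float]]) -> float:
    """Hypothetical novelty proxy: 1 minus the highest similarity to any prior result.

    Assumes results are already embedded as vectors; the embedding is not specified here.
    """
    if not prior:
        return 1.0
    max_sim = max(cosine(candidate, p) for p in prior)
    return min(1.0, max(0.0, 1.0 - max_sim))


def reproducibility(reruns: Sequence[float], reference: float, tol: float = 0.05) -> float:
    """Hypothetical reproducibility proxy: share of reruns within a relative tolerance."""
    if not reruns:
        return 0.0
    agreeing = sum(1 for r in reruns if abs(r - reference) <= tol * abs(reference))
    return agreeing / len(reruns)


def composite_reward(
    nov: float,
    imp: float,
    rep: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
    eps: float = 1e-6,
) -> float:
    """Weighted geometric mean of the three scores, each assumed to lie in [0, 1].

    `imp` is an impact estimate supplied by the caller; how to compute it is
    exactly the unresolved part of the problem and is not modeled here.
    """
    scores = (nov, imp, rep)
    log_sum = sum(w * math.log(max(s, eps)) for w, s in zip(weights, scores))
    return math.exp(log_sum / sum(weights))
```

A geometric mean is used here, rather than a weighted sum, so that a result scoring near zero on any single axis (for example, a novel claim that never reproduces) receives a low overall reward. Whether that trade-off is appropriate, and how to obtain trustworthy per-axis scores in the first place, remains open.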
References
Designing a reward function that can measure novelty, impact, or reproducibility is a major, unresolved problem. For open-ended scientific exploration, defining and calculating the reward remains the most central obstacle on the path to a general-purpose Science Agent.
— Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics
(2510.09901 - Zhou et al., 10 Oct 2025) in Discussions — From LLM Reasoning to Agentic Reasoning via Reinforcement Learning, The Reward Problem