Defining a Versatile Reward Model for SWE Across TTS and RL
Determine the defining properties of a versatile reward model for software engineering agents that remains effective across both test-time scaling (TTS) and reinforcement learning (RL), and ascertain whether high TTS performance implies effectiveness in RL or whether TTS and RL impose different requirements on reward models.
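To make the two usage scenarios concrete, below is a minimal sketch of how a single scalar reward model might be plugged into both settings: best-of-N reranking at test time (TTS) and a REINFORCE-style surrogate loss in RL. The interface and names (`Trajectory`, `RewardModel`, `best_of_n`, `policy_gradient_loss`) are illustrative assumptions for this sketch, not the paper's API.

```python
# Minimal sketch: one scalar reward model reused in two scenarios (TTS and RL).
# All names below are illustrative assumptions, not APIs from the SWE-RM paper.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Trajectory:
    """An agent rollout: the issue, the produced patch, and its policy log-probability."""
    issue: str
    patch: str
    log_prob: float  # summed log-probability under the policy (needed for RL)


# A reward model here is anything that maps a trajectory to a scalar score,
# without executing tests (execution-free feedback).
RewardModel = Callable[[Trajectory], float]


def best_of_n(candidates: Sequence[Trajectory], rm: RewardModel) -> Trajectory:
    """Test-time scaling: sample N candidate patches, keep the highest-scoring one."""
    return max(candidates, key=rm)


def policy_gradient_loss(batch: List[Trajectory], rm: RewardModel) -> float:
    """RL usage: the same scalar scores act as rewards in a REINFORCE-style
    surrogate loss. A real trainer would backpropagate through the policy's
    log-probabilities; this only shows where the reward model plugs in."""
    rewards = [rm(t) for t in batch]
    baseline = sum(rewards) / len(rewards)  # simple mean baseline
    return -sum((r - baseline) * t.log_prob for r, t in zip(rewards, batch)) / len(batch)


# Toy usage: a placeholder scorer that prefers shorter patches.
rm: RewardModel = lambda t: -len(t.patch)
cands = [
    Trajectory("fix issue #1", "long candidate patch ...", -12.0),
    Trajectory("fix issue #1", "short patch", -9.5),
]
best = best_of_n(cands, rm)              # TTS: pick the highest-scoring candidate
loss = policy_gradient_loss(cands, rm)   # RL: same scores reused as rewards
```

The point of the sketch is that the reward model's outputs are consumed very differently in the two cases (relative ranking of a few candidates vs. a training signal over many on-policy rollouts), which is why TTS performance need not transfer to RL.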
References
While we aim to develop a versatile reward model that can be applied across different scenarios such as TTS and RL, it is unknown what defines such a versatile reward model and whether TTS and RL impose different requirements.
— SWE-RM: Execution-free Feedback For Software Engineering Agents
(2512.21919 - Shum et al., 26 Dec 2025) in Section 3.1