Reward Models Enable Scalable Code Verification by Trading Accuracy for Throughput (2506.10056v1)

Published 11 Jun 2025 in cs.SE and cs.PL

Abstract: The standard paradigm for solving coding tasks via LLMs is to generate-then-rank programs, where the latter step uses a verifier in the ranking process. The growing consensus is that a comprehensive verifier (e.g., a full test suite) should be prioritized over an outcome reward model (ORM) whenever possible, with little consideration given to the trade-offs involved. We aim to challenge this assumption by systematically exploring the trade-off between speed and accuracy. We find that ORMs play a crucial role in scaling verification through trading accuracy for speed, even when a comprehensive verifier is available. Their value becomes especially apparent when used in a generate-prune-then-rank approach, where a faster but less accurate verifier removes incorrect solutions prior to ranking -- leading to a system that is 11.65x faster while only being 8.33% less accurate than the full test suite. We analyze the generate-prune-then-rank approach and show that it works by filtering out incorrect but highly ranked solutions. These findings enable the design of scalable and accurate program ranking systems.
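
The generate-prune-then-rank idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names (`generate_prune_then_rank`, `compiles`, `toy_score`), the choice of a syntax check as the fast verifier, and the length-based scorer standing in for a reward model are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def generate_prune_then_rank(
    candidates: Sequence[str],
    cheap_check: Callable[[str], bool],
    score: Callable[[str], float],
) -> List[str]:
    """Drop candidates the cheap verifier rejects, then rank the survivors."""
    survivors = [c for c in candidates if cheap_check(c)]
    # Highest score first; only survivors are ever scored.
    return sorted(survivors, key=score, reverse=True)


def compiles(src: str) -> bool:
    """Hypothetical fast verifier: accept anything that parses as Python."""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def toy_score(src: str) -> float:
    """Hypothetical stand-in for a reward model; deliberately flawed
    (longer source scores higher) so that pruning has something to fix."""
    return float(len(src))


candidates = [
    "def add(a, b): return a + b",
    "def add(a, b): return a + b +   # broken but long, scores highest",
    "def add(a, b): return a",
]

ranked = generate_prune_then_rank(candidates, compiles, toy_score)
print(ranked[0])
```

In this toy setup the scorer alone would place the syntactically broken second candidate first. The syntax check removes it before ranking, which mirrors the abstract's observation that pruning helps by filtering out incorrect but highly ranked solutions. A real system would replace `compiles` with whatever fast verifier is available and `toy_score` with an actual ranker.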