When and why does pass@k policy optimization degrade pass@1 performance?
Determine when and why optimizing the pass@$k$ objective via policy optimization degrades single-shot pass@1 performance of large language models on verifiable tasks: identify the conditions under which the trade-off appears and explain the mechanisms that cause it.
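For concreteness, pass@$k$ is commonly measured with the standard unbiased combinatorial estimator: given $n$ sampled attempts of which $c$ pass the verifier, pass@$k$ $= 1 - \binom{n-c}{k}/\binom{n}{k}$. A minimal Python sketch (the function name `pass_at_k` is ours, for illustration):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled attempts, c of which
    are verified correct: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures, so every size-k subset passes
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 attempts, 2 verified correct.
print(pass_at_k(10, 2, 1))  # 0.2    (pass@1)
print(pass_at_k(10, 2, 5))  # ~0.778 (pass@5)
```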
Open question. Despite growing adoption of pass@$k$ objectives, it is still not well understood why optimizing pass@$k$ can hurt pass@1, or when this trade-off should be expected to appear. Without a principled explanation, it is difficult to design reliable inference-aware fine-tuning methods that deliver multi-attempt gains while preserving strong single-shot performance. This leads to our research question: "When and why can pass@$k$ policy optimization degrade pass@1 performance?"
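One intuition for why the trade-off can arise, sketched below as a toy model (the numbers and the independence assumption are ours, purely illustrative, not results from the literature): a peaked policy that always emits its single best answer gains nothing from extra attempts, while a more diverse policy with lower per-attempt accuracy can overtake it at larger $k$. Pushing up pass@$k$ can therefore pull probability mass away from the single best answer and lower pass@1.

```python
def peaked_pass_at_k(p_correct: float, k: int) -> float:
    """Peaked policy: it always emits the same answer, so all k attempts
    are perfectly correlated and extra attempts add nothing."""
    return p_correct

def diverse_pass_at_k(p: float, k: int) -> float:
    """Diverse policy: attempts succeed independently with probability p,
    so pass@k = 1 - (1 - p)^k grows with k."""
    return 1.0 - (1.0 - p) ** k

# Illustrative numbers (assumptions, not measured values): the peaked
# policy is right on 60% of problems; the diverse policy's individual
# attempts are right only 40% of the time.
for k in (1, 2, 4, 8):
    print(k, peaked_pass_at_k(0.60, k), round(diverse_pass_at_k(0.40, k), 3))
# k=1: peaked 0.6  > diverse 0.4   (pass@1 favors the peaked policy)
# k=2: diverse 0.64 > peaked 0.6   (pass@k already favors diversity)
# k=8: diverse ~0.983, far above 0.6
```

Under these toy assumptions, any optimizer that maximizes pass@$k$ for $k \geq 2$ prefers the diverse policy even though its pass@1 is strictly worse, which is exactly the degradation the open question asks to characterize.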