Reproducibility of DeepCoder’s reported 60.6% LiveCodeBench Pass@1 accuracy
Determine whether the reported 60.6% Pass@1 accuracy of the DeepCoder 14B model on LiveCodeBench can be reproduced under the same sampling parameters and software environment described in the DeepCoder evaluation.
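A common way to estimate Pass@1 is to draw n samples per problem and average the fraction that pass. The sketch below uses the unbiased pass@k estimator of Chen et al. (2021), which reduces to c/n when k = 1. The entry does not say whether DeepCoder's evaluation used this estimator or which n it used, so both are assumptions here. The helpers `pass_at_k` and `benchmark_pass_at_1` are hypothetical names for illustration. They also return a standard error of the mean across problems. That gives a rough scale for judging whether a reproduced score differs meaningfully from the reported 60.6%.

```python
from math import comb, sqrt


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem (Chen et al., 2021 estimator).

    n: samples drawn for the problem, c: samples that passed all tests.
    For k = 1 this equals c / n.
    """
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct_counts: list[int], n: int) -> tuple[float, float]:
    """Mean Pass@1 over problems and its standard error across problems.

    Assumes the same number of samples n for every problem (hypothetical setup).
    """
    scores = [pass_at_k(n, c, 1) for c in correct_counts]
    m = len(scores)
    if m < 2:
        raise ValueError("need at least two problems for a standard error")
    mean = sum(scores) / m
    var = sum((s - mean) ** 2 for s in scores) / (m - 1)  # sample variance
    return mean, sqrt(var / m)


if __name__ == "__main__":
    # Toy per-problem pass counts with 16 samples each; not real results.
    mean, se = benchmark_pass_at_1([4, 8, 0, 16], n=16)
    print(f"Pass@1 = {mean:.4f} +/- {se:.4f}")
```

A single summary number like 60.6% hides both the sample count per problem and the spread across problems. Reporting n and a standard error alongside the mean makes a failed reproduction easier to interpret.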
References
We were unable to replicate the 60.6% result despite our best efforts to match their sampling parameters and software environment.
— LLMs Can Learn to Reason Via Off-Policy RL
(2602.19362 - Ritter et al., 22 Feb 2026) in Section: Results on Code Generation, Pass@k performance paragraph (footnote)