Specialization on truly private codebases

Ascertain whether repository specialization with SERA yields comparable performance improvements on truly private codebases, that is, codebases absent from both base-model and teacher-model training data, given that no evaluation instances exist to test this directly.

Background

The specialization experiments target public repositories (Django, SymPy, Sphinx) and use evaluation instances from SWE-bench Verified, which may overlap with model pretraining data.

The authors note this potential bias and state explicitly that they have not verified specialization on truly private codebases, because they have no evaluation data to test it directly.

References

While specialization effects are well-studied in fine-tuning scaling laws and our results appear plausible, we have not verified specialization on truly private codebases that models have never seen because we have no evaluation data to test this directly.

SERA: Soft-Verified Efficient Repository Agents (2601.20789 - Shen et al., 28 Jan 2026) in Section 9 (Limitations), Private repo specialization