Cross-model generalization of findings

Determine whether the empirical findings reported for SERA, which use Qwen 3-32B as the base model and GLM-4.5-Air/GLM-4.6 as teachers, generalize to other language model families under thorough evaluation.

Background

Experiments in the paper use Qwen 3-32B as the base model and GLM-4.5-Air/GLM-4.6 as teachers, with only limited checks using Claude models (Claude 3.7 Sonnet and Claude 4.0 Sonnet).

The authors explicitly state that they are uncertain whether their findings generalize to other base model families.

References

While we have some experiments with Claude 3.7 Sonnet and Claude 4.0 Sonnet that hint at the generalization of our method, we do not know whether our findings generalize to other model families when evaluated thoroughly.

SERA: Soft-Verified Efficient Repository Agents (2601.20789 - Shen et al., 28 Jan 2026), Section 9 (Limitations), "Model-specific results"