Principledness of post‑hoc logit calibration after mask discretization

Determine whether applying a learned affine calibration (a scale and shift fitted with a small number of L-BFGS steps) to the final logits after discretizing node masks in the structured pruning procedure for weight-sparse transformers is a principled and faithful practice, or whether this post-hoc adjustment introduces methodological bias or artifacts into the evaluation of pruned circuits.

Background

The paper introduces a structured pruning method that learns binary node masks to isolate minimal task‑specific circuits in weight‑sparse transformers. After training the continuous mask parameters, the authors discretize the masks and observe that the resulting pruned models are often uncalibrated.

To mitigate this, they apply a post-hoc affine calibration to the final logits, optimizing a scale and shift with 16 steps of L-BFGS. However, they explicitly note uncertainty about the methodological validity of this step, raising the question of whether such calibration is principled when assessing the faithfulness and performance of pruned circuits.
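The calibration step described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the synthetic logits, the variable names, and the choice of a per-class shift vector (a single scalar shift added to every logit would cancel in the softmax) are assumptions; only the affine form of the transform and the 16-step L-BFGS budget come from the paper's description.

```python
# Hedged sketch: post-hoc affine calibration of logits via limited-step L-BFGS.
# Fits a scalar scale and a per-class shift (bias) vector to minimize
# negative log-likelihood on held-out examples. All data here is synthetic.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, k = 200, 5
labels = rng.integers(0, k, size=n)

# Synthetic "miscalibrated" logits: boost the true class, then overscale,
# mimicking an overconfident pruned model.
logits = rng.normal(size=(n, k))
logits[np.arange(n), labels] += 2.0
logits *= 4.0

def nll(params, z, y):
    """Mean negative log-likelihood of labels y under scale*z + bias."""
    params = np.asarray(params)
    scale, bias = params[0], params[1:]
    zc = scale * z + bias
    zc = zc - zc.max(axis=1, keepdims=True)  # numerical stability
    log_probs = zc - np.log(np.exp(zc).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

# Identity transform as the starting point: scale=1, bias=0.
x0 = np.concatenate([[1.0], np.zeros(k)])
res = minimize(nll, x0, args=(logits, labels),
               method="L-BFGS-B", options={"maxiter": 16})

print(f"fitted scale={res.x[0]:.3f}")
print(f"NLL before={nll(x0, logits, labels):.3f}, "
      f"after={nll(res.x, logits, labels):.3f}")
```

Because the transform is affine in the logits and the loss is a log-sum-exp of affine functions, this objective is convex in the parameters, so even a 16-step budget typically gets close to the optimum; the methodological question is not optimization but whether evaluating the calibrated model still faithfully measures the pruned circuit.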

References

"As we find that our discretized models often are quite uncalibrated, we optimize a scale+shift transformation to the final logits using 16 steps of LBFGS. It's unclear whether this is principled to do in general."

Weight-sparse transformers have interpretable circuits  (2511.13653 - Gao et al., 17 Nov 2025) in Appendix, Method details, Subsection “Pruning algorithm,” Mask discretization paragraph