Behavior outside the NTK linearization regime

Characterize the generalization and performance of SAMerging, and the validity of the associated PAC-Bayes excess-risk guarantees, when the local Neural Tangent Kernel (NTK) linearization assumption does not hold, i.e., when the merged parameters deviate far enough from the pretrained checkpoint that the NTK approximation becomes invalid.

Background

The main bound in the paper (Theorem 2) passes from posterior-level PAC-Bayesian guarantees to a guarantee for a single merged parameter by invoking a local Neural Tangent Kernel (NTK) linearization around the pretrained weights. Under this linearization the network is affine in its parameters, so convexity and smoothness of the loss in score space relate Gaussian posteriors to their means and yield single-model generalization terms.
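Schematically, and with notation assumed here rather than taken from the paper (a network $f$ with pretrained weights $\theta_0$, a loss $\ell$ convex in the scores, and a Gaussian posterior $Q$ with mean $\bar{\theta}$), the posterior-to-mean step rests on the following standard argument:

```latex
% First-order (NTK-style) expansion around the pretrained weights:
f_{\mathrm{lin}}(\theta; x) \;=\; f(\theta_0; x)
  \;+\; \nabla_\theta f(\theta_0; x)^{\top} (\theta - \theta_0)

% f_lin is affine in theta, so for a loss convex in the scores,
% Jensen's inequality transfers the posterior-level bound to its mean:
\ell\bigl(f_{\mathrm{lin}}(\bar{\theta}; x),\, y\bigr)
  \;\le\; \mathbb{E}_{\theta \sim Q}\,\ell\bigl(f_{\mathrm{lin}}(\theta; x),\, y\bigr),
  \qquad \bar{\theta} = \mathbb{E}_{\theta \sim Q}[\theta]
```

Once the true network departs from $f_{\mathrm{lin}}$, the affine structure, and with it this Jensen step, is lost; that is precisely the regime the question above targets.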

The authors caution that this analysis is local: the NTK approximation is most faithful near the pretrained checkpoint, whereas merging may move parameters farther away. Understanding how SAMerging and its bound behave outside this local regime remains unresolved.
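One way to probe this question empirically is to measure how badly the merged weights break the linearization. The sketch below (hypothetical setup; `model`, `params0`, `params_merged` are not from the paper) compares the network's true output at the merged weights with its first-order expansion around the pretrained checkpoint using `torch.func`; a large relative error signals that the NTK approximation, and hence the locality assumption behind Theorem 2, has broken down.

```python
import torch
from torch.func import functional_call, jvp

def ntk_linearization_error(model, params0, params_merged, x):
    """Relative gap between f(theta_merged, x) and its first-order
    (NTK-style) expansion around the pretrained weights theta_0."""
    # Parameter displacement: theta_merged - theta_0.
    delta = {k: params_merged[k] - params0[k] for k in params0}

    def f(params):
        # Evaluate the model functionally at an arbitrary parameter set.
        return functional_call(model, params, (x,))

    # jvp returns f(theta_0, x) and the directional derivative
    # J_f(theta_0, x) @ (theta_merged - theta_0) in a single pass.
    out0, jvp_out = jvp(f, (params0,), (delta,))
    linearized = out0 + jvp_out      # first-order (NTK) prediction
    actual = f(params_merged)        # full nonlinear prediction
    return (actual - linearized).norm() / actual.norm()

# Toy usage: an error near 0 means the merged weights still sit in the
# locally linear regime the bound assumes; values near 1 mean they do not.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.Tanh(), torch.nn.Linear(32, 4)
)
params0 = {k: v.detach().clone() for k, v in model.named_parameters()}
params_merged = {k: v + 0.05 * torch.randn_like(v) for k, v in params0.items()}
x = torch.randn(8, 16)
print(ntk_linearization_error(model, params0, params_merged, x).item())
```

Tracking this error as a function of the merging step size would give an empirical picture of where the local regime ends, even though it does not by itself repair the bound outside that regime.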

References

The analysis assumes a local NTK-style linearization, so behavior far from this regime is uncertain.

Model Merging via Multi-Teacher Knowledge Distillation (2512.21288, Dalili et al., 24 Dec 2025), Section 5, Conclusion: Limitations and future work