Extension of SAMerging to generative tasks

Extend SAMerging from classification to generative tasks, adapting its multi-teacher knowledge distillation setup to sequence outputs and evaluating the merged model on generative modeling benchmarks.

Background

Empirical results in the paper focus on multi-task classification across vision suites (TA-8, TALL-14, TALL-20) and NLP classification benchmarks (GLUE), using unlabeled calibration data and a KL-based multi-teacher distillation objective.

The authors explicitly note that evaluation is limited to classification tasks and leave the extension to generative tasks for future work, indicating a concrete gap in validation and potential methodological adaptation needed for generative settings.
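One natural adaptation of the KL-based multi-teacher distillation objective to generative models is to apply it per token: mix the task-specific teachers' next-token distributions and minimize the KL divergence from that mixture to the merged student's distribution at each sequence position. The sketch below illustrates this idea in plain numpy; the function name `multi_teacher_kl_loss`, the uniform teacher mixture, and the per-position averaging are illustrative assumptions, not the paper's specified method.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_teacher_kl_loss(student_logits, teacher_logits_list, weights=None):
    """Per-token KL(teacher mixture || student), averaged over positions.

    student_logits: (T, V) student logits over a length-T sequence.
    teacher_logits_list: list of (T, V) logits, one per task-specific teacher.
    weights: optional per-teacher mixture weights (uniform by assumption here;
             the classification setup in the paper may combine teachers differently).
    """
    if weights is None:
        weights = np.full(len(teacher_logits_list), 1.0 / len(teacher_logits_list))
    # Mixture of the teachers' next-token distributions at every position.
    teacher_probs = sum(w * softmax(t) for w, t in zip(weights, teacher_logits_list))
    log_student = np.log(softmax(student_logits))
    # KL divergence per position, then mean over the sequence.
    kl = (teacher_probs * (np.log(teacher_probs) - log_student)).sum(axis=-1)
    return kl.mean()
```

In practice the student and teacher logits would come from forward passes over unlabeled calibration sequences, mirroring the calibration-data setup described above; the loss is zero when the student matches the teacher mixture exactly and positive otherwise.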

References

"Our evaluation focuses solely on classification; extending it to generative tasks is left for future work."

Model Merging via Multi-Teacher Knowledge Distillation (2512.21288, Dalili et al., 24 Dec 2025), Section 5 (Conclusion), Limitations and future work.