
Generalizability of model merging beyond sentence classification

Determine whether the adaptation strategy of merging a continued pre-training checkpoint with the base multilingual model via Task Arithmetic or TIES, followed by fine-tuning on labeled data (an approach shown to improve performance on code-mixed sentence classification for English-Hindi and English-Spanish), generalizes to NLP tasks beyond sentence classification in both monolingual and code-mixed settings.


Background

The paper evaluates model merging, specifically Task Arithmetic and TIES, as an adaptation method for code-mixed NLP, demonstrating consistent gains over both full fine-tuning and continued pre-training followed by fine-tuning on sentence classification tasks in English-Hindi and English-Spanish. The experiments use multilingual models such as XLM-R and Llama 3.2 1B.
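
For concreteness, the sketch below illustrates the Task Arithmetic step of this strategy, assuming a PyTorch/Transformers setup; the continued pre-training checkpoint path, the scaling coefficient, and the label count are illustrative placeholders rather than values from the paper.

```python
# Minimal sketch of Task Arithmetic merging (assumed setup, not the authors' exact code).
# The task vector is the delta between a continued pre-training (CPT) checkpoint and its
# base model; the merge adds a scaled copy of that delta back onto the base weights,
# after which the merged model is fine-tuned on labeled code-mixed data.
import torch
from transformers import AutoModelForSequenceClassification

BASE = "xlm-roberta-base"           # base multilingual model (one of the models in the paper)
CPT = "path/to/cpt-checkpoint"      # hypothetical continued pre-training checkpoint
LAMBDA = 0.5                        # merge scaling coefficient (assumed value)

base = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=2)
cpt = AutoModelForSequenceClassification.from_pretrained(CPT, num_labels=2)

cpt_state = cpt.state_dict()
merged_state = {}
for name, base_param in base.state_dict().items():
    cpt_param = cpt_state[name]
    # Task Arithmetic: base + lambda * (cpt - base); leave non-float or mismatched
    # tensors (e.g. a freshly initialized classification head) untouched.
    if base_param.shape == cpt_param.shape and base_param.dtype.is_floating_point:
        merged_state[name] = base_param + LAMBDA * (cpt_param - base_param)
    else:
        merged_state[name] = base_param

base.load_state_dict(merged_state)
# `base` now holds the merged weights and would then be fine-tuned on the
# code-mixed sentence classification data with a standard training loop.
```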

However, the lack of consistent datasets for tasks other than sentence classification in both monolingual and code-mixed settings limits the scope of the evaluation. As a result, the authors explicitly note uncertainty about whether their findings generalize to other NLP tasks, highlighting the need for further investigation of model merging beyond the tested sentence-level tasks.

References

Therefore, the generalizability of our findings to other NLP tasks is unclear.

Adapting Multilingual Models to Code-Mixed Tasks via Model Merging (2510.19782 - Kodali et al., 22 Oct 2025) in Section 6 (Discussion), Limitations