- The paper introduces a one-day model merging recipe that adapts language-specific LLMs for enhanced reasoning via supervised fine-tuning (SFT) for representation alignment and ability-aware model merging.
- It details a two-stage strategy that combines bilingual representation alignment with selective layer-wise merging to preserve language-specific performance.
- Empirical results on benchmarks like MATH-500 and IFEval confirm significant reasoning improvements in low-resource settings.
Evaluating Techniques for Enhancing the Reasoning Abilities of Language-Specific LLMs Using Model Merging
This paper addresses the challenge of adapting language-specific LLMs to improve their reasoning capabilities, aligning them with high-performance reasoning LLMs such as DeepSeek R1. Recognizing that LLMs perform worse on low-resource languages than on prominent languages like English and Chinese, the authors investigate a methodology combining data selection and model merging, targeting a Thai LLM. The approach aims to integrate reasoning capabilities while preserving language-specific competencies.
The research highlights that existing large-scale LLMs are trained predominantly on high-resource languages, leading to suboptimal performance on tasks requiring language-specific nuance in low-resource settings. The paper's methodology follows a two-pronged strategy: representation alignment through supervised fine-tuning (SFT) and ability-aware model merging.
Methodological Insights
The methodology adopts Llama 3.1 70B as the common architectural backbone for the models involved, facilitating parameter alignment and eventual merging. This structural compatibility is crucial for successfully integrating the two models' capabilities. Representation alignment is carried out by fine-tuning on a bilingual adaptation of existing datasets, translating questions and solutions into Thai while preserving the quality of the reasoning traces. This is complemented by selecting diverse datasets that exercise both the language and reasoning capabilities of the models.
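The bilingual data preparation described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: `build_bilingual_example` and the `translate` callback are hypothetical names, and the key assumption shown is that questions and final answers are translated into Thai while the reasoning trace is kept intact.

```python
def build_bilingual_example(question_en, reasoning_trace, answer_en, translate):
    """Produce paired SFT records: the original English example plus a
    Thai-aligned variant that reuses the same reasoning trace.

    `translate` is any callable (e.g. an MT system) taking text and a
    target_lang keyword; it is a stand-in, not a specific API.
    """
    records = [
        # Original English record: question -> reasoning trace + answer.
        {"prompt": question_en, "response": reasoning_trace + "\n" + answer_en},
    ]
    # Translate only the question and final answer; the reasoning trace is
    # preserved verbatim so its quality is not degraded by translation.
    question_th = translate(question_en, target_lang="th")
    answer_th = translate(answer_en, target_lang="th")
    records.append(
        {"prompt": question_th, "response": reasoning_trace + "\n" + answer_th}
    )
    return records
```

A fine-tuning set built this way exposes the model to the same reasoning traces under both English and Thai prompts, which is the alignment effect the paper relies on.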
The model merging strategy relies on empirical insights suggesting that the earlier and middle layers of LLMs contribute most to comprehension and reasoning, whereas the final layers govern language generation. This insight shapes the merging schema: earlier layers are weighted predominantly toward the reasoning model, while later layers prioritize the language-specific model.
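A depth-dependent merge of this kind can be sketched as a per-layer interpolation. The linear ramp below is an illustrative choice, not the paper's actual schedule, and `merge_layerwise` is a hypothetical helper; the point is only that the reasoning model's weight decreases with layer depth while the language model's weight increases.

```python
import numpy as np

def merge_layerwise(reasoning_layers, language_layers, n_layers):
    """Merge two same-architecture models layer by layer.

    alpha is the interpolation weight on the reasoning model: 1.0 at the
    first layer, decaying linearly to 0.0 at the last, so early layers
    lean on the reasoning model and late layers on the language model.
    """
    merged = []
    for i in range(n_layers):
        alpha = 1.0 - i / (n_layers - 1)
        merged.append(
            alpha * reasoning_layers[i] + (1.0 - alpha) * language_layers[i]
        )
    return merged
```

In practice each element would be a full transformer block's parameter tensors rather than a single array, and the schedule could be any monotone curve; the linear ramp simply makes the early-vs-late weighting explicit.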
Strong Numerical Results and Implications
The experimental results show a significant improvement on reasoning tasks, achieving performance comparable to dedicated reasoning models while minimally impacting language-task performance. Notable benchmarks include MATH-500 and AIME 2024 for mathematical reasoning and LiveCodeBench for coding, alongside Thai-language proficiency evaluations such as IFEval and MT-Bench-TH. These results demonstrate the successful blending of specialized capabilities through a budget-efficient model merging strategy.
The implications of this research are notable. It suggests that regional LLMs do not need direct, extensive training on reasoning tasks but can achieve comparable competence by strategically merging with well-tuned reasoning models. This methodology opens doors for advancing language-specific AI, enabling more equitable AI capabilities across linguistic communities without requiring extensive computational resources.
Speculation on Future Developments
The unification of disparate skill sets through model merging opens prospects for multilingual reasoning LLMs that are not hindered by the high cost of training on and maintaining vast datasets across many languages. Future work may adapt this methodological framework to a broader range of low-resource languages, further democratizing AI technology. Additionally, integrating cultural and contextual knowledge could refine the applicability and relevance of reasoning models across different sociocultural landscapes.
Conclusion
The paper's approach exemplifies a resource-efficient solution to the discrepancies in LLM performance across languages, capitalizing on the synergy between reasoning models and language-specific models. By offering detailed insights and empirical results, this work stands to substantially influence future endeavors in multilingual LLM development, particularly in enhancing reasoning capabilities without undermining language-specific proficiencies. The public availability of merge configurations and model weights marks a significant step in supporting and expanding language-specific LLM initiatives.