Cross-lingual and multilingual generalization of convergence dynamics
Determine whether language models trained on non-English corpora, or in multilingual settings, exhibit the same convergence behavior across random seeds as English-trained models when measured by the expected per-token Kullback–Leibler (KL) divergence between runs. Specifically, test whether the dynamics across training steps mirror the four-phase pattern observed in English-language setups: an initial uniform phase, sharp convergence, sharp divergence, and slow reconvergence.
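The comparison metric can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes two seed runs expose next-token logits of shape `(num_positions, vocab_size)` over a shared evaluation corpus, and estimates the expected per-token KL divergence by averaging the position-wise KL between the two models' predictive distributions.

```python
import numpy as np

def expected_per_token_kl(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    """Estimate the expected per-token KL divergence D_KL(P || Q) between two
    models' next-token distributions, averaged over token positions.

    p_logits, q_logits: float arrays of shape (num_positions, vocab_size),
    the raw (unnormalized) logits from two seed runs on the same tokens.
    (Hypothetical interface; the paper's exact setup may differ.)
    """
    # Convert logits to log-probabilities with a numerically stable log-softmax.
    def log_softmax(z: np.ndarray) -> np.ndarray:
        z = z - z.max(axis=-1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    p = np.exp(log_p)

    # Per-position KL: sum_v p(v) * (log p(v) - log q(v)), then average.
    per_token_kl = (p * (log_p - log_q)).sum(axis=-1)
    return float(per_token_kl.mean())
```

Tracking this quantity at successive checkpoints of two (or more) seed runs would yield the KL-versus-training-step curve on which the four-phase pattern is defined; the cross-lingual question is whether that curve has the same shape for non-English and multilingual corpora.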
References
Third, our analysis is restricted to English-language data, leaving open questions about whether similar convergence dynamics occur in other languages and in multilingual settings.
— Convergence and Divergence of Language Models under Different Random Seeds
(2509.26643 - Fehlauer et al., 30 Sep 2025) in Section: Limitations