Verify capacity-driven gains from multilingual SFT
Determine whether the 27B TranslateGemma model, owing to its higher capacity, benefits more than the smaller TranslateGemma models from the large number of languages seen during supervised fine-tuning (SFT). The report states this only as a hypothesis, so the task is to provide direct experimental confirmation or refutation, e.g., by comparing per-language SFT gains across model sizes, as sketched below.
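One minimal way to operationalize this test is to measure, for each SFT language, the quality improvement between a base checkpoint and its SFT counterpart, and compare those per-language gains across model sizes. The sketch below assumes hypothetical per-language chrF scores and made-up checkpoint pairs; nothing here (scores, language codes, or the metric choice) comes from the report itself.

```python
"""Sketch: does the larger model gain more from multilingual SFT?

All numbers and checkpoint pairings below are illustrative assumptions.
In a real experiment, scores would come from evaluating each checkpoint
on a held-out test set for every language used during SFT.
"""

from statistics import mean

# Hypothetical per-language chrF scores:
#   lang: (small base, small SFT, large base, large SFT)
scores = {
    "sw": (41.2, 48.9, 45.0, 55.3),
    "yo": (33.5, 40.1, 36.8, 47.2),
    "km": (38.0, 44.6, 41.3, 51.0),
}

def sft_gain(base: float, sft: float) -> float:
    """Absolute improvement attributable to the SFT phase."""
    return sft - base

gains_small = [sft_gain(b, s) for b, s, _, _ in scores.values()]
gains_large = [sft_gain(b, s) for _, _, b, s in scores.values()]

print(f"mean SFT gain, small model: {mean(gains_small):.2f}")
print(f"mean SFT gain, large model: {mean(gains_large):.2f}")

# The capacity hypothesis is supported if the large model's per-language
# gains are consistently higher; a paired significance test over
# languages (e.g., bootstrap resampling) would be preferable to
# comparing raw means alone.
```

A stronger version of this design would stratify languages by how much SFT data they contribute, since the hypothesis specifically concerns the breadth of languages seen during SFT rather than SFT exposure overall.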
References
We also hypothesize that the 27B model, with its higher capacity, will have benefited more from the vast amount of languages seen during the SFT phase (detailed in Appendix~\ref{sec:list-of-languages}), although we do not have direct experimental confirmation of this.
— TranslateGemma Technical Report (Finkelstein et al., 2601.09012, 13 Jan 2026), Section 6.1 (Text translation)