Do findings generalize from open-weight LRMs to proprietary LRMs?
Determine whether the empirical differences between reasoning in English and reasoning in the question’s language, established using open-weight Large Reasoning Models (LRMs), generalize to closed/proprietary models such as OpenAI o1 and Gemini 2.5 Pro.
References
Likewise, our experiments are limited to open-weight LRMs, so it remains to be seen if these findings generalize to closed/proprietary models such as o1 \citep{jaech2024openai} and Gemini 2.5 Pro \citep{comanici2025gemini}.
— The Reasoning Lingua Franca: A Double-Edged Sword for Multilingual AI
(2510.20647 - Saji et al., 23 Oct 2025) in Section: Limitations