Do findings generalize from open-weight LRMs to proprietary LRMs?

Determine whether the empirical differences between reasoning in English and reasoning in the question's language, established using open-weight Large Reasoning Models (LRMs), generalize to closed/proprietary models such as OpenAI o1 and Gemini 2.5 Pro.

Background

All experiments in the paper are conducted with open-weight reasoning models (e.g., Qwen and DeepSeek variants). The results suggest stronger performance when reasoning in English, but also reveal errors introduced when non-English inputs are translated to English.

The authors explicitly state uncertainty about whether these findings hold for closed/proprietary models, naming o1 and Gemini 2.5 Pro as examples, and identify this generalization as an unresolved issue.

References

Likewise, our experiments are limited to open-weight LRMs, so it remains to be seen if these findings generalize to closed/proprietary models such as o1 \citep{jaech2024openai} and Gemini 2.5 Pro \citep{comanici2025gemini}.

The Reasoning Lingua Franca: A Double-Edged Sword for Multilingual AI (2510.20647 - Saji et al., 23 Oct 2025) in Section: Limitations