Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective (2510.02272v1)

Published 2 Oct 2025 in cs.CL and cs.AI

Abstract: Recent advancements in Reinforcement Post-Training (RPT) have significantly enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased interest in the generalization of RL-based reasoning. While existing work has primarily focused on investigating its generalization across tasks or modalities, this study proposes a novel cross-linguistic perspective to investigate reasoning generalization. This raises a crucial question: $\textit{Does the reasoning capability achieved from English RPT effectively transfer to other languages?}$ We address this by systematically evaluating English-centric LRMs on multilingual reasoning benchmarks and introducing a metric to quantify cross-lingual transferability. Our findings reveal that cross-lingual transferability varies significantly across initial model, target language, and training paradigm. Through interventional studies, we find that models with stronger initial English capabilities tend to over-rely on English-specific patterns, leading to diminished cross-lingual generalization. To address this, we conduct a thorough parallel training study. Experimental results yield three key findings: $\textbf{First-Parallel Leap}$, a substantial leap in performance when transitioning from monolingual to just a single parallel language, and a predictable $\textbf{Parallel Scaling Law}$, revealing that cross-lingual reasoning transfer follows a power-law with the number of training parallel languages. Moreover, we identify the discrepancy between actual monolingual performance and the power-law prediction as $\textbf{Monolingual Generalization Gap}$, indicating that English-centric LRMs fail to fully generalize across languages. Our study challenges the assumption that LRM reasoning mirrors human cognition, providing critical insights for the development of more language-agnostic LRMs.

Summary

The paper introduces the Parallel Scaling Law, showing a power-law relationship between the number of training languages and reasoning generalization.
It employs the Multilingual Transferability Index and analyzes RL and SFT paradigms to quantify cross-lingual performance improvements.
The study finds that bilingual training yields a significant boost, revealing a Monolingual Generalization Gap in English-only models.

Parallel Scaling Law and Cross-Linguistic Reasoning Generalization in Large Reasoning Models

Introduction

This paper presents a systematic investigation into the cross-lingual generalization of Large Reasoning Models (LRMs) trained primarily on English data, with a particular focus on the effects of Reinforcement Post-Training (RPT) and parallel data exposure. The authors introduce the Multilingual Transferability Index (MTI) to quantify reasoning transfer across languages and propose the Parallel Scaling Law, a power-law relationship governing the improvement in cross-lingual reasoning as the number of parallel training languages increases. The paper is structured into observational, interventional, and parallel training experiments, providing a comprehensive analysis of the factors influencing multilingual reasoning generalization.

Observational Study: Cross-Lingual Transferability in Open-Source LRMs

The observational study evaluates 13 open-source English-centric LRMs across four multilingual reasoning benchmarks (MATH500, AIME24, AIME25, GPQA-Diamond) and eleven languages. MTI measures the relative gain in reasoning accuracy on unseen languages compared to the training set.

Key findings include:

Initial Model Dependence: The inherent properties of the initial model significantly affect cross-lingual transferability. Models with strong English-centric capabilities tend to overfit to English-specific reasoning patterns, limiting generalization.
Training Paradigm Effects: RL-tuned models consistently outperform SFT-tuned models in cross-lingual transfer, especially in low-resource languages, where SFT can even degrade performance.
Figure 1: Cross-lingual reasoning transferability across open-source LRMs, showing MTI and TI for SFT- and RL-tuned models on MATH500.
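This summary does not reproduce the exact MTI formula, so the following is only a hypothetical illustration of a "relative gain" style metric, not the paper's definition. It assumes the score is the mean accuracy gain on unseen languages divided by the gain on the training language; the function name `relative_transfer_gain` and all accuracy values are made up for the example.

```python
from statistics import mean


def relative_transfer_gain(before, after, train_lang="en"):
    """Hypothetical transfer score (NOT the paper's MTI formula).

    Returns the mean accuracy gain on unseen languages divided by the
    accuracy gain on the training language.
    """
    train_gain = after[train_lang] - before[train_lang]
    if train_gain <= 0:
        raise ValueError("training-language gain must be positive")
    unseen = [lang for lang in before if lang != train_lang]
    if not unseen:
        raise ValueError("need at least one unseen language")
    unseen_gain = mean(after[lang] - before[lang] for lang in unseen)
    return unseen_gain / train_gain


# Made-up accuracies in percent, before and after English-only post-training.
before = {"en": 60.0, "de": 50.0, "sw": 30.0}
after = {"en": 70.0, "de": 56.0, "sw": 32.0}

score = relative_transfer_gain(before, after)
print(f"relative transfer gain: {score:.2f}")
```

Under this reading, a score near 1 would mean unseen languages improve about as much as the training language, and a score near 0 would mean the gains stay in the training language.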

Interventional Study: Isolating Factors in Reasoning Generalization

To control for confounding variables, the interventional study systematically varies initial model type, model family, and model size using a curated dataset and the GRPO algorithm for RPT.

Initial Model Type

Instruction-tuned models achieve the highest reasoning accuracy and language consistency but exhibit lower MTI compared to base and math-specialized models.
Base and math-specialized models retain more general pre-trained knowledge, resulting in higher cross-lingual transferability.

Model Family

Llama3.1-8B-Instruct (weaker initial English performance) demonstrates superior cross-lingual generalization compared to Qwen2.5-7B-Instruct (stronger initial performance), indicating that less specialized models may be better suited for broad transfer.
Figure 2: Multilingual reasoning performance on MATH500, comparing Qwen2.5-7B-Instruct and Llama3.1-8B-Instruct before and after GRPO fine-tuning.

Model Size

Smaller models (1.5B) show larger gains on in-domain math reasoning and broader reasoning domains.
Larger models (7B) demonstrate more robust transfer to challenging benchmarks, but with smaller relative improvements.
Figure 3: Performance differences across benchmarks for models of different sizes, highlighting scaling effects on reasoning generalization.

Parallel Training Study: The Parallel Scaling Law

The parallel training study exposes models to parallel data in up to seven languages, revealing three key phenomena:

First-Parallel Leap

The transition from monolingual to bilingual training yields a disproportionately large improvement in both accuracy and MTI, far exceeding the cumulative gains from additional languages.

Parallel Scaling Law

Cross-lingual reasoning performance follows a power law $f(X) = \alpha \cdot X^{\beta}$ in the number of parallel training languages $X$:
- For transferability: $f(X) = 2.00 \cdot X^{0.29}$
- For accuracy: $f(X) = 56.98 \cdot X^{0.02}$
The exponent for transferability ($\beta=0.29$) is more than ten times that for accuracy ($\beta=0.02$), indicating that parallel training primarily enhances the model's ability to generalize across languages rather than its absolute performance.
Figure 4: The Parallel Scaling Law in multilingual reasoning, showing experimental data and fitted power-law curves for accuracy and transferability.
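To make the transferability fit concrete, here is a minimal sketch that evaluates $f(X) = 2.00 \cdot X^{0.29}$ for one to seven languages and shows how such coefficients can be recovered by ordinary least squares in log-log space. The data points are synthetic, generated from the reported curve itself rather than taken from the paper's measurements, and the fitting procedure is an assumption; the summary does not say how the authors fitted the curve.

```python
import math


def power_law(x, alpha, beta):
    """Evaluate f(x) = alpha * x**beta."""
    return alpha * x ** beta


def fit_power_law(xs, ys):
    """Fit alpha and beta by least squares on (ln x, ln y).

    Taking logs turns f(x) = alpha * x**beta into the straight line
    ln f = ln alpha + beta * ln x.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


# Synthetic points generated from the reported transferability curve.
num_languages = list(range(1, 8))
synthetic = [power_law(x, 2.00, 0.29) for x in num_languages]

alpha, beta = fit_power_law(num_languages, synthetic)

# Gain predicted by the curve for each additional parallel language.
marginal = [power_law(x + 1, 2.00, 0.29) - power_law(x, 2.00, 0.29)
            for x in range(1, 7)]

for x, gain in zip(range(1, 7), marginal):
    print(f"{x} -> {x + 1} languages: +{gain:.3f}")
```

Because the exponent is below 1, the curve is concave: each added language contributes less than the previous one, which is the diminishing-returns pattern implied by the fitted law.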

Monolingual Generalization Gap

The actual performance of English-only models falls short of the power-law prediction, revealing a Monolingual Generalization Gap. This gap indicates that reasoning skills acquired through monolingual training are not fully language-agnostic.
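As a sketch of how this gap could be computed, the snippet below compares the power-law prediction at $X = 1$ with an observed monolingual score, using the accuracy coefficients reported above. The observed value is a placeholder chosen for illustration only, not a number from the paper, and `monolingual_gap` is a hypothetical helper name.

```python
def power_law(x, alpha, beta):
    """Evaluate f(x) = alpha * x**beta."""
    return alpha * x ** beta


def monolingual_gap(observed, alpha, beta):
    """Power-law prediction at one language minus the observed score."""
    return power_law(1, alpha, beta) - observed


# Accuracy coefficients from the fitted law reported in this summary.
ACC_ALPHA, ACC_BETA = 56.98, 0.02

# At X = 1 the prediction reduces to alpha, because 1**beta == 1.
predicted = power_law(1, ACC_ALPHA, ACC_BETA)

# Placeholder observation (hypothetical, NOT a result from the paper).
placeholder_observed = 55.0
gap = monolingual_gap(placeholder_observed, ACC_ALPHA, ACC_BETA)

print(f"predicted at X=1: {predicted:.2f}, gap: {gap:.2f}")
```

A positive gap means the English-only model underperforms what the curve fitted on multilingual runs would extrapolate to for a single language.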

Analysis: Parallel vs. Unparallel Data and Language Selection

Parallel data provides explicit semantic equivalence signals, forcing the model to learn unified, language-agnostic reasoning representations. Training with unparallel data yields inferior performance, underscoring the necessity of parallel exposure.

Figure 5: Accuracy difference comparison across parallel and unparallel data training, demonstrating the superiority of parallel data.
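The distinction between the two data regimes can be sketched as follows. This is an assumed construction for illustration: the summary does not describe how the authors built their unparallel set, and the toy problems, the round-robin assignment, and the helper names are all hypothetical.

```python
def build_parallel(translations, languages):
    """Every problem appears once per language, so translations co-occur."""
    return [(pid, lang, translations[pid][lang])
            for pid in sorted(translations)
            for lang in languages]


def build_unparallel(translations, languages):
    """Each problem appears in exactly one language (round-robin),
    so no two samples are translations of each other."""
    samples = []
    for i, pid in enumerate(sorted(translations)):
        lang = languages[i % len(languages)]
        samples.append((pid, lang, translations[pid][lang]))
    return samples


# Toy problem pool (made up for this example).
toy = {
    "p1": {"en": "What is 2 + 3?", "es": "¿Cuánto es 2 + 3?"},
    "p2": {"en": "What is 7 - 4?", "es": "¿Cuánto es 7 - 4?"},
    "p3": {"en": "What is 6 + 1?", "es": "¿Cuánto es 6 + 1?"},
    "p4": {"en": "What is 9 - 5?", "es": "¿Cuánto es 9 - 5?"},
}
languages = ["en", "es"]

parallel = build_parallel(toy, languages)
unparallel = build_unparallel(toy, languages)

print(len(parallel), "parallel samples;", len(unparallel), "unparallel samples")
```

In this sketch the parallel set is larger than the unparallel one; a controlled comparison would also need to match the number of training samples, which is left out here for brevity.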

The choice of parallel language has only minor effects on MTI and off-target rates, with low-resource languages consistently benefiting the most from parallel training.

Figure 6: Multilingual reasoning performance across different parallel languages, showing consistent gains regardless of language selection.

Implications and Future Directions

The findings challenge the assumption that LRM reasoning mirrors human cognition, as human reasoning is largely language-independent. The observed over-reliance on English-specific patterns in strong English-centric models suggests that current LRM architectures and training paradigms do not fully disentangle reasoning from linguistic processing. The Parallel Scaling Law provides a principled framework for improving cross-lingual generalization, with practical implications for building more robust, language-agnostic LRMs.

Future research should extend these analyses to other domains (e.g., coding, agent planning), develop advanced parallel training strategies to overcome diminishing returns, and pursue mechanistic interpretability to elucidate the internal representations underlying reasoning and language coupling.

Conclusion

This work establishes a rigorous framework for understanding and improving cross-lingual reasoning generalization in LRMs. The introduction of the Multilingual Transferability Index, the identification of the First-Parallel Leap, and the formulation of the Parallel Scaling Law collectively advance the field's understanding of multilingual reasoning. The Monolingual Generalization Gap highlights the limitations of current approaches and motivates the development of more language-agnostic reasoning models. These insights are critical for the next generation of LRMs capable of robust, universal reasoning across diverse linguistic contexts.