Evaluating Zero-Shot Cross-Lingual Alignment in LLMs Using a Single-Language Reward Model
Introduction
Cross-lingual transfer of reward models (RMs) is a promising approach to language model (LM) alignment when multilingual preference data are scarce. This work investigates how well a single-language RM can align LMs across multiple languages, offering a potential route to scaling alignment to languages for which preference data are lacking.
Zero-Shot Cross-Lingual Transfer of Reward Models
The core methodology transfers an RM trained on data in one source language to guide the alignment of LMs in target languages. This approach sidesteps the need for annotated preference data in each target language by leveraging the interlingual generality of pretrained multilingual models. The paper explores two tasks, summarization and open-ended dialog generation, using reinforcement learning and best-of-n reranking as reward optimization techniques; a sketch of the reranking setup follows.
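The following minimal sketch illustrates best-of-n reranking with a cross-lingually transferred RM. The `generate` and `reward_fn` callables are assumed interfaces for the policy LM sampler and the source-language RM scorer, not the paper's implementation.

```python
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str, int], List[str]],   # policy LM sampler (assumed interface)
    reward_fn: Callable[[str, str], float],      # source-language RM scorer (assumed interface)
    n: int = 16,
) -> str:
    """Best-of-n reranking: sample n candidates from the policy LM and return
    the one the (possibly cross-lingually transferred) RM scores highest."""
    candidates = generate(prompt, n)
    return max(candidates, key=lambda c: reward_fn(prompt, c))
```

The prompt and candidates can be in a target language the RM never saw preference labels for; the cross-lingual claim is that the RM's scores still rank them usefully.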
Cross-lingual effectiveness is measured through both direct human evaluation and automated evaluation by larger LMs acting as judges (GPT-4 and PaLM-2-L), revealing a surprising observation: models aligned with an RM transferred from another language often surpassed the alignment quality of models aligned with a same-language RM. This suggests that RM judgments generalize robustly across input languages, and that cross-lingual transfer may even sidestep biases tied to the target-language preference data.
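To make the evaluation protocol concrete, here is a hedged sketch of computing a pairwise win rate with an LLM judge. The `judge` callable stands in for a prompt wrapper around a judge model such as GPT-4 or PaLM-2-L and is an assumed interface, not the paper's evaluation harness; randomizing the order of the two outputs is a standard way to mitigate position bias.

```python
import random
from typing import Callable, List, Tuple

def win_rate(
    pairs: List[Tuple[str, str, str]],       # (prompt, output_a, output_b)
    judge: Callable[[str, str, str], str],   # returns "first" or "second" (assumed interface)
) -> float:
    """Fraction of comparisons in which output_a is preferred by the judge."""
    wins = 0
    for prompt, a, b in pairs:
        if random.random() < 0.5:
            wins += judge(prompt, a, b) == "first"
        else:
            wins += judge(prompt, b, a) == "second"
    return wins / len(pairs)
```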
Key Results and Observations
- Generalizability of RMs: Despite being trained on data from a single language, RMs effectively drove alignment in other languages, with human evaluators preferring the aligned model over 70% of the time in several settings.
- Comparison with Translate-Train Baseline: RMs transferred cross-lingually without modification outperformed the translate-train baseline, in which the RM training data were machine-translated into the target language, suggesting strong interlingual transfer in the original RMs (a sketch of the baseline's data preparation follows this list).
- Unexpected Superiority of Cross-lingual Alignment: In several settings, an RM from a different language yielded better alignment than an RM trained in the target language itself. One hypothesis is that cross-lingual transfer reduces overfitting to language-specific artifacts in the target-language training data.
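As referenced in the translate-train bullet above, that baseline differs from zero-shot transfer only in where the RM is trained. Below is a minimal sketch of its data preparation, assuming a generic `translate(text, target_lang)` machine-translation interface rather than the paper's pipeline.

```python
from typing import Callable, Dict, List

def translate_train_pairs(
    source_pairs: List[Dict[str, str]],      # {"prompt", "chosen", "rejected"} in the source language
    translate: Callable[[str, str], str],    # MT system: translate(text, target_lang) (assumed interface)
    target_lang: str,
) -> List[Dict[str, str]]:
    """Translate-train baseline: machine-translate the preference data into the
    target language and train the RM there, instead of transferring the
    source-language RM zero-shot."""
    return [
        {field: translate(text, target_lang) for field, text in pair.items()}
        for pair in source_pairs
    ]
```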
Implications and Future Directions
The findings underscore the potential to lower the barriers for deploying multilingual LMs aligned to human preferences, especially for under-resourced languages. Cross-lingual RM transfer, by avoiding the need for extensive language-specific annotated data, could democratize the benefits of advanced LMs globally.
However, the implications of this strategy are complex. It opens questions about the extent to which language-agnostic principles of generation quality hold across different contexts and cultural nuances. Conducting further studies on tasks or domains with heavier cultural or context-specific elements could enrich our understanding of the limits of cross-lingual RM transferability.
Recommendations
For practical deployment, an RM trained in a high-resource language such as English can be an effective starting point for guiding alignment in other languages. It should ideally be complemented by rigorous evaluation against in-language RMs to ensure that alignment preserves the intended semantic and pragmatic properties across languages.
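As a hedged illustration of this recommendation, the sketch below fine-tunes a scalar reward head on English preference pairs with a pairwise Bradley-Terry loss and then applies the same scorer unchanged to target-language text. The `xlm-roberta-base` checkpoint and this training recipe are illustrative assumptions, not the models or procedure used in the paper.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative multilingual backbone (an assumption, not the paper's model).
MODEL_NAME = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
rm = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)

def reward(prompt: str, response: str) -> torch.Tensor:
    """Scalar reward for a (prompt, response) pair, regardless of language."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    return rm(**inputs).logits.squeeze(-1)

def preference_loss(prompt: str, chosen: str, rejected: str) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push the chosen response's reward above the rejected one's."""
    return -F.logsigmoid(reward(prompt, chosen) - reward(prompt, rejected)).mean()

# Train on English pairs only; at deployment the same `reward` function scores
# candidates in the target language, e.g.:
# score = reward("Fasse den Artikel zusammen: ...", "Kurze Zusammenfassung ...")
```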
In conclusion, this work represents an important step towards scalable, cross-lingual alignment of LMs, though future research is necessary to refine these methods and fully understand the boundary conditions under which they operate optimally.