
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment (2404.12318v2)

Published 18 Apr 2024 in cs.CL

Abstract: Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human preference data are difficult to obtain at scale, making it challenging to extend this framework to diverse languages. In this work, we evaluate a simple approach for zero-shot cross-lingual alignment, where a reward model is trained on preference data in one source language and directly applied to other target languages. On summarization and open-ended dialog generation, we show that this method is consistently successful under comprehensive evaluation settings, including human evaluation: cross-lingually aligned models are preferred by humans over unaligned models on up to >70% of evaluation instances. We moreover find that a different-language reward model sometimes yields better aligned models than a same-language reward model. We also identify best practices when there is no language-specific data for even supervised finetuning, another component in alignment.

Evaluating Zero-Shot Cross-Lingual Alignment in LLMs Using a Single-Language Reward Model

Introduction

Cross-lingual transfer of reward models (RMs) is a promising approach to language model (LM) alignment when multilingual preference data are scarce. This work investigates the efficacy of using a single-language RM to align LMs across multiple languages, offering a potential solution to the problem of scaling alignment to diverse languages for which preference data may be lacking.

Zero-Shot Cross-Lingual Transfer of Reward Models

The core methodology transfers an RM trained on one source language to guide the alignment of LMs in target languages. This approach sidesteps the need for target-language annotated datasets by leveraging the interlingual generality of pretrained multilingual models. The paper explores two tasks, summarization and open-ended dialog generation, using reinforcement learning and best-of-n reranking as reward optimization techniques.
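
As a concrete illustration, best-of-n reranking with a transferred RM can be sketched as follows. This is a minimal sketch: the function names and interfaces here are illustrative placeholders, not the paper's actual implementation.

```python
from typing import Callable

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],       # samples one candidate response
    reward: Callable[[str, str], float],  # RM trained on a *source* language
    n: int = 16,
) -> str:
    """Sample n candidate responses and keep the one the reward model
    scores highest.

    The prompt and candidates may be in a *target* language the RM never
    saw preference data for; zero-shot transfer relies on the RM's
    multilingual pretraining to score them sensibly anyway.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))
```

Unlike reinforcement learning, this reranking variant requires no further training of the policy model, which makes it a cheap first test of whether a cross-lingual RM transfers at all.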

Cross-lingual effectiveness is measured through comprehensive evaluation, including direct human judgments and automated evaluation by larger LMs (GPT-4 and PaLM-2-L). A surprising observation emerges: models aligned with an RM transferred from another language often match or surpass the alignment quality of models that used a same-language RM. This suggests that RMs generalize robustly across input languages, and that biases a same-language RM picks up from its training data can sometimes be sidestepped by a source-language RM.

Key Results and Observations

  • Generalizability of RMs: Despite being trained on data from one language, RMs effectively drove alignment in other languages, with human evaluators preferring cross-lingually aligned models over unaligned ones on up to over 70% of evaluation instances.
  • Comparison with Translate-Train Baseline: Directly transferred RMs outperformed a translate-train baseline in which the preference data were machine-translated into the target language, suggesting that direct cross-lingual transfer preserves the RM's judgment quality better than training on translated data.
  • Unexpected Superiority of Cross-lingual Alignment: In several instances, an RM from a different language yielded better alignment than an RM from the target language. It is hypothesized that a transferred RM is less prone to overfitting language-specific artifacts present in the target-language training data.
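
The RMs in these experiments are trained only on source-language preference pairs, typically with the Bradley-Terry pairwise objective, sketched below. The scoring function is an assumed input, not code from the paper.

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood that the human-chosen response
    outscores the rejected one: -log sigmoid(s_chosen - s_rejected).

    Minimizing this over source-language preference pairs trains the RM,
    which is then applied zero-shot to target-language text.
    """
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Because the loss depends only on the score margin, nothing in the objective ties the trained scorer to a particular input language; the language-agnosticism has to come from the underlying multilingual representation.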

Implications and Future Directions

The findings underscore the potential to lower the barriers for deploying multilingual LMs aligned to human preferences, especially for under-resourced languages. Cross-lingual RM transfer, by avoiding the need for extensive language-specific annotated data, could democratize the benefits of advanced LMs globally.

However, the implications of this strategy are complex. It opens questions about the extent to which language-agnostic principles of generation quality hold across different contexts and cultural nuances. Conducting further studies on tasks or domains with heavier cultural or context-specific elements could enrich our understanding of the limits of cross-lingual RM transferability.

Recommendations

For practical deployment, RMs trained in a high-resource language such as English can be an effective way to guide alignment in other languages. This strategy should ideally be complemented by rigorous evaluation against in-language RMs to ensure that alignment preserves the intended semantic and pragmatic properties across languages.
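
One lightweight way to run such a comparison is to measure each candidate RM's agreement with a small set of human-labeled target-language preference pairs. The sketch below assumes a hypothetical `rm(prompt, response)` scoring function; it is not from the paper.

```python
from typing import Callable, List, Tuple

def rm_accuracy(
    rm: Callable[[str, str], float],
    pairs: List[Tuple[str, str, str]],  # (prompt, chosen, rejected)
) -> float:
    """Fraction of human preference pairs on which the RM ranks the
    human-chosen response above the human-rejected one."""
    correct = sum(rm(p, ch) > rm(p, rj) for p, ch, rj in pairs)
    return correct / len(pairs)
```

Running this for both a transferred source-language RM and an in-language RM on the same held-out pairs gives a direct, cheap comparison before committing to either for alignment.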

In conclusion, this work represents an important step towards scalable, cross-lingual alignment of LMs, though future research is necessary to refine these methods and fully understand the boundary conditions under which they operate optimally.

Authors (5)
  1. Zhaofeng Wu
  2. Ananth Balashankar
  3. Yoon Kim
  4. Jacob Eisenstein
  5. Ahmad Beirami