Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization (2409.17673v1)

Published 26 Sep 2024 in cs.CL

Abstract: Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose general, foundational models for specific tasks. We show that applying task-alignment to neural machine translation (NMT) addresses an existing task--data mismatch in NMT, leading to improvements across all languages of a multilingual model, even when task-alignment is only applied to a subset of those languages. We do so by introducing Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences, and verify the improvements with both automatic metrics and human evaluation.
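The abstract describes DQO as a DPO variant that substitutes a pre-trained quality estimation (QE) model for human preference labels. A minimal sketch of that idea, assuming a hypothetical `qe_score` stand-in for a real QE model and illustrative log-probabilities (the paper's actual pairing strategy and hyperparameters may differ):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss on one preference pair.

    Inputs are sequence log-probs under the policy being trained and a
    frozen reference model; beta controls deviation from the reference.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)): small when the policy prefers the
    # chosen translation more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

def build_pair(candidates, qe_score):
    """Rank candidate translations with a QE model as the preference proxy:
    highest-scored candidate becomes 'chosen', lowest becomes 'rejected'."""
    ranked = sorted(candidates, key=qe_score, reverse=True)
    return ranked[0], ranked[-1]

# Toy example: qe scores here are made up; a real system would call a
# pre-trained QE model such as the one the paper uses.
candidates = ["guten Morgen", "gut Morgen", "Morgen gut"]
qe = {"guten Morgen": 0.92, "gut Morgen": 0.55, "Morgen gut": 0.30}
chosen, rejected = build_pair(candidates, qe.get)

loss = dpo_loss(logp_chosen=-1.0, logp_rejected=-2.0,
                ref_logp_chosen=-1.5, ref_logp_rejected=-1.8)
```

Because the preference signal comes entirely from the QE model, no human annotation is needed per language pair, which is consistent with the paper's observation that alignment on a subset of languages can transfer to the rest.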

Authors (4)
  1. Kaden Uhlig (2 papers)
  2. Joern Wuebker (9 papers)
  3. Raphael Reinauer (6 papers)
  4. John DeNero (13 papers)