Aligning Neural Machine Translation Models: Human Feedback in Training and Inference (2311.09132v2)

Published 15 Nov 2023 in cs.CL

Abstract: Reinforcement learning from human feedback (RLHF) is a recent technique to improve the quality of the text generated by an LLM, making it closer to what humans would generate. A core ingredient in RLHF's success in aligning and improving LLMs is its reward model, trained using human feedback on model outputs. In machine translation (MT), where metrics trained from human annotations can readily be used as reward models, recent methods using minimum Bayes risk decoding and reranking have succeeded in improving the final quality of translation. In this study, we comprehensively explore and compare techniques for integrating quality metrics as reward models into the MT pipeline. This includes using the reward model for data filtering, during the training phase through RL, and at inference time by employing reranking techniques, and we assess the effects of combining these in a unified approach. Our experimental results, conducted across multiple translation tasks, underscore the crucial role of effective data filtering, based on estimated quality, in harnessing the full potential of RL in enhancing MT quality. Furthermore, our findings demonstrate the effectiveness of combining RL training with reranking techniques, showcasing substantial improvements in translation quality.
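
The abstract names three places where a learned quality metric can act as a reward model in the MT pipeline: filtering the training data, guiding RL training, and steering inference through reranking or minimum Bayes risk (MBR) decoding. The sketch below illustrates the filtering and inference-time pieces in schematic form; it is a minimal sketch, not the paper's implementation, and `quality_score` / `utility` are hypothetical stand-ins for a learned metric such as a COMET-style model.

```python
from typing import Callable, List, Tuple

# Stand-in for a learned MT quality metric, e.g. a COMET-style model.
# The name and signature are illustrative, not the paper's actual interface.
QualityFn = Callable[[str, str], float]


def filter_parallel_data(pairs: List[Tuple[str, str]],
                         quality_score: QualityFn,
                         threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Keep only (source, target) pairs whose estimated quality clears a threshold."""
    return [(src, tgt) for src, tgt in pairs if quality_score(src, tgt) >= threshold]


def rerank(source: str, candidates: List[str], quality_score: QualityFn) -> str:
    """N-best reranking: return the candidate translation the metric prefers."""
    return max(candidates, key=lambda hyp: quality_score(source, hyp))


def mbr_decode(candidates: List[str], utility: Callable[[str, str], float]) -> str:
    """MBR decoding with a uniform candidate distribution: pick the hypothesis
    with the highest average utility against the other candidates, which act
    as pseudo-references."""
    def expected_utility(hyp: str) -> float:
        refs = [c for c in candidates if c is not hyp]
        return sum(utility(hyp, ref) for ref in refs) / max(len(refs), 1)
    return max(candidates, key=expected_utility)
```

In practice, the same metric can play all three roles: score the raw bitext and drop low-quality pairs before training, serve as the reward signal during RL, and then rescore sampled candidates at decoding time.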

Authors (4)
  1. Miguel Moura Ramos (4 papers)
  2. Patrick Fernandes (32 papers)
  3. António Farinhas (18 papers)
  4. André F. T. Martins (113 papers)
Citations (12)
