Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution (2402.11525v3)

Published 18 Feb 2024 in cs.CL and cs.LG

Abstract: Faithfulness, expressiveness, and elegance are the constant pursuit of machine translation. However, traditional metrics like BLEU do not strictly align with human preferences for translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (RLHF) to improve translation quality. It is non-trivial to collect a large, high-quality dataset of human comparisons between translations, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy that optimizes reward models by distinguishing between human and machine translations. In this manner, the reward model learns the deficiencies of machine translation relative to human translation and guides subsequent improvements in machine translation. Experimental results demonstrate that RLHF can effectively enhance translation quality, and this improvement carries over to translation directions not trained with RLHF. Further analysis indicates that the model's language capabilities play a crucial role in preference learning: a reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality and align better with real human translation preferences.
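The core idea in the abstract, training a reward model to tell human translations apart from machine translations, can be illustrated with a minimal sketch. The sketch below assumes a Bradley-Terry style pairwise loss and a `reward_model(source, translation)` scoring function; the function names and training details are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (assumed, not the paper's code) of the cost-effective
# preference objective: the reward model is trained to score a human
# translation above a machine translation of the same source sentence,
# using a Bradley-Terry pairwise loss.

import torch
import torch.nn.functional as F

def preference_loss(reward_model, src, human_mt, machine_mt):
    """Pairwise loss preferring the human translation over the machine one.

    reward_model(source, translation) -> scalar reward tensor per example
    (a hypothetical interface; any sequence-scoring model would do).
    """
    r_human = reward_model(src, human_mt)      # reward for human translation
    r_machine = reward_model(src, machine_mt)  # reward for machine translation
    # -log sigmoid(r_human - r_machine) is minimized when the human
    # translation receives the higher reward.
    return -F.logsigmoid(r_human - r_machine).mean()
```

In an RLHF pipeline of this kind, the trained reward model would then score translations sampled from the policy during fine-tuning (e.g., with PPO), pushing the translation model toward outputs the reward model judges closer to human quality.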

Authors (12)
  1. Nuo Xu (37 papers)
  2. Jun Zhao (469 papers)
  3. Can Zu (5 papers)
  4. Tao Gui (127 papers)
  5. Qi Zhang (785 papers)
  6. Xuanjing Huang (287 papers)
  7. Sixian Li (12 papers)
  8. Lu Chen (245 papers)
  9. Zhihao Zhang (61 papers)
  10. Rui Zheng (79 papers)
  11. Shihan Dou (46 papers)
  12. Wenjuan Qin (3 papers)
Citations (4)
