Fine-tuning Large Language Models for Domain-specific Machine Translation (2402.15061v2)

Published 23 Feb 2024 in cs.CL and cs.LG

Abstract: LLMs have shown great potential in domain-specific machine translation (MT). However, one major issue is that LLMs pre-trained on general-domain corpora may not generalize well to specific domains because they lack domain-specific knowledge. To address this issue, this paper focuses on enhancing the domain-specific MT capability of LLMs by providing high-quality training datasets and proposing a novel fine-tuning framework, DragFT. DragFT augments LLMs via three techniques: (i) Dictionary-enhanced prompting integrates dictionary information into prompts to improve the translation of domain-specific terminology; (ii) RAG-based few-shot example selection provides high-quality examples that reflect both domain and style characteristics; (iii) Fine-tuning with few-shot examples further enhances performance when in-domain examples are used. We deploy DragFT on three well-known 13B-parameter LLM backbones to validate its effectiveness. Results on three domain-specific datasets show that DragFT achieves a significant performance boost and outperforms advanced models such as GPT-3.5 and GPT-4o. The substantial improvement of DragFT over existing LLMs can be attributed to incorporating relevant knowledge while mitigating noise.
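
The abstract names three techniques but gives no implementation details. The sketch below is a hypothetical illustration of how the first two, dictionary-enhanced prompting and RAG-based few-shot example selection, could be combined when assembling a translation prompt; the function names, glossary format, English-to-Chinese direction, and the token-overlap retriever (standing in for a real dense retriever) are all assumptions, not the authors' implementation.

```python
# Hypothetical sketch of DragFT-style prompt construction (not the authors' code).
# (i)  Dictionary-enhanced prompting: in-domain term translations are injected
#      into the prompt so the model renders terminology consistently.
# (ii) RAG-based few-shot selection: the in-domain sentence pairs most similar
#      to the input are retrieved and prepended as examples. A crude token-overlap
#      score stands in here for a real retriever (e.g. dense embeddings).
from typing import Dict, List, Tuple


def overlap_score(a: str, b: str) -> float:
    """Crude lexical similarity used as a placeholder retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)


def retrieve_examples(source: str,
                      corpus: List[Tuple[str, str]],
                      k: int = 2) -> List[Tuple[str, str]]:
    """Pick the k in-domain sentence pairs most similar to the input sentence."""
    return sorted(corpus, key=lambda pair: overlap_score(source, pair[0]),
                  reverse=True)[:k]


def build_prompt(source: str,
                 glossary: Dict[str, str],
                 corpus: List[Tuple[str, str]]) -> str:
    """Assemble a translation prompt with dictionary hints and few-shot examples."""
    hits = {term: tgt for term, tgt in glossary.items() if term in source.lower()}
    lines = ["Translate the following English sentence into Chinese."]
    if hits:
        lines.append("Use these domain term translations: "
                     + "; ".join(f"{s} -> {t}" for s, t in hits.items()))
    for src, tgt in retrieve_examples(source, corpus):
        lines.append(f"English: {src}\nChinese: {tgt}")
    lines.append(f"English: {source}\nChinese:")
    return "\n\n".join(lines)


if __name__ == "__main__":
    demo_glossary = {"myocardial infarction": "心肌梗死"}  # toy in-domain glossary
    demo_corpus = [
        ("The patient suffered a myocardial infarction.", "患者发生了心肌梗死。"),
        ("The contract expires next year.", "合同明年到期。"),
    ]
    print(build_prompt("Early treatment of myocardial infarction is critical.",
                       demo_glossary, demo_corpus))
```

The third technique would then, presumably, fine-tune the backbone on prompts of this form built from in-domain data, so the model learns to exploit the retrieved examples and dictionary hints rather than relying on them only at inference time.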

Authors (7)
  1. Jiawei Zheng (6 papers)
  2. Hanghai Hong (1 paper)
  3. Xiaoli Wang (40 papers)
  4. Jingsong Su (1 paper)
  5. Yonggui Liang (1 paper)
  6. Shikai Wu (2 papers)
  7. Feiyan Liu (2 papers)
Citations (16)