On-the-Fly Fusion of Large Language Models and Machine Translation (2311.08306v2)

Published 14 Nov 2023 in cs.CL

Abstract: We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input. We perform experiments on 4 language pairs (both directions) with varying data amounts. We find that an LLM that is slightly weaker at translation can improve the translations of an NMT model, and that ensembling with an LLM can produce better translations than ensembling two stronger MT models. We combine our method with various techniques from LLM prompting, such as in-context learning and translation context.
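The abstract does not spell out the combination function, so the sketch below illustrates one plausible reading of on-the-fly ensembling: at each decoding step, linearly interpolate the next-token distributions of the NMT model and the prompted LLM. The model stubs, the tiny vocabulary, and the `lam` weight are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of token-level NMT + LLM ensembling (assumed scheme,
# not the paper's exact method). Both stubs score the same toy vocabulary.
VOCAB = ["die", "Katze", "sitzt", "<eos>"]

def nmt_next_token_probs(source, prefix):
    """Hypothetical stand-in for one NMT decoder step: returns a
    probability distribution over VOCAB given the target prefix."""
    table = {
        (): [0.70, 0.10, 0.10, 0.10],
        ("die",): [0.05, 0.80, 0.10, 0.05],
        ("die", "Katze"): [0.05, 0.05, 0.70, 0.20],
        ("die", "Katze", "sitzt"): [0.05, 0.05, 0.10, 0.80],
    }
    return table.get(tuple(prefix), [1.0 / len(VOCAB)] * len(VOCAB))

def llm_next_token_probs(source, prefix):
    """Hypothetical stand-in for an LLM prompted on the same translation
    task and input, scored over the same vocabulary."""
    table = {
        (): [0.60, 0.20, 0.10, 0.10],
        ("die",): [0.10, 0.70, 0.10, 0.10],
        ("die", "Katze"): [0.10, 0.10, 0.60, 0.20],
        ("die", "Katze", "sitzt"): [0.05, 0.05, 0.05, 0.85],
    }
    return table.get(tuple(prefix), [1.0 / len(VOCAB)] * len(VOCAB))

def ensemble_greedy_decode(source, lam=0.5, max_len=10):
    """Greedy decoding under a token-level linear interpolation:
    p(y_t | y_<t, x) = lam * p_NMT + (1 - lam) * p_LLM."""
    prefix = []
    for _ in range(max_len):
        p_nmt = nmt_next_token_probs(source, prefix)
        p_llm = llm_next_token_probs(source, prefix)
        p = [lam * a + (1.0 - lam) * b for a, b in zip(p_nmt, p_llm)]
        token = VOCAB[max(range(len(VOCAB)), key=p.__getitem__)]
        if token == "<eos>":
            break
        prefix.append(token)
    return " ".join(prefix)

print(ensemble_greedy_decode("the cat sits"))  # -> die Katze sitzt
```

In practice the same interpolation would sit inside beam search rather than greedy decoding, and a real NMT model and LLM typically use different subword vocabularies, so an actual implementation needs some form of tokenization alignment or a shared scoring space; those details are elided here.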

Authors (3)
  1. Hieu Hoang
  2. Huda Khayrallah
  3. Marcin Junczys-Dowmunt