A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Abstract

Generative large language models (LLMs) have achieved remarkable advances in various NLP tasks. However, these advances have not been reflected in the translation task, especially for models of moderate size (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderately sized LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data. We call the LLM developed through this strategy the Advanced Language Model-based trAnslator (ALMA). Using LLaMA-2 as the underlying model, ALMA achieves an average improvement of more than 12 BLEU and 12 COMET points over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. This performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, despite using only 7B or 13B parameters. The method establishes the foundation for a novel training paradigm in machine translation.
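
The abstract describes the two-stage recipe only at a high level. As a rough illustration, the sketch below shows how such a setup could be wired together with the Hugging Face transformers and datasets libraries; the checkpoint name, file names, prompt template, and hyperparameters are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch of the two-stage recipe described above:
#   Stage 1: continued causal-LM training on monolingual text.
#   Stage 2: fine-tuning on a small, high-quality parallel set.
# File names, prompt format, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE = "meta-llama/Llama-2-7b-hf"  # LLaMA-2 is the underlying model in the abstract
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

def run_stage(dataset, output_dir):
    """Run one fine-tuning stage with a plain causal-LM objective."""
    Trainer(
        model=model,
        args=TrainingArguments(output_dir,
                               per_device_train_batch_size=4,
                               num_train_epochs=1,
                               learning_rate=2e-5),
        train_dataset=dataset,
        data_collator=collator,
    ).train()

# Stage 1: monolingual fine-tuning (plain text, one sentence per line).
mono = (load_dataset("text", data_files="monolingual.txt")["train"]
        .map(tokenize, batched=True, remove_columns=["text"]))
run_stage(mono, "stage1_monolingual")

# Stage 2: small high-quality parallel set (JSONL with "src"/"tgt" fields),
# rendered into a simple translation prompt before tokenization.
def to_prompt(example):
    return {"text": ("Translate this from German to English:\n"
                     f"German: {example['src']}\nEnglish: {example['tgt']}")}

parallel = (load_dataset("json", data_files="parallel.jsonl")["train"]
            .map(to_prompt)
            .map(tokenize, batched=True, remove_columns=["src", "tgt", "text"]))
run_stage(parallel, "stage2_parallel")

model.save_pretrained("alma-style-translator")
```

The point of the sketch is only the ordering of the two stages (monolingual first, then a small parallel set); it does not reproduce the paper's exact training objective, data mix, or prompt.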

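The reported gains are measured with BLEU (via sacreBLEU) and COMET-22, the metrics cited in the reference list below. A minimal scoring sketch using the sacrebleu and unbabel-comet packages might look like this; the file names are placeholders, and the COMET checkpoint is assumed to be the WMT'22 reference-based model.

```python
# Hedged example: score system outputs with sacreBLEU and COMET-22.
# "test.de", "system.en", and "test.en" are placeholder file names.
import sacrebleu
from comet import download_model, load_from_checkpoint

sources    = [line.rstrip("\n") for line in open("test.de", encoding="utf-8")]
hypotheses = [line.rstrip("\n") for line in open("system.en", encoding="utf-8")]
references = [line.rstrip("\n") for line in open("test.en", encoding="utf-8")]

# Corpus-level BLEU as computed by sacreBLEU (Post, 2018).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# Reference-based COMET-22 (Rei et al., 2022).
comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)]
print(f"COMET-22: {comet.predict(data, batch_size=8).system_score:.4f}")
```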

References
  1. Massively Multilingual Neural Machine Translation
  2. Falcon-40B: an open large language model with state-of-the-art performance. 2023.
  3. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901
  4. Improving Translation Faithfulness of Large Language Models via Augmenting Instructions
  5. PaLM: Scaling Language Modeling with Pathways
  6. XNLI: Evaluating Cross-lingual Sentence Representations
  7. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp.  8440–8451, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.747. https://aclanthology.org/2020.acl-main.747.

  8. Results of WMT22 metrics shared task: Stop using BLEU – neural metrics are better and more robust. In Proceedings of the Seventh Conference on Machine Translation (WMT), pp.  46–68, Abu Dhabi, United Arab Emirates (Hybrid), December 2022. Association for Computational Linguistics. https://aclanthology.org/2022.wmt-1.2.

  9. Robert M French. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, 3(4):128–135
  10. A framework for few-shot language model evaluation, September 2021. https://doi.org/10.5281/zenodo.5371628.

  11. Textbooks Are All You Need
  12. How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation
  13. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations, 2022. https://openreview.net/forum?id=nZeVKeeFYf9.

  14. Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine
  15. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526
  16. Quality at a glance: An audit of web-crawled multilingual datasets. Transactions of the Association for Computational Linguistics, 10:50–72, 2022. doi: 10.1162/tacl_a_00447. https://aclanthology.org/2022.tacl-1.4.

  17. Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions
  18. Few-shot Learning with Multilingual Language Models
  19. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems, 35:1950–1965
  20. Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8:726–742, 2020. doi: 10.1162/tacl_a_00343. https://aclanthology.org/2020.tacl-1.47.

  21. Small data, big impact: Leveraging minimal data for effective machine translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.  2740–2756
  22. PEFT: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft

  23. MosaicML. Introducing MPT-7B: A new standard for open-source, commercially usable LLMs, 2023. www.mosaicml.com/blog/mpt-7b. Accessed: 2023-05-05.
  24. Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation
  25. LSDSem 2017 shared task: The story cloze test. In Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, pp.  46–51, Valencia, Spain, April 2017. Association for Computational Linguistics. doi: 10.18653/v1/W17-0906. https://aclanthology.org/W17-0906.

  26. No Language Left Behind: Scaling Human-Centered Machine Translation
  27. OpenAI. GPT-4 technical report
  28. Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures. Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019, pp. 9–16, Mannheim, 2019. Leibniz-Institut für Deutsche Sprache. doi: 10.14618/ids-pub-9021. http://nbn-resolving.de/urn:nbn:de:bsz:mh39-90215.

  29. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp.  311–318
  30. Matt Post. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pp.  186–191, Brussels, Belgium, October 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-6319. https://aclanthology.org/W18-6319.

  31. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21:1–67
  32. Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp.  3505–3506
  33. COMET-22: Unbabel-IST 2022 submission for the metrics shared task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pp.  578–585, Abu Dhabi, United Arab Emirates (Hybrid), December 2022. Association for Computational Linguistics. https://aclanthology.org/2022.wmt-1.52.

  34. Leveraging pre-trained checkpoints for sequence generation tasks. Transactions of the Association for Computational Linguistics, 8:264–280, 2020. doi: 10.1162/tacl_a_00313. https://aclanthology.org/2020.tacl-1.18.

  35. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  36. UL2: Unifying language learning paradigms. In The Eleventh International Conference on Learning Representations, 2022a.
  37. Transcending Scaling Laws with 0.1% Extra Compute
  38. It’s All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp.  3534–3546, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.findings-acl.310. https://aclanthology.org/2021.findings-acl.310.

  39. LLaMA: Open and Efficient Foundation Language Models
  40. Llama 2: Open Foundation and Fine-Tuned Chat Models
  41. What language model architecture and pretraining objective works best for zero-shot generalization? In International Conference on Machine Learning, pp. 22964–22984. PMLR
  42. PolyLM: An Open Source Polyglot Large Language Model
  43. BERT, mBERT, or BiBERT? a study on contextualized embeddings for neural machine translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp.  6663–6675, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.534. https://aclanthology.org/2021.emnlp-main.534.

  44. Language-aware multilingual machine translation with self-supervised learning. In Findings of the Association for Computational Linguistics: EACL 2023, pp.  526–539, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-eacl.38. https://aclanthology.org/2023.findings-eacl.38.

  45. BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages
  46. TIM: Teaching Large Language Models to Translate with Comparison
  47. Prompting Large Language Model for Machine Translation: A Case Study
  48. The effect of translationese in machine translation test sets. In Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers), pp.  73–81, Florence, Italy, August 2019. Association for Computational Linguistics. doi: 10.18653/v1/W19-5208. https://aclanthology.org/W19-5208.

  49. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
  50. BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models
  51. OPT: Open Pre-trained Transformer Language Models
  52. LIMA: Less Is More for Alignment
  53. Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis
  54. Extrapolating Large Language Models to Non-English by Aligning Languages
