Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages (2401.05811v2)

Published 11 Jan 2024 in cs.CL and cs.AI

Abstract: This article introduces contrastive alignment instructions (AlignInstruct) to address two challenges in machine translation (MT) on LLMs. One is the expansion of supported languages to previously unseen ones. The second relates to the lack of data in low-resource languages. Model fine-tuning through MT instructions (MTInstruct) is a straightforward approach to the first challenge. However, MTInstruct is limited by weak cross-lingual signals inherent in the second challenge. AlignInstruct emphasizes cross-lingual supervision via a cross-lingual discriminator built using statistical word alignments. Our results based on fine-tuning the BLOOMZ models (1b1, 3b, and 7b1) in up to 24 unseen languages showed that: (1) LLMs can effectively translate unseen languages using MTInstruct; (2) AlignInstruct led to consistent improvements in translation quality across 48 translation directions involving English; (3) Discriminator-based instructions outperformed their generative counterparts as cross-lingual instructions; (4) AlignInstruct improved performance in 30 zero-shot directions.


Summary

  • The paper introduces AlignInstruct, a contrastive alignment technique that strengthens cross-lingual supervision through a discriminator built from statistical word alignments.
  • It fine-tunes BLOOMZ models (1b1, 3b, and 7b1) on top of an MTInstruct baseline, yielding consistent gains in translation quality across up to 24 previously unseen, low-resource languages.
  • Discriminator-based instructions outperform their generative counterparts, and AlignInstruct improves performance in 30 zero-shot translation directions.

Overview of Contrastive Alignment Instructions

Expanding the language coverage of machine translation (MT) with LLMs, particularly to low-resource languages with little available data, is a considerable challenge. Fine-tuning LLMs with machine translation instructions (MTInstruct) is a straightforward way to add language support, but it is limited by the weak cross-lingual signals inherent in low-resource data. This paper introduces contrastive alignment instructions (AlignInstruct), a technique that strengthens cross-lingual supervision through a discriminator built from statistical word alignments, improving translation performance across a wide range of languages.
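To make the idea concrete, the sketch below builds one discriminator-style AlignInstruct example from a sentence pair and its word alignments (as produced by a word aligner such as fast_align). The `build_align_instruct` helper, the prompt template, and the 50% negative-sampling rate are illustrative assumptions rather than the paper's exact recipe; the contrastive signal comes from sometimes swapping in an unaligned target word and asking the model to label the assertion True or False.

```python
import random

def build_align_instruct(src_sent, tgt_sent, aligned_pairs, rng=random):
    """Build one discriminator-style AlignInstruct example from a parallel
    sentence pair and its statistical word alignments, given as
    (src_word, tgt_word) tuples.

    The prompt template is an illustrative assumption, not the exact
    wording used in the paper.
    """
    src_word, tgt_word = rng.choice(aligned_pairs)

    # Contrastive (negative) option: with 50% probability, replace the
    # aligned target word with a target word that is NOT aligned to it.
    unaligned = [w for w in tgt_sent.split()
                 if (src_word, w) not in aligned_pairs]
    make_negative = bool(unaligned) and rng.random() < 0.5
    candidate = rng.choice(unaligned) if make_negative else tgt_word

    prompt = (
        "Given the following parallel sentences, decide whether the "
        "assertion is True or False.\n"
        f"Source: {src_sent}\n"
        f"Target: {tgt_sent}\n"
        f'Assertion: "{src_word}" can be aligned with "{candidate}".'
    )
    return {"instruction": prompt,
            "output": "False" if make_negative else "True"}

# Toy usage with a hypothetical English-French pair and its alignments.
example = build_align_instruct(
    "The cat sat on the mat .",
    "Le chat était assis sur le tapis .",
    [("The", "Le"), ("cat", "chat"), ("mat", "tapis")],
)
print(example["instruction"], "->", example["output"])
```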

Innovation in Machine Translation Adaptation

The core innovation, AlignInstruct, is designed to improve cross-lingual supervision without requiring additional training data. The paper tests AlignInstruct by fine-tuning BLOOMZ models (1b1, 3b, and 7b1), showing promising results across 24 previously unseen languages. AlignInstruct, especially when combined with MTInstruct, consistently improved translation quality across 48 translation directions involving English. Moreover, the discriminator-based formulation of AlignInstruct proved more effective than its generative counterparts, underscoring its value as a cross-lingual instruction mechanism.
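For comparison, an MTInstruct example is simply a translation instruction built from the same parallel data, and the two instruction types can be mixed into a single fine-tuning set. The helpers below (`build_mt_instruct`, `mix_objectives`) and their template wording are illustrative assumptions, not the paper's exact format.

```python
import random

def build_mt_instruct(src_sent, tgt_sent, src_lang, tgt_lang):
    """A plain translation instruction in the spirit of MTInstruct; the
    template wording is an illustrative assumption."""
    prompt = (f"Translate the following text from {src_lang} to {tgt_lang}.\n"
              f"{src_lang}: {src_sent}\n"
              f"{tgt_lang}:")
    return {"instruction": prompt, "output": " " + tgt_sent}

def mix_objectives(mt_examples, align_examples, seed=0):
    """Interleave MTInstruct and AlignInstruct examples into one
    multi-task fine-tuning set drawn from the same parallel corpora."""
    mixed = list(mt_examples) + list(align_examples)
    random.Random(seed).shuffle(mixed)
    return mixed
```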

Methodology and Training Curriculum

The fine-tuning pipeline first established a baseline with MTInstruct and then added AlignInstruct, with both instruction types derived from the same parallel corpora; generative variants of AlignInstruct were also explored. The paper compared the efficacy of the two objectives both combined and under a fixed curriculum, and evaluated translation quality with standard automatic MT metrics such as BLEU and chrF.
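A minimal data-level sketch of such a schedule, assuming a simple two-stage split (MTInstruct-only batches first, then mixed batches), is shown below; the actual curricula explored in the paper may be staged differently.

```python
import random

def curriculum_batches(mt_examples, align_examples, batch_size=8,
                       baseline_epochs=1, joint_epochs=1, rng=random):
    """Yield fine-tuning batches for a simple two-stage schedule:
    MTInstruct alone first (the baseline), then MTInstruct plus
    AlignInstruct mixed together. The staging here is a simplification
    and need not match the curricula compared in the paper.
    """
    def batches(pool):
        pool = list(pool)
        rng.shuffle(pool)
        for i in range(0, len(pool), batch_size):
            yield pool[i:i + batch_size]

    # Stage 1: establish the MTInstruct baseline.
    for _ in range(baseline_epochs):
        yield from batches(mt_examples)

    # Stage 2: continue with both instruction types mixed.
    for _ in range(joint_epochs):
        yield from batches(list(mt_examples) + list(align_examples))
```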

Results and Implications

The approach led to statistically significant improvements in translation quality for the vast majority of languages tested, and AlignInstruct was consistently effective across model sizes. In zero-shot scenarios, where the model had not been exposed to particular translation directions during training, AlignInstruct also improved performance, particularly when the directions involve languages the model already supports. The work offers a promising avenue for translating low-resource languages with LLMs, with the potential to broaden access to information across linguistic barriers.
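As a reference point for how such quality claims are typically measured, the snippet below scores system outputs with sacrebleu's BLEU and chrF implementations; it is a minimal sketch of the metric computation only, not the paper's full evaluation protocol (which would also include significance testing).

```python
import sacrebleu

def evaluate_mt(hypotheses, references):
    """Score system outputs with surface MT metrics via sacrebleu."""
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    chrf = sacrebleu.corpus_chrf(hypotheses, [references])
    return {"BLEU": round(bleu.score, 2), "chrF": round(chrf.score, 2)}

# Toy usage with a single hypothesis/reference pair.
print(evaluate_mt(["the cat sat on the mat"],
                  ["the cat is sitting on the mat"]))
```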

Through rigorous experimentation, the paper not only advances our understanding of how LLMs can be adapted to multilingual translation tasks, but also provides insights into the interplay between discriminative and generative objectives during fine-tuning. By examining changes in the layer-wise language representations inside the models, it also sheds light on the internal effects of applying AlignInstruct, paving the way for more refined LLM fine-tuning strategies in the future.