Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages (2401.05811v2)

Published 11 Jan 2024 in cs.CL and cs.AI

Abstract: This article introduces contrastive alignment instructions (AlignInstruct) to address two challenges in machine translation (MT) on LLMs. One is the expansion of supported languages to previously unseen ones. The second relates to the lack of data in low-resource languages. Model fine-tuning through MT instructions (MTInstruct) is a straightforward approach to the first challenge. However, MTInstruct is limited by weak cross-lingual signals inherent in the second challenge. AlignInstruct emphasizes cross-lingual supervision via a cross-lingual discriminator built using statistical word alignments. Our results based on fine-tuning the BLOOMZ models (1b1, 3b, and 7b1) in up to 24 unseen languages showed that: (1) LLMs can effectively translate unseen languages using MTInstruct; (2) AlignInstruct led to consistent improvements in translation quality across 48 translation directions involving English; (3) Discriminator-based instructions outperformed their generative counterparts as cross-lingual instructions; (4) AlignInstruct improved performance in 30 zero-shot directions.


Summary

  • The paper introduces AlignInstruct, a contrastive alignment technique that strengthens cross-lingual supervision through a discriminator built from statistical word alignments.
  • It fine-tunes BLOOMZ models (1b1, 3b, and 7b1) on top of an MTInstruct baseline, yielding consistent gains in translation quality across up to 24 previously unseen, low-resource languages.
  • Discriminator-based instructions outperform their generative counterparts, and AlignInstruct improves performance in 30 zero-shot translation directions.

Overview of Contrastive Alignment Instructions

Expanding the language coverage of machine translation (MT) with LLMs, particularly to low-resource languages with little available data, is a considerable challenge. Fine-tuning LLMs with machine translation instructions (MTInstruct) is a straightforward way to add language support, but it is limited by the weak cross-lingual signals inherent in low-resource data. This paper introduces contrastive alignment instructions (AlignInstruct), a technique that strengthens cross-lingual supervision through a discriminator built from statistical word alignments, improving translation performance across a wide range of languages.
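To make the idea concrete, the sketch below builds one discriminator-style AlignInstruct example from a sentence pair and its word alignments (as produced by a word aligner such as fast_align). The `build_align_instruct` helper, the prompt template, and the 50% negative-sampling rate are illustrative assumptions rather than the paper's exact recipe; the contrastive signal comes from sometimes swapping in an unaligned target word and asking the model to label the assertion True or False.

```python
import random

def build_align_instruct(src_sent, tgt_sent, aligned_pairs, rng=random):
    """Build one discriminator-style AlignInstruct example from a parallel
    sentence pair and its statistical word alignments, given as
    (src_word, tgt_word) tuples.

    The prompt template is an illustrative assumption, not the exact
    wording used in the paper.
    """
    src_word, tgt_word = rng.choice(aligned_pairs)

    # Contrastive (negative) option: with 50% probability, replace the
    # aligned target word with a target word that is NOT aligned to it.
    unaligned = [w for w in tgt_sent.split()
                 if (src_word, w) not in aligned_pairs]
    make_negative = bool(unaligned) and rng.random() < 0.5
    candidate = rng.choice(unaligned) if make_negative else tgt_word

    prompt = (
        "Given the following parallel sentences, decide whether the "
        "assertion is True or False.\n"
        f"Source: {src_sent}\n"
        f"Target: {tgt_sent}\n"
        f'Assertion: "{src_word}" can be aligned with "{candidate}".'
    )
    return {"instruction": prompt,
            "output": "False" if make_negative else "True"}

# Toy usage with a hypothetical English-French pair and its alignments.
example = build_align_instruct(
    "The cat sat on the mat .",
    "Le chat était assis sur le tapis .",
    [("The", "Le"), ("cat", "chat"), ("mat", "tapis")],
)
print(example["instruction"], "->", example["output"])
```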

Innovation in Machine Translation Adaptation

The core innovation, AlignInstruct, is designed to improve cross-lingual supervision without requiring additional training data. The paper tests AlignInstruct by fine-tuning BLOOMZ models (1b1, 3b, and 7b1), showing promising results across 24 previously unseen languages. AlignInstruct, especially when combined with MTInstruct, consistently improved translation quality across 48 translation directions involving English. Moreover, the discriminator-based formulation of AlignInstruct proved more effective than its generative counterparts, underscoring its value as a cross-lingual instruction mechanism.
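For comparison, an MTInstruct example is simply a translation instruction built from the same parallel data, and the two instruction types can be mixed into a single fine-tuning set. The helpers below (`build_mt_instruct`, `mix_objectives`) and their template wording are illustrative assumptions, not the paper's exact format.

```python
import random

def build_mt_instruct(src_sent, tgt_sent, src_lang, tgt_lang):
    """A plain translation instruction in the spirit of MTInstruct; the
    template wording is an illustrative assumption."""
    prompt = (f"Translate the following text from {src_lang} to {tgt_lang}.\n"
              f"{src_lang}: {src_sent}\n"
              f"{tgt_lang}:")
    return {"instruction": prompt, "output": " " + tgt_sent}

def mix_objectives(mt_examples, align_examples, seed=0):
    """Interleave MTInstruct and AlignInstruct examples into one
    multi-task fine-tuning set drawn from the same parallel corpora."""
    mixed = list(mt_examples) + list(align_examples)
    random.Random(seed).shuffle(mixed)
    return mixed
```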

Methodology and Training Curriculum

The fine-tuning pipeline first established a baseline with MTInstruct and then added AlignInstruct, with both instruction types derived from the same parallel corpora; generative variants of AlignInstruct were also explored. The paper compared the efficacy of the two objectives both combined and under a fixed curriculum, and evaluated translation quality with standard automatic MT metrics such as BLEU and chrF.
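A minimal data-level sketch of such a schedule, assuming a simple two-stage split (MTInstruct-only batches first, then mixed batches), is shown below; the actual curricula explored in the paper may be staged differently.

```python
import random

def curriculum_batches(mt_examples, align_examples, batch_size=8,
                       baseline_epochs=1, joint_epochs=1, rng=random):
    """Yield fine-tuning batches for a simple two-stage schedule:
    MTInstruct alone first (the baseline), then MTInstruct plus
    AlignInstruct mixed together. The staging here is a simplification
    and need not match the curricula compared in the paper.
    """
    def batches(pool):
        pool = list(pool)
        rng.shuffle(pool)
        for i in range(0, len(pool), batch_size):
            yield pool[i:i + batch_size]

    # Stage 1: establish the MTInstruct baseline.
    for _ in range(baseline_epochs):
        yield from batches(mt_examples)

    # Stage 2: continue with both instruction types mixed.
    for _ in range(joint_epochs):
        yield from batches(list(mt_examples) + list(align_examples))
```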

Results and Implications

The approach led to statistically significant improvements in translation quality for the vast majority of languages tested, and AlignInstruct was consistently effective across model sizes. In zero-shot scenarios, where the model had not been exposed to particular translation directions during training, AlignInstruct also improved performance, particularly when the directions involve languages the model already supports. The work offers a promising avenue for translating low-resource languages with LLMs, with the potential to broaden access to information across linguistic barriers.
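As a reference point for how such quality claims are typically measured, the snippet below scores system outputs with sacrebleu's BLEU and chrF implementations; it is a minimal sketch of the metric computation only, not the paper's full evaluation protocol (which would also include significance testing).

```python
import sacrebleu

def evaluate_mt(hypotheses, references):
    """Score system outputs with surface MT metrics via sacrebleu."""
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    chrf = sacrebleu.corpus_chrf(hypotheses, [references])
    return {"BLEU": round(bleu.score, 2), "chrF": round(chrf.score, 2)}

# Toy usage with a single hypothesis/reference pair.
print(evaluate_mt(["the cat sat on the mat"],
                  ["the cat is sitting on the mat"]))
```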

Through rigorous experimentation, the paper not only advances our understanding of how LLMs can be adapted to multilingual translation tasks, but also provides insights into the interplay between discriminative and generative objectives during fine-tuning. By examining changes in the layer-wise language representations inside the models, it also sheds light on the internal effects of applying AlignInstruct, paving the way for more refined LLM fine-tuning strategies in the future.