Introduction
Advances in neural machine translation (MT) have largely been driven by transformer encoder-decoder architectures. More recently, decoder-only LLMs, such as the GPT series, have shown promising results across NLP tasks, including translation. However, a notable performance gap remains between moderate-sized LLMs (7B to 13B parameters) and both their larger counterparts and conventional translation models. The paper under discussion addresses this gap by examining the limitations of supervised fine-tuning (SFT) and introducing a novel training methodology.
The Problem with Supervised Fine-Tuning
The authors highlight a fundamental issue with the standard supervised fine-tuning (SFT) approach: it relies heavily on the quality of reference data. Even human-annotated datasets contain imperfections, and training models to mimic these flawed references can hamper performance and unintentionally cap a model's potential. The same reliance also limits how accurately reference-based metrics can evaluate translation quality, since a model's outputs are scored against the very references that may fall short.
A New Fine-Tuning Approach: Contrastive Preference Optimization (CPO)
To counter the shortcomings of SFT, the researchers introduce Contrastive Preference Optimization (CPO), a new fine-tuning approach. Rather than exclusively mimicking gold reference translations, CPO trains models to avoid generating translations that are merely "adequate but not perfect." Using a specially curated preference dataset derived from high-quality translations, CPO guides models to prefer superior translation options. With minimal additional parameters and training data, this method allows even a moderate-sized LLM to rival state-of-the-art models.
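To make the idea concrete, a preference-optimization objective of this kind typically combines a contrastive term, which rewards assigning higher likelihood to the preferred translation than to the dispreferred one, with a likelihood term on the preferred translation itself. The sketch below is a rough, reference-free illustration in that spirit, not the authors' exact formulation; the inputs are assumed to be sequence log-probabilities from the model, and `beta` is a hypothetical scaling hyperparameter:

```python
import math

def cpo_style_loss(logp_preferred: float, logp_dispreferred: float, beta: float = 0.1) -> float:
    """Illustrative CPO-style loss (not the paper's exact objective).

    logp_preferred / logp_dispreferred: the model's log-probabilities of the
    preferred and dispreferred translations for the same source sentence.
    """
    # Contrastive term: -log sigmoid(beta * margin), small when the model
    # already ranks the preferred translation well above the dispreferred one.
    margin = beta * (logp_preferred - logp_dispreferred)
    contrastive = -math.log(1.0 / (1.0 + math.exp(-margin)))
    # Likelihood term: plain negative log-likelihood of the preferred
    # translation, which keeps the model anchored to good outputs.
    nll = -logp_preferred
    return contrastive + nll

# A model that favors the preferred translation incurs a lower loss:
good = cpo_style_loss(logp_preferred=-1.0, logp_dispreferred=-5.0)
bad = cpo_style_loss(logp_preferred=-5.0, logp_dispreferred=-1.0)
```

In practice such a loss would be computed over batches of tokenized sentence pairs inside a training loop; the scalar version above only shows the shape of the objective.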
Results and Insights
Applying CPO to the ALMA model produced a variant, ALMA-R, that matches or exceeds the performance of leading models, such as GPT-4 and WMT competition winners, on the WMT'21, WMT'22, and WMT'23 benchmarks. Importantly, these results were achieved with only an additional 12 million parameters (0.1% of the original model size) and 22,000 parallel sentences.
Conclusion
The paper raises essential questions about the efficacy of current fine-tuning methods for MT models and the quality of gold reference datasets. By training the ALMA model with CPO, the researchers successfully bridge the performance divide separating moderate-sized LLMs from their larger or more specialized counterparts. The findings underscore the potential of moderate-sized LLMs in machine translation, marking a pivotal step toward more efficient, high-performing models.