- The paper introduces TOWER, an open multilingual LLM tailored to translation-related tasks through continued pretraining and instruction finetuning.
- It leverages 20B tokens of monolingual and parallel data across 10 languages, plus the TOWER BLOCKS dataset, to boost translation, quality estimation, automatic post-editing, and grammatical error correction.
- Benchmark results show TOWER INSTRUCT outperforms open alternatives and rivals closed models like GPT-4 across key evaluation metrics.
Introduction
In the continuously evolving landscape of multilingual NLP, demand remains high for systems that proficiently handle a variety of translation-related tasks -- such as quality estimation, automatic post-editing, and grammatical error correction. Recent advances have seen general-purpose LLMs set new benchmarks across these tasks, yet open LLMs still lag behind, particularly across the range of tasks that make up real translation workflows. "TOWER: An Open Multilingual LLM for Translation-Related Tasks" addresses this gap with a tailored LLM that not only stands competitive against closed-source giants but also sets a new standard for open multilingual models across a spectrum of translation-related tasks.
TOWER is built from three components:
- TOWER BASE, which extends the multilingual capabilities of LLaMA-2 through continued pretraining on a mixture of monolingual and parallel data, covering 20B tokens across 10 languages.
- TOWER BLOCKS, a curated dataset of instruction-formatted records for finetuning LLMs on translation-related tasks.
- TOWER INSTRUCT, the final model obtained by finetuning TOWER BASE on TOWER BLOCKS, designed to understand and execute translation-related instructions (a usage sketch follows this list).
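To make the pipeline concrete, here is a minimal sketch of querying TOWER INSTRUCT for zero-shot translation with Hugging Face transformers. The checkpoint name `Unbabel/TowerInstruct-7B-v0.1` and the prompt wording are assumptions based on the public release, not taken from the paper:

```python
# pip install transformers torch accelerate
import torch
from transformers import pipeline

# Checkpoint name assumed from the public release.
pipe = pipeline(
    "text-generation",
    model="Unbabel/TowerInstruct-7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Zero-shot translation, phrased as an instruction.
messages = [{
    "role": "user",
    "content": "Translate the following text from Portuguese into English.\n"
               "Portuguese: Um grupo de investigadores lançou um novo modelo.\n"
               "English:",
}]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
out = pipe(prompt, max_new_tokens=128, do_sample=False, return_full_text=False)
print(out[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) mirrors the deterministic setup typically used when reporting translation benchmarks.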
Extensive benchmarking shows that TOWER INSTRUCT consistently outperforms open alternatives and is competitive with leading closed-source models such as GPT-4 and GPT-3.5 Turbo across metrics including COMET-22, BLEURT, and chrF. Notably, it excels in both translation directions (into and out of English) for the languages in its training corpus, highlighting its refined multilingual capabilities.
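For a sense of how such metric numbers are produced, the sketch below scores a toy hypothesis with COMET-22 and chrF via the `unbabel-comet` and `sacrebleu` packages (BLEURT works analogously); the example data is invented, not drawn from the paper's benchmarks:

```python
# pip install unbabel-comet sacrebleu
from comet import download_model, load_from_checkpoint
from sacrebleu.metrics import CHRF

srcs = ["Um grupo de investigadores lançou um novo modelo."]
hyps = ["A group of researchers released a new model."]      # system output
refs = ["A group of researchers has launched a new model."]  # human reference

# COMET-22: a learned, reference-based metric.
comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
print("COMET-22:", comet.predict(data, batch_size=8, gpus=0).system_score)

# chrF: a surface-level, character n-gram metric.
print("chrF:", CHRF().corpus_score(hyps, [refs]).score)
```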
TOWER is evaluated against a wide array of translation tasks and related activities, including automatic post-editing (APE) and named entity recognition (NER), where it shows notable proficiency. In APE, it corrects oscillatory hallucinations (degenerate repetition loops) in translated text, yielding significant quality gains. Its performance on NER across multiple languages likewise reflects strong instruction following, a testament to the diversity and quality of TOWER BLOCKS.
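As an illustration, an APE request might be phrased as the instruction below; the template wording is hypothetical, written in the general instruction style of TOWER BLOCKS rather than copied from it:

```python
# A hypothetical APE prompt; the exact TOWER BLOCKS template may differ.
src = "The committee approved the proposal yesterday."
mt = "Der Ausschuss genehmigte den Vorschlag gestern gestern gestern."  # repetition loop
messages = [{
    "role": "user",
    "content": (
        f"Source (English): {src}\n"
        f"Translation (German): {mt}\n"
        "Correct the translation so it is fluent and faithful to the source. "
        "Reply with the corrected German text only."
    ),
}]
# Feed `messages` through the same chat-templated pipeline as above.
```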
In grammatical error correction (GEC), TOWER delivers promising results but leaves room for improvement, suggesting an avenue for future versions of the model.
The Importance of Parallel Data
A pivotal element of TOWER's development is the integration of parallel data during continued pretraining, which significantly bolsters translation quality. This underscores the value of injecting cross-lingual signal early in model development: the practice is highly sample-efficient and continues to yield translation quality improvements as data volume grows.
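A minimal sketch of such a monolingual/parallel mixture follows; the pair-serialization template and the one-third parallel share are illustrative assumptions, not the paper's exact recipe:

```python
import random

def format_pair(src_lang, tgt_lang, src, tgt):
    # One plausible serialization of a sentence pair into a single
    # pretraining sequence; the paper's exact template may differ.
    return f"{src_lang}: {src}\n{tgt_lang}: {tgt}"

def pretraining_stream(mono_docs, parallel_pairs, parallel_ratio=1/3, seed=0):
    # Interleave monolingual documents with serialized translation pairs.
    # The 1/3 parallel share is an assumption for illustration only.
    rng = random.Random(seed)
    mono, pairs = iter(mono_docs), iter(parallel_pairs)
    while True:
        try:
            if rng.random() < parallel_ratio:
                yield format_pair(*next(pairs))
            else:
                yield next(mono)
        except StopIteration:
            return

# Toy usage:
stream = pretraining_stream(
    ["Researchers released a new model today."],
    [("English", "Portuguese", "Hello, world.", "Olá, mundo.")],
)
print(next(stream))
```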
Conclusion and Future Directions
TOWER marks a significant stride toward making open LLMs useful and accessible for multilingual translation tasks. By addressing the full translation workflow through a structured training and evaluation pipeline, TOWER offers a robust foundation for future work on translation quality and related processes.
Released alongside the paper are the TOWER model family, TOWER BLOCKS, and the TOWER EVAL evaluation suite, resources that ensure reproducibility and encourage further research. Such contributions matter for the broader NLP community, fostering advances in multilingual processing and translation workflows.
Looking ahead, the paper points to handling longer contexts and exploring interrelationships among translation tasks as promising directions. With its open model and expansive dataset, TOWER not only raises the bar for translation-related tasks but also pushes forward the development of versatile, multilingual LLMs.