Overview of Multilingual Machine Translation with LLMs
The paper "Multilingual Machine Translation with LLMs: Empirical Results and Analysis" provides a comprehensive evaluation of the capabilities of LLMs, particularly in the context of multilingual machine translation (MMT). The research primarily discusses the effectiveness of LLMs such as GPT-4 and others in handling translation tasks across a spectrum of languages, focusing on their performance, strengths, and the limitations they currently face.
The paper responds to two critical questions: how effectively LLMs perform MMT across a significant number of languages and the factors influencing their translation capabilities. Through empirical examination, the authors evaluated eight prominent LLMs, including ChatGPT and GPT-4, across 102 languages and 606 translation directions, providing a broad comparison against well-established supervised methods such as NLLB and Google Translator. Notably, GPT-4 showed superior performance, surpassing the supervised baseline NLLB in a significant fraction of the English-centric translation directions, although it still exhibits weaknesses in handling low-resource languages.
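To make the comparison concrete, the sketch below shows how such a per-direction evaluation could be scored and aggregated. It is illustrative only: the paper evaluates on FLORES-101, and sacrebleu's `flores101` tokenizer (available in sacrebleu ≥ 2.0) computes the matching spBLEU metric; the score dictionaries and their numbers here are made up.

```python
# Minimal sketch: score each translation direction with spBLEU, then compute
# the fraction of directions where one system beats another.
import sacrebleu

def spbleu(hypotheses: list[str], references: list[str]) -> float:
    """Corpus-level spBLEU, as used for FLORES-style evaluation."""
    return sacrebleu.corpus_bleu(
        hypotheses, [references], tokenize="flores101"
    ).score

def fraction_wins(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Fraction of shared translation directions where system A outscores system B."""
    directions = scores_a.keys() & scores_b.keys()
    wins = sum(scores_a[d] > scores_b[d] for d in directions)
    return wins / len(directions)

# Illustrative per-direction scores (invented numbers, not the paper's results):
gpt4_scores = {"eng-deu": 45.2, "eng-isl": 24.1}
nllb_scores = {"eng-deu": 43.0, "eng-isl": 29.5}
print(f"GPT-4 beats NLLB in {fraction_wins(gpt4_scores, nllb_scores):.0%} of directions")
```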
Key Findings
- Improving Capabilities of LLMs: The research demonstrates that LLMs are steadily improving at translation. GPT-4, in particular, showed marked gains over its predecessors and over other LLMs such as ChatGPT across numerous tested language pairs.
- Comparative Analysis: While GPT-4 outperformed NLLB in more than 40% of the English-centric direction tests, a considerable gap remains against Google Translate, especially for translations involving low-resource or non-English-centric languages.
- Factors Affecting Translation: The analysis indicates that translation direction and resource availability significantly affect LLM performance. Notably, LLMs can ignore the semantics of the translation instruction when in-context learning (ICL) exemplars are present, yet still translate effectively, suggesting untapped optimization potential in the ICL paradigm (a minimal prompting sketch follows this list).
- Data's Role in Translation: Evidence from models such as XGLM-7.5B shows that even a small share of multilingual data in the pretraining corpus can seed translation ability, with promising implications for resource-efficient MMT.
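The prompting sketch referenced above illustrates how ICL exemplars drive few-shot translation. The `source = target` line format loosely follows the exemplar templates explored in the paper; `llm_complete` is a hypothetical stand-in for any LLM completion API, not a real function.

```python
# Minimal sketch of few-shot (in-context learning) translation prompting.
def build_prompt(exemplars: list[tuple[str, str]], source: str) -> str:
    """Concatenate in-context exemplars, then the test source sentence."""
    lines = [f"{src} = {tgt}" for src, tgt in exemplars]
    lines.append(f"{source} =")  # the model continues with the translation
    return "\n".join(lines)

exemplars = [
    ("Good morning.", "Guten Morgen."),
    ("How are you?", "Wie geht es dir?"),
]
prompt = build_prompt(exemplars, "The weather is nice today.")
# translation = llm_complete(prompt)  # hypothetical LLM API call
print(prompt)
```

The paper's observation is that even when the surrounding instruction is uninformative or mismatched, exemplars in this form are often enough to elicit translation.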
Implications and Future Directions
This paper substantially contributes to understanding the transformative potential of LLMs in MMT by establishing them as viable tools, especially given ongoing advances in training techniques and growing model capabilities. However, low-resource languages remain a persistent challenge, pointing to a need for further innovation in leveraging monolingual data and improving cross-lingual knowledge transfer.
The findings suggest several future research trajectories, such as optimizing exemplar selection strategies to better harness translation capabilities, fine-tuning instruction templates for consistency and effectiveness, and exploring emergent patterns in ICL usage. These paths could bridge the performance gap with supervised systems and enhance accessibility to reliable translation across even the most linguistically diverse landscapes.
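One concrete instance of the exemplar-selection direction mentioned above is similarity-based retrieval: pick the in-context exemplars whose source sentences most resemble the test input. The TF-IDF retrieval below is an illustrative assumption on my part, not the paper's method; stronger variants would swap in sentence embeddings.

```python
# Sketch: retrieve the k exemplars most similar to the test sentence,
# using TF-IDF cosine similarity as a simple stand-in for sentence embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_exemplars(pool: list[tuple[str, str]], query: str, k: int = 4):
    """Return the k (source, target) pairs whose source is most similar to query."""
    sources = [src for src, _ in pool]
    vectorizer = TfidfVectorizer().fit(sources + [query])
    pool_vecs = vectorizer.transform(sources)
    query_vec = vectorizer.transform([query])
    sims = cosine_similarity(query_vec, pool_vecs)[0]
    top = sims.argsort()[::-1][:k]
    return [pool[i] for i in top]

pool = [
    ("Good morning.", "Guten Morgen."),
    ("The weather is nice today.", "Das Wetter ist heute schön."),
    ("Where is the station?", "Wo ist der Bahnhof?"),
]
print(select_exemplars(pool, "Is the weather good today?", k=2))
```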
In conclusion, the paper underscores the incremental but significant strides LLMs are making in multilingual machine translation. As research progresses, LLMs are set to become crucial players in achieving seamless multilingual communication, unlocking new possibilities in cross-cultural exchanges and international collaboration.