SiLLM: Large Language Models for Simultaneous Machine Translation (2402.13036v1)

Published 20 Feb 2024 in cs.CL

Abstract: Simultaneous Machine Translation (SiMT) generates translations while reading the source sentence, necessitating a policy to determine the optimal timing for reading and generating words. Despite the remarkable performance achieved by Large Language Models (LLMs) across various NLP tasks, existing SiMT methods predominantly focus on conventional transformers, employing a single model to concurrently determine the policy and generate the translations. However, given the complexity of SiMT, it is challenging to effectively address both tasks with a single model. Therefore, there is a need to decouple the SiMT task into policy-decision and translation sub-tasks. We propose SiLLM, which delegates the two sub-tasks to separate agents, thereby incorporating LLM into SiMT. The policy-decision agent is managed by a conventional SiMT model, responsible for determining the translation policy. The translation agent, leveraging the capabilities of LLM, generates translation using the partial source sentence. The two agents collaborate to accomplish SiMT. To facilitate the application of token-level policies determined by conventional SiMT models to LLM, we propose a word-level policy adapted for LLM. Experiments on two datasets demonstrate that, with a small amount of data for fine-tuning LLM, SiLLM attains state-of-the-art performance.

SiLLM: LLMs for Simultaneous Machine Translation

The paper introduces SiLLM, a framework that enhances Simultaneous Machine Translation (SiMT) by incorporating LLMs. Traditional SiMT methods rely on a single transformer-based model to handle two critical responsibilities at once: determining the translation policy and generating the translation. Given the complexity of SiMT, this dual burden is difficult for a single model to carry well, which motivates the approach proposed in this paper.

Overview of SiLLM Framework

SiLLM strategically decomposes the SiMT task into two distinct sub-tasks, which are delegated to separate agents: the policy-decision agent and the translation agent. The policy-decision agent leverages an established transformer-based SiMT model focused specifically on policy determination. In contrast, the translation agent deploys an LLM for translation generation, capitalizing on its advanced capabilities in understanding and producing linguistic content.
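
At a high level, the two agents interact in a read/write loop: the policy-decision agent decides whether to read another source word or to write a target word, and on each write the LLM extends the translation from the source prefix revealed so far. The sketch below is a minimal illustration of this loop, not the paper's implementation; `policy_agent` and `llm_translate` are hypothetical stand-ins for the conventional SiMT model and the fine-tuned LLM.

```python
# Minimal sketch of a SiLLM-style collaboration loop (hypothetical interfaces).
def simultaneous_translate(source_words, policy_agent, llm_translate):
    read = 0           # number of source words revealed so far
    target_words = []  # translation generated so far

    while True:
        action = policy_agent.decide(source_words[:read], target_words)
        if action == "READ" and read < len(source_words):
            read += 1  # reveal one more source word
        else:
            # WRITE: the LLM extends the translation from the visible prefix.
            next_word = llm_translate(source_words[:read], target_words)
            if next_word == "<eos>":
                break
            target_words.append(next_word)

    return " ".join(target_words)
```

Once the source is exhausted, the loop degenerates into ordinary left-to-right generation, so the policy only matters while the source is still streaming in.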

To integrate LLMs into the SiMT task, the paper introduces a word-level policy suited to LLMs. The policies produced by conventional SiMT models operate over their own subword tokens, which do not align with the LLM's tokenization; recasting the policy at the word level resolves this compatibility challenge.
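
Concretely, a token-level policy specifies how many of the SiMT model's subword tokens must be read before each target step, but the LLM segments the same text differently. Converting those counts into numbers of complete source words yields a policy both models can interpret. The function below is a rough sketch of one plausible conversion, assuming the policy is given as cumulative read counts; it is an illustration, not the paper's exact procedure.

```python
def tokens_to_word_policy(read_counts, token_to_word):
    """Convert a token-level read policy into a word-level one.

    read_counts[i]  : number of source subword tokens read before the
                      SiMT model emits its i-th target token.
    token_to_word[j]: index of the source word that subword token j belongs to.

    Returns, per target step, how many complete source words the policy
    exposes to the LLM. A word counts as available only once all of its
    subword tokens have been read.
    """
    word_policy = []
    for count in read_counts:
        if count == 0:
            word_policy.append(0)
            continue
        last = count - 1                     # index of the last token read
        word_idx = token_to_word[last]       # word containing that token
        word_complete = (last + 1 == len(token_to_word)
                         or token_to_word[last + 1] != word_idx)
        word_policy.append(word_idx + 1 if word_complete else word_idx)
    return word_policy
```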

Experimental Results and Findings

Extensive experiments, conducted on two datasets, underscore the effectiveness of SiLLM, highlighting its ability to achieve state-of-the-art performance with minimal data for fine-tuning the LLM. Specifically, the results reflect significant improvements in translation quality across varied latency scenarios, with SiLLM consistently outperforming baseline models.
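
The summary reports quality under "varied latency scenarios" without naming the metrics; in the SiMT literature these are typically BLEU (e.g., SacreBLEU) plotted against Average Lagging (AL). Assuming that standard AL definition is what is meant here, it can be computed as in this minimal sketch:

```python
def average_lagging(g, src_len, tgt_len):
    """Average Lagging (AL), the standard SiMT latency metric.

    g[t] is the number of source words that had been read when the
    (t+1)-th target word was generated.
    """
    gamma = tgt_len / src_len  # target-to-source length ratio
    # tau: first target position at which the full source has been read
    tau = next((t for t, g_t in enumerate(g) if g_t >= src_len), len(g) - 1)
    return sum(g[t] - t / gamma for t in range(tau + 1)) / (tau + 1)
```

For example, a wait-3 policy on a six-word source producing a six-word output gives g = [3, 4, 5, 6, 6, 6] and AL = 3, i.e. the translation lags roughly three words behind the speaker.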

The experimentation also reveals several insights:

  • Separating the policy-decision and translation tasks lets each model concentrate on the sub-task it handles best, improving overall performance.
  • Fine-tuning the LLM on a relatively small dataset markedly boosts translation quality, signalling the potential of LLMs in adaptive translation settings (a hedged fine-tuning sketch follows this list).
  • SiLLM performs particularly well at high latency, where the LLM can exploit a larger portion of the source sentence to produce better translations.
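
The summary does not spell out the fine-tuning recipe. A common parameter-efficient setup, offered here purely as an assumption (the base model, language pair, prompt format, and hyperparameters are illustrative, not the authors' configuration), is LoRA fine-tuning of an open chat LLM on sentence-level translation pairs:

```python
# Hypothetical sketch: LoRA fine-tuning of an open LLM on translation pairs.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-chat-hf"        # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Train small low-rank adapters instead of all base-model weights.
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# Each training example pairs a full source sentence with its translation;
# at inference time the same prompt is used with only a source prefix.
def format_example(src, tgt):
    return f"Translate German into English.\nGerman: {src}\nEnglish: {tgt}"
```

Because only the adapter weights are trained, a relatively small set of sentence pairs can suffice to specialize the LLM, which is consistent with the paper's observation that little fine-tuning data is needed.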

Implications and Future Directions

The practical implication of this research lies in its potential application in real-time translation scenarios, such as live international conference translation and the generation of real-time subtitles. Beyond practical applications, the theoretical groundwork laid by this paper opens avenues for further research that could explore variations of LLM integration into SiMT systems.

Future work could examine the effect of larger and more capable LLMs as translation agents, and could further optimize the policy-decision component to improve the latency-quality trade-off.

In conclusion, SiLLM represents a substantial methodological innovation in the SiMT domain. By leveraging the distinct capabilities of LLMs for translation generation and decoupling these from policy-decision tasks, this framework introduces a new paradigm for achieving efficient, high-quality simultaneous translation.

Authors (5)
  1. Shoutao Guo (17 papers)
  2. Shaolei Zhang (36 papers)
  3. Zhengrui Ma (18 papers)
  4. Min Zhang (630 papers)
  5. Yang Feng (230 papers)
Citations (6)