
Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings (2010.11247v2)

Published 21 Oct 2020 in cs.CL

Abstract: Simultaneous translation differs vastly from full-sentence translation in that it starts translating before the source sentence ends, with only a few words' delay. However, due to the lack of large-scale, high-quality simultaneous translation datasets, most such systems are still trained on conventional full-sentence bitexts. This is far from ideal for the simultaneous scenario because those bitexts abound with unnecessary long-distance reorderings. We propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translations. Experiments on Zh->En and Ja->En simultaneous translation show substantial improvements (up to +2.7 BLEU) with the addition of these generated pseudo-references.
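To make the motivation concrete, the sketch below (not the paper's actual algorithm) shows one common way to quantify long-distance reordering in a bitext: counting crossing word-alignment links. A pseudo-reference with fewer reorderings would score lower under such a metric. The `alignment` input, the aligner it would come from (e.g., an external tool such as fast_align), and the `max_crossings` threshold are all illustrative assumptions.

```python
# Minimal sketch (assumed metric, not the paper's method): score a sentence
# pair's reordering by counting crossing word-alignment links, then accept a
# (pseudo-)reference only when it is "monotonic enough" for simultaneous MT.

from itertools import combinations

def crossing_links(alignment):
    """Count pairs of alignment links (src_idx, tgt_idx) that cross.

    Two links cross when source order and target order disagree --
    a standard proxy for reordering between the two languages.
    """
    return sum(
        1
        for (i1, j1), (i2, j2) in combinations(alignment, 2)
        if (i1 - i2) * (j1 - j2) < 0
    )

def is_monotonic_enough(alignment, max_crossings=2):
    """Keep a reference only if it has few crossing links (threshold is arbitrary)."""
    return crossing_links(alignment) <= max_crossings

# A monotone pair has no crossings; a fully inverted one crosses everywhere.
monotone = [(0, 0), (1, 1), (2, 2)]
inverted = [(0, 2), (1, 1), (2, 0)]
print(crossing_links(monotone))        # 0
print(crossing_links(inverted))        # 3
print(is_monotonic_enough(inverted))   # False
```

Under this view, rewriting the target side to reduce crossings yields references that a low-latency, prefix-to-prefix decoder can follow without waiting for distant source words.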

Authors (5)
  1. Junkun Chen (27 papers)
  2. Renjie Zheng (29 papers)
  3. Atsuhito Kita (1 paper)
  4. Mingbo Ma (32 papers)
  5. Liang Huang (108 papers)
Citations (22)