
Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference (2303.07914v2)

Published 14 Mar 2023 in cs.CL

Abstract: A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints. However, there is a mismatch problem in using a model trained with complete utterances for streaming inference with partial input. We demonstrate that speech representations extracted at the end of a streaming input are significantly different from those extracted from a complete utterance. To address this issue, we propose a new approach called Future-Aware Streaming Translation (FAST) that adapts an offline ST model for streaming input. FAST includes a Future-Aware Inference (FAI) strategy that incorporates future context through a trainable masked embedding, and a Future-Aware Distillation (FAD) framework that transfers future context from an approximation of full speech to streaming input. Our experiments on the MuST-C EnDe, EnEs, and EnFr benchmarks show that FAST achieves better trade-offs between translation quality and latency than strong baselines. Extensive analyses suggest that our methods effectively alleviate the aforementioned mismatch problem between offline training and online inference.
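For context, here is a minimal sketch, not the authors' code, of the two ingredients the abstract names: the wait-k read/write rule used for streaming inference, and the FAI idea of padding a partial speech input with a trainable masked embedding so an offline-trained encoder sees a pseudo future context. All names, shapes, and the number of appended frames are illustrative assumptions.

```python
import torch
import torch.nn as nn


def wait_k_ready_to_write(num_read_segments: int,
                          num_written_tokens: int,
                          k: int) -> bool:
    """wait-k policy: target token t may be emitted only after at least
    t + k - 1 source segments (here, fixed-size speech chunks) have been
    read. Before writing token number num_written_tokens + 1, we therefore
    need num_read_segments >= num_written_tokens + k."""
    return num_read_segments >= num_written_tokens + k


class FutureAwarePadding(nn.Module):
    """Hypothetical sketch of the FAI mechanism described in the abstract:
    a single trainable embedding stands in for unseen future frames and is
    appended to the streaming prefix before encoding. The real model's
    initialization, padding length, and integration point may differ."""

    def __init__(self, d_model: int, num_future: int = 8):
        super().__init__()
        # One trainable vector acting as a placeholder for future context.
        self.mask_embedding = nn.Parameter(torch.zeros(d_model))
        self.num_future = num_future  # assumed pseudo-future length

    def forward(self, partial_feats: torch.Tensor) -> torch.Tensor:
        # partial_feats: (batch, frames_so_far, d_model), the streaming prefix.
        b, _, d = partial_feats.shape
        future = self.mask_embedding.expand(b, self.num_future, d)
        # The offline encoder then runs over [observed frames; pseudo-future],
        # reducing the train/inference mismatch on truncated inputs.
        return torch.cat([partial_feats, future], dim=1)
```

The distillation counterpart (FAD) would additionally push the encoder states of this padded prefix toward those obtained from an approximation of the full utterance; that training loop is omitted here.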

Authors (7)
  1. Biao Fu
  2. Minpeng Liao
  3. Kai Fan
  4. Zhongqiang Huang
  5. Boxing Chen
  6. Yidong Chen
  7. Xiaodong Shi
Citations (5)
