Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation (2203.15479v2)

Published 29 Mar 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segmentation method using a binary classification model trained using a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD and the above speech segmentation method. Experimental results revealed that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods. The hybrid approach further improved the translation performance.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Ryo Fukuda (5 papers)
  2. Katsuhito Sudoh (35 papers)
  3. Satoshi Nakamura (94 papers)
Citations (6)