
Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead (2210.14446v2)

Published 26 Oct 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine translation for which high-quality segmentation is critical. Model-based segmentation methods that leverage acoustic features are powerful, but without an understanding of the language itself, these approaches are limited. We present a hybrid approach that leverages both acoustic and language information to improve segmentation. Furthermore, we show that including one word as a look-ahead boosts segmentation quality. On average, our models improve segmentation-F0.5 score by 9.8% over baseline. We show that this approach works for multiple languages. For the downstream task of machine translation, it improves the translation BLEU score by an average of 1.05 points.
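The abstract reports results as a segmentation-F0.5 score but does not define it. As a rough illustration only, the sketch below assumes the standard F-beta formula with beta = 0.5 (which weights precision more heavily than recall), applied to exact matches between predicted and reference segment boundaries. The function names and the use of word indices as boundary positions are hypothetical choices for this example, not details taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denom


def segmentation_f_beta(predicted: set, reference: set, beta: float = 0.5) -> float:
    """F-beta over boundary positions (assumed here to be word indices).

    A predicted boundary counts as correct only if the same position
    appears in the reference; this exact-match rule is an assumption.
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return f_beta(precision, recall, beta)


if __name__ == "__main__":
    # Two of three predicted boundaries match; two of four reference ones are found.
    print(segmentation_f_beta({3, 7, 12}, {3, 12, 20, 25}))
```

With beta = 0.5, a segmenter that places few but mostly correct boundaries scores higher than one that finds more boundaries at the cost of many false splits, which fits the abstract's concern about overly aggressive mid-sentence segmentation.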

Authors (10)
  1. Piyush Behre
  2. Naveen Parihar
  3. Sharman Tan
  4. Amy Shah
  5. Eva Sharma
  6. Geoffrey Liu
  7. Shuangyu Chang
  8. Hosam Khalil
  9. Chris Basoglu
  10. Sayan Pathak
Citations (4)
