
Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR (2305.18419v1)

Published 28 May 2023 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: We propose a method of segmenting long-form speech by separating semantically complete sentences within the utterance. This prevents the ASR decoder from needlessly processing faraway context while also preventing it from missing relevant context within the current sentence. Semantically complete sentence boundaries are typically demarcated by punctuation in written text, but spoken real-world utterances rarely contain punctuation. We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text. We compare our segmenter, which is distilled from the LM teacher, against a segmenter distilled from an acoustic-pause-based teacher used in other works, on a streaming ASR pipeline. The pipeline with our segmenter achieves a 3.2% relative WER gain along with a 60 ms median end-of-segment latency reduction on a YouTube captioning task.

Authors (5)
  1. W. Ronny Huang (25 papers)
  2. Hao Zhang (948 papers)
  3. Shankar Kumar (34 papers)
  4. Shuo-yiin Chang (25 papers)
  5. Tara N. Sainath (79 papers)
Citations (2)
