BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model (2210.16663v2)

Published 29 Oct 2022 in eess.AS and cs.CL

Abstract: This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC). Our formulation relaxes the conditional independence assumptions used in conventional CTC and incorporates linguistic knowledge through the explicit output dependency obtained by BERT contextual embedding. BERT-CTC attends to the full contexts of the input and hypothesized output sequences via the self-attention mechanism. This mechanism encourages a model to learn inner/inter-dependencies between the audio and token representations while maintaining CTC's training efficiency. During inference, BERT-CTC combines a mask-predict algorithm with CTC decoding, which iteratively refines an output sequence. The experimental results reveal that BERT-CTC improves over conventional approaches across variations in speaking styles and languages. Finally, we show that the semantic representations in BERT-CTC are beneficial towards downstream spoken language understanding tasks.
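
Below is a minimal, self-contained sketch of the iterative mask-and-repredict inference loop that the abstract describes (an initial CTC hypothesis, followed by re-masking low-confidence tokens and re-predicting them with a masked language model). It is an illustration under assumed toy modules and ids (`ctc_head`, `masked_lm`, `MASK_ID`, `BLANK_ID`, and the dimensions are all hypothetical), not the authors' BERT-CTC implementation, which additionally fuses BERT contextual embeddings with the audio representations before CTC scoring.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary/ids and sizes -- not taken from the paper.
VOCAB, HIDDEN, MASK_ID, BLANK_ID = 100, 64, 99, 0

def ctc_collapse(frame_ids, blank=BLANK_ID):
    """Standard CTC collapse: merge repeated frame labels, drop blanks."""
    out, prev = [], None
    for t in frame_ids.tolist():
        if t != blank and t != prev:
            out.append(t)
        prev = t
    return torch.tensor(out, dtype=torch.long)

@torch.no_grad()
def mask_predict_ctc(audio_feats, ctc_head, masked_lm, num_iters=4):
    """Iteratively refine a CTC hypothesis: re-mask the least confident
    tokens and re-predict them with a masked LM (simplified sketch)."""
    logits = ctc_head(audio_feats)                 # (T, VOCAB) frame-level scores
    hyp = ctc_collapse(logits.argmax(-1))          # initial greedy CTC hypothesis
    for it in range(1, num_iters + 1):
        if len(hyp) == 0:
            break
        lm_logits = masked_lm(hyp.unsqueeze(0)).squeeze(0)          # (L, VOCAB)
        conf = lm_logits.softmax(-1).gather(-1, hyp.unsqueeze(-1)).squeeze(-1)
        num_mask = int(len(hyp) * (1 - it / num_iters))             # fewer masks each pass
        if num_mask == 0:
            break
        mask_idx = conf.argsort()[:num_mask]       # lowest-confidence positions
        masked = hyp.clone()
        masked[mask_idx] = MASK_ID
        refined = masked_lm(masked.unsqueeze(0)).squeeze(0).argmax(-1)
        hyp[mask_idx] = refined[mask_idx]          # fill in only the masked slots
    return hyp

if __name__ == "__main__":
    # Stand-in modules with random weights, purely to make the sketch runnable.
    audio_feats = torch.randn(50, HIDDEN)          # pretend encoder output (T, HIDDEN)
    ctc_head = nn.Linear(HIDDEN, VOCAB)
    masked_lm = nn.Sequential(nn.Embedding(VOCAB, HIDDEN), nn.Linear(HIDDEN, VOCAB))
    print(mask_predict_ctc(audio_feats, ctc_head, masked_lm))
```

With trained components, the decreasing masking schedule lets early iterations fix many uncertain tokens at once while later iterations make only small corrections, which is the usual mask-predict refinement pattern the abstract refers to.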

Authors (6)
  1. Yosuke Higuchi (23 papers)
  2. Brian Yan (40 papers)
  3. Siddhant Arora (50 papers)
  4. Tetsuji Ogawa (22 papers)
  5. Tetsunori Kobayashi (15 papers)
  6. Shinji Watanabe (416 papers)
Citations (24)