
Improving CTC-based ASR Models with Gated Interlayer Collaboration (2205.12462v2)

Published 25 May 2022 in cs.CL, cs.SD, and eess.AS

Abstract: CTC-based automatic speech recognition (ASR) models without an external language model usually lack the capacity to model conditional dependencies and textual interactions. In this paper, we present a Gated Interlayer Collaboration (GIC) mechanism to improve the performance of CTC-based models, which introduces textual information into the model and thus relaxes the conditional independence assumption of CTC-based models. Specifically, we consider the weighted sum of token embeddings as the textual representation for each position, where the position-specific weights are the softmax probability distributions constructed via inter-layer auxiliary CTC losses. The textual representations are then fused with acoustic features through a gate unit. Experiments on the AISHELL-1, TEDLIUM2, and AIDATATANG corpora show that the proposed method outperforms several strong baselines.
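
The sketch below illustrates the mechanism the abstract describes, assuming a standard PyTorch encoder layer with hidden size `d_model` and vocabulary size `vocab_size`. The class name `GICLayer` and the specific sigmoid gate over the concatenated acoustic and textual features are illustrative choices, not taken from the paper's released code; the paper's exact gating form may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GICLayer(nn.Module):
    """Sketch of Gated Interlayer Collaboration (GIC).

    At an intermediate encoder layer, an auxiliary CTC head produces a
    per-frame distribution over the vocabulary. The expected token
    embedding under that distribution serves as the textual
    representation, which is fused with the acoustic features through
    a learned gate.
    """

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.ctc_head = nn.Linear(d_model, vocab_size)  # auxiliary CTC projection
        self.embed = nn.Embedding(vocab_size, d_model)  # token embedding table
        self.gate = nn.Linear(2 * d_model, d_model)     # gate over [acoustic; textual]

    def forward(self, h: torch.Tensor):
        # h: (batch, time, d_model) intermediate acoustic features
        logits = self.ctc_head(h)            # (B, T, V); also fed to the aux CTC loss
        probs = F.softmax(logits, dim=-1)    # per-frame distribution over tokens
        textual = probs @ self.embed.weight  # (B, T, d_model): weighted sum of embeddings
        g = torch.sigmoid(self.gate(torch.cat([h, textual], dim=-1)))
        fused = g * h + (1.0 - g) * textual  # gated fusion of acoustic and textual
        return fused, logits                 # logits drive the inter-layer CTC loss
```

Training would add a CTC loss on the `logits` of each such intermediate layer alongside the final CTC objective, so the softmax weights become meaningful token distributions; the gate then lets each frame decide how much textual context to mix back into the acoustic stream.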

Authors (3)
  1. Yuting Yang (45 papers)
  2. Yuke Li (76 papers)
  3. Binbin Du (8 papers)
Citations (11)