Independent language modeling architecture for end-to-end ASR (1912.00863v1)

Published 25 Nov 2019 in cs.CL and eess.AS

Abstract: The attention-based end-to-end (E2E) automatic speech recognition (ASR) architecture allows for joint optimization of acoustic and language models within a single network. However, in a vanilla E2E ASR architecture, the decoder sub-network (subnet), which incorporates the role of the language model (LM), is conditioned on the encoder output. This means that the acoustic encoder and the language model are entangled, which prevents the LM from being trained separately on external text data. To address this problem, in this work we propose a new architecture that separates the decoder subnet from the encoder output. In this way, the decoupled subnet becomes an independently trainable LM subnet, which can easily be updated using external text data. We study two strategies for updating the new architecture. Experimental results show that: 1) the independent LM architecture benefits from external text data, achieving 9.3% and 22.8% relative character and word error rate reductions on the Mandarin HKUST and English NSC datasets, respectively; 2) the proposed architecture works well with an external LM and generalizes to different amounts of labelled data.
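The abstract's core idea is a decoder-side LM subnet that never sees the encoder output and can therefore be updated on text alone. Below is a minimal PyTorch sketch of that idea; the module names, hyperparameters, and the text-only training step are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class IndependentLMSubnet(nn.Module):
    """Decoder-side subnet conditioned only on the label history,
    never on the encoder output, so it can be trained on text alone.
    Sizes are illustrative, not taken from the paper."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_tokens, state=None):
        # prev_tokens: (batch, time) LongTensor of label history
        emb = self.embed(prev_tokens)
        out, state = self.rnn(emb, state)
        return self.proj(out), state  # per-step LM logits

def lm_text_update(lm, text_batch, optimizer, pad_id=0):
    """One update of the LM subnet from external text only
    (next-token cross-entropy); the acoustic encoder is untouched."""
    logits, _ = lm(text_batch[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        text_batch[:, 1:].reshape(-1),
        ignore_index=pad_id,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the subnet's inputs are tokens alone, a step like `lm_text_update` can run on unpaired text corpora, which is exactly the separation the vanilla entangled decoder does not permit.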

Authors (8)
  1. Van Tung Pham (13 papers)
  2. Haihua Xu (23 papers)
  3. Yerbolat Khassanov (19 papers)
  4. Zhiping Zeng (6 papers)
  5. Eng Siong Chng (112 papers)
  6. Chongjia Ni (18 papers)
  7. Bin Ma (78 papers)
  8. Haizhou Li (286 papers)
Citations (15)
