BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder (2211.00792v2)

Published 2 Nov 2022 in eess.AS, cs.CL, and cs.SD

Abstract: We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR has been actively studied, aiming to utilize versatile linguistic knowledge for generating accurate text. One crucial factor that makes this integration challenging is the vocabulary mismatch: the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to mismatch the target ASR domain. To overcome this issue, we propose BECTRA, an extended version of our previous BERT-CTC, that realizes BERT-based E2E-ASR using a vocabulary of interest. BECTRA is a transducer-based model, which adopts BERT-CTC for its encoder and trains an ASR-specific decoder using a vocabulary suitable for a target task. With the combination of the transducer and BERT-CTC, we also propose a novel inference algorithm that takes advantage of both autoregressive and non-autoregressive decoding. Experimental results on several ASR tasks, varying in amounts of data, speaking styles, and languages, demonstrate that BECTRA outperforms BERT-CTC by effectively dealing with the vocabulary mismatch while exploiting BERT knowledge.

Authors (4)

Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe

Citations (12)
