Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition (2104.09106v4)

Published 19 Apr 2021 in cs.CL

Abstract: Subword units are commonly used for end-to-end automatic speech recognition (ASR), but a fully acoustic-oriented subword modeling approach is still lacking. We propose an acoustic data-driven subword modeling (ADSM) approach that combines the advantages of several text-based and acoustic-based subword methods in one pipeline. With a fully acoustic-oriented label design and learning process, ADSM produces acoustic-structured subword units and acoustic-matched target sequences for further ASR training. The obtained ADSM labels are evaluated with different end-to-end ASR approaches, including CTC, RNN-Transducer, and attention models. Experiments on the LibriSpeech corpus show that ADSM clearly outperforms both byte pair encoding (BPE) and pronunciation-assisted subword modeling (PASM) in all cases. Detailed analysis shows that ADSM achieves acoustically more logical word segmentation and more balanced sequence lengths, and is thus suitable for both time-synchronous and label-synchronous models. We also briefly describe how to apply acoustic-based subword regularization and unseen text segmentation using ADSM.

Authors (5)

Wei Zhou (308 papers)
Mohammad Zeineldeen (16 papers)
Zuoyun Zheng (2 papers)
Ralf Schlüter (73 papers)
Hermann Ney (104 papers)

Citations (14)

