Hybrid Autoregressive Transducer (HAT) (2003.07705v1)
Published 12 Mar 2020 in eess.AS, cs.CL, cs.LG, and cs.SD
Abstract: This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model, which can be used to decide whether inference with an external language model is beneficial or not. This article also presents a finite-context version of the HAT model that addresses the exposure bias problem and significantly simplifies the overall training and inference. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to the state-of-the-art approaches.
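As a rough illustration of the factorization the abstract alludes to, the sketch below separates each per-step posterior into a Bernoulli blank decision and a label distribution over non-blank symbols; the symbols (b_{t,u}, s_{t,u}, z_{t,u}) are illustrative notation and not necessarily the paper's own.

% Sketch of a HAT-style per-step factorization (illustrative notation).
% At acoustic frame t and label position u, a scalar blank logit s_{t,u}
% and label logits z_{t,u}(v) give:
\begin{align}
  P(\langle\text{blank}\rangle \mid \mathbf{x}_{1:t}, y_{0:u})
    &= b_{t,u} = \sigma\!\left(s_{t,u}\right), \\
  P(v \mid \mathbf{x}_{1:t}, y_{0:u})
    &= \left(1 - b_{t,u}\right)
       \frac{\exp\!\left(z_{t,u}(v)\right)}{\sum_{v'} \exp\!\left(z_{t,u}(v')\right)}.
\end{align}
% An internal-LM score can then be approximated by dropping the acoustic
% contribution from the label logits, leaving a decoder-only distribution
% whose quality indicates whether fusing an external LM is likely to help.

This separation of the blank posterior (sigmoid) from the label posterior (softmax) is what lets the internal language model be scored on its own, which is the modularity property the abstract emphasizes.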