Emma

Summary:

  • NVIDIA researchers have introduced a GPU-accelerated weighted finite-state transducer (WFST) beam search decoder to speed up automatic speech recognition (ASR) pipelines. The new decoder is designed to work with existing Connectionist Temporal Classification (CTC) models, improving throughput and latency and adding support for features such as on-the-fly composition for utterance-specific word boosting.
  • The GPU-accelerated decoder showed up to seven times higher throughput in an offline scenario and over eight times lower latency in an online streaming scenario, while achieving the same or better word error rate (WER), that is, an equal or lower WER. This suggests it can significantly improve efficiency without sacrificing accuracy compared with the conventional CPU-based beam search decoding method.
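
For readers unfamiliar with beam search over CTC outputs, the following is a minimal CPU-only sketch in plain Python. It is not NVIDIA's decoder: it uses simple CTC prefix beam search with no WFST, no language model, and no word boosting, and every name in it (`ctc_prefix_beam_search`, `beam_width`, `blank`) is hypothetical. It only illustrates why a beam search can do better than picking the most likely symbol at each frame.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_prefix_beam_search(log_probs, beam_width=4, blank=0):
    """Illustrative CTC prefix beam search (assumed helper, not a library API).

    log_probs: list of frames, each a list of per-symbol log-probabilities.
    Returns (best_prefix, log_probability) where best_prefix is a tuple of
    symbol ids with blanks and repeats already collapsed.
    """
    # Each prefix tracks two scores: paths ending in blank / in non-blank.
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for sym, lp in enumerate(frame):
                if sym == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (sym,)
                b, nb = next_beams[new_prefix]
                if prefix and prefix[-1] == sym:
                    # A repeated symbol extends the prefix only after a blank...
                    next_beams[new_prefix] = (b, logsumexp(nb, p_b + lp))
                    # ...otherwise it collapses into the same prefix.
                    sb, snb = next_beams[prefix]
                    next_beams[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    next_beams[new_prefix] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Prune to the highest-scoring prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix, scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return best_prefix, logsumexp(*scores)

# Toy input: two frames over the vocabulary {0: blank, 1: "a"}.
toy = [[math.log(0.6), math.log(0.4)],
       [math.log(0.6), math.log(0.4)]]
prefix, score = ctc_prefix_beam_search(toy)
print(prefix, math.exp(score))
```

In this toy input the single most likely frame-by-frame path is blank, blank (probability 0.36), which decodes to the empty string, whereas the prefix containing one "a" collects the paths a-blank, blank-a, and a-a (0.24 + 0.24 + 0.16 = 0.64) and wins under beam search. A production decoder such as the one summarized above would additionally score hypotheses against a WFST graph and run the search on the GPU.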