Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
134 tokens/sec
GPT-4o
9 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch (1907.05955v3)

Published 12 Jul 2019 in cs.CL and eess.AS

Abstract: We introduce PyKaldi2 speech recognition toolkit implemented based on Kaldi and PyTorch. While similar toolkits are available built on top of the two, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE. In particular, we implemented the sequence training module with on-the-fly lattice generation during model training in order to simplify the training pipeline. To address the challenging acoustic environments in real applications, PyKaldi2 also supports on-the-fly noise and reverberation simulation to improve the model robustness. With this feature, it is possible to backpropogate the gradients from the sequence-level loss to the front-end feature extraction module, which, hopefully, can foster more research in the direction of joint front-end and backend learning. We performed benchmark experiments on Librispeech, and show that PyKaldi2 can achieve reasonable recognition accuracy. The toolkit is released under the MIT license.

Citations (14)

Summary

We haven't generated a summary for this paper yet.