kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels (2312.13560v2)
Abstract: The success of retrieval-augmented language models across various NLP tasks has been hard to replicate in automatic speech recognition (ASR) because of the difficulty of constructing fine-grained audio-text datastores. This paper presents kNN-CTC, a novel approach that overcomes this challenge by leveraging Connectionist Temporal Classification (CTC) pseudo labels to establish frame-level audio-text key-value pairs, circumventing the need for precise ground-truth alignments. We further introduce a skip-blank strategy, which ignores CTC blank frames, to reduce datastore size. By incorporating a k-nearest-neighbors retrieval mechanism into pre-trained CTC ASR systems and leveraging this fine-grained, pruned datastore, kNN-CTC achieves consistent and substantial performance improvements across various experimental settings. Our code is available at https://github.com/NKU-HLT/KNN-CTC.
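The abstract describes three moving parts: a frame-level datastore mapping encoder features to CTC pseudo labels, a skip-blank pruning step, and interpolation of kNN retrieval probabilities with the CTC posterior. A minimal sketch of that pipeline, assuming illustrative names and hyperparameters (`BLANK_ID`, `k`, `lam`, `temperature` are not from the paper), might look like:

```python
import numpy as np

# Hedged sketch of the kNN-CTC idea (not the authors' implementation):
# build a frame-level datastore of (encoder feature -> CTC pseudo label)
# pairs while skipping blank frames, then interpolate a kNN-derived label
# distribution with the CTC posterior at inference time.

BLANK_ID = 0  # CTC blank index (assumed convention)

def build_datastore(features, pseudo_labels):
    """Keep only non-blank frames (the skip-blank strategy)."""
    keep = pseudo_labels != BLANK_ID
    return features[keep], pseudo_labels[keep]

def knn_probs(query, keys, values, num_classes, k=4, temperature=1.0):
    """Distance-weighted label distribution over the k nearest stored frames."""
    d2 = np.sum((keys - query) ** 2, axis=1)   # squared L2 distance to each key
    nn = np.argsort(d2)[:k]                    # indices of the k nearest keys
    w = np.exp(-d2[nn] / temperature)          # closer neighbors weigh more
    probs = np.zeros(num_classes)
    for idx, weight in zip(nn, w):
        probs[values[idx]] += weight           # accumulate weight per label
    return probs / probs.sum()

def interpolate(ctc_posterior, knn_posterior, lam=0.5):
    """Final frame-level distribution: lam * kNN + (1 - lam) * CTC."""
    return lam * knn_posterior + (1.0 - lam) * ctc_posterior

# Toy usage: 2-D features, 3 classes (class 0 is the blank).
feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 1, 2, 2])                # frame-level pseudo labels
keys, values = build_datastore(feats, labels)  # the blank frame is dropped
p_knn = knn_probs(np.array([5.05, 5.0]), keys, values, num_classes=3, k=2)
p_ctc = np.array([0.2, 0.3, 0.5])
p = interpolate(p_ctc, p_knn, lam=0.5)
```

In practice the paper uses a large-scale similarity-search index (FAISS is cited in its references) rather than the brute-force distance computation shown here; the brute-force version is used only to keep the sketch self-contained.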
Authors: Jiaming Zhou, Shiwan Zhao, Yaqi Liu, Wenjia Zeng, Yong Chen, Yong Qin