Distilling Knowledge Using Parallel Data for Far-field Speech Recognition (1802.06941v1)

Published 20 Feb 2018 in cs.CL, cs.SD, and eess.AS

Abstract: To improve far-field speech recognition, this paper proposes distilling knowledge from a close-talking model into a far-field model using parallel data. The close-talking model serves as the teacher and the far-field model as the student. The student model is trained to imitate the output distributions of the teacher model, a constraint realized by minimizing the Kullback-Leibler (KL) divergence between the output distributions of the student and teacher models. Experimental results on the AMI corpus show that the best student model achieves up to 4.7% absolute word error rate (WER) reduction compared with conventionally trained baseline models.
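
The KL-divergence objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of raw per-frame logits, and the direction KL(teacher || student) are assumptions made here for illustration, following the usual distillation convention. The teacher is assumed to score the close-talking frames and the student the time-aligned far-field frames.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for one frame (assumed direction)."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )

def distillation_loss(teacher_logits, student_logits):
    """Mean per-frame KL over parallel (time-aligned) frames.

    teacher_logits: teacher outputs on close-talking frames (hypothetical input)
    student_logits: student outputs on the matching far-field frames
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("parallel data must have the same number of frames")
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        total += kl_divergence(softmax(t_frame), softmax(s_frame))
    return total / len(teacher_logits)
```

The loss is zero when the student reproduces the teacher's distribution on every frame and grows as the two diverge, so minimizing it with respect to the student's parameters pushes the far-field model toward the close-talking model's outputs.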

Authors (4)

Jiangyan Yi
Jianhua Tao
Zhengqi Wen
Bin Liu

Citations (4)
