Invertible DNN-based nonlinear time-frequency transform for speech enhancement (1911.10764v2)

Published 25 Nov 2019 in eess.AS and cs.SD

Abstract: We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). Recent progress in speech enhancement has been driven by DNNs. Ordinary DNN-based speech enhancement employs a T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask using a DNN. On the other hand, some methods use end-to-end networks that directly estimate the enhanced signals without a T-F transform. While end-to-end methods have shown promising results, they are black boxes and hard to understand. Therefore, some end-to-end methods use a DNN to learn a linear T-F transform, which is much easier to understand. However, the learned transform may lack properties that are important for ordinary signal processing. In this paper, perfect reconstruction is considered as the important property of the T-F transform. An invertible nonlinear T-F transform is constructed from DNNs and learned from data so that the obtained transform is a perfect-reconstruction filterbank.
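
To illustrate what "invertible" buys in the abstract above, here is a minimal sketch of an additive coupling layer, one common way to build an invertible nonlinear map out of an arbitrary sub-network. It is an assumption made for illustration and not necessarily the architecture used in the paper; the names `net`, `forward`, `inverse`, and the weight shapes are all hypothetical. The point is that the inverse is available in closed form, so the input is recovered up to floating-point error even though `net` itself is not invertible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sub-network weights (illustrative only, not from the paper).
W1 = rng.standard_normal((4, 8))
W2 = rng.standard_normal((8, 4))

def net(h):
    """Arbitrary nonlinear sub-network; it does not need to be invertible."""
    return np.tanh(h @ W1) @ W2

def forward(x):
    """Additive coupling: keep the first half, shift the second half by net(first half)."""
    a, b = np.split(x, 2, axis=-1)
    return np.concatenate([a, b + net(a)], axis=-1)

def inverse(y):
    """Closed-form inverse: subtract the same shift computed from the untouched half."""
    a, c = np.split(y, 2, axis=-1)
    return np.concatenate([a, c - net(a)], axis=-1)

# Three toy "frames" of eight samples each stand in for a signal segment.
x = rng.standard_normal((3, 8))
x_rec = inverse(forward(x))
max_error = np.max(np.abs(x - x_rec))
```

In this sketch `max_error` is at the level of floating-point rounding, which is the sense in which such a transform is perfectly reconstructing: any processing applied in the transformed domain can be mapped back to a signal without loss introduced by the transform itself.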

Authors (5)

Daiki Takeuchi (30 papers)
Kohei Yatabe (39 papers)
Yuma Koizumi (39 papers)
Yasuhiro Oikawa (14 papers)
Noboru Harada (48 papers)

Citations (10)
