Invertible DNN-based nonlinear time-frequency transform for speech enhancement (1911.10764v2)

Published 25 Nov 2019 in eess.AS and cs.SD

Abstract: We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). Recent progress in speech enhancement has been driven by DNNs. Ordinary DNN-based speech enhancement applies a T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask with a DNN. On the other hand, some methods use end-to-end networks that estimate the enhanced signal directly, without a T-F transform. While end-to-end methods have shown promising results, they are black boxes and hard to understand. Therefore, some end-to-end methods use a DNN to learn a linear T-F transform, which is much easier to understand. However, the learned transform may lack properties that are important in ordinary signal processing. In this paper, perfect reconstruction is considered as such an important property of the T-F transform. An invertible nonlinear T-F transform is constructed from DNNs and learned from data so that the obtained transform is a perfect-reconstruction filterbank.
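The perfect-reconstruction idea can be illustrated with a minimal sketch. The abstract does not say which invertible architecture the authors use, so this example assumes NICE-style additive coupling layers, one well-known way to build an invertible network, applied to non-overlapping frames. The frame length, layer count, random weights, and the names `AdditiveCouplingTransform`, `analysis`, and `synthesis` are all hypothetical. Because each coupling step only adds a nonlinear function of the other half of the frame, it can be undone exactly by subtracting the same function, whatever the network weights are.

```python
import numpy as np


def make_mlp(dim, hidden, rng):
    """Hypothetical stand-in for a trained DNN: a random two-layer tanh MLP R^dim -> R^dim."""
    w1 = rng.standard_normal((dim, hidden)) / np.sqrt(dim)
    b1 = 0.1 * rng.standard_normal(hidden)
    w2 = rng.standard_normal((hidden, dim)) / np.sqrt(hidden)

    def mlp(x):
        return np.tanh(x @ w1 + b1) @ w2

    return mlp


class AdditiveCouplingTransform:
    """Invertible nonlinear frame transform built from additive coupling layers (assumed design)."""

    def __init__(self, frame_len=64, hidden=32, n_layers=4, seed=0):
        assert frame_len % 2 == 0, "frame is split into two equal halves"
        rng = np.random.default_rng(seed)
        self.frame_len = frame_len
        half = frame_len // 2
        # Each layer holds two arbitrary (non-invertible) networks f and g.
        self.layers = [
            (make_mlp(half, hidden, rng), make_mlp(half, hidden, rng))
            for _ in range(n_layers)
        ]

    def analysis(self, x):
        """Signal -> coefficients of shape (n_frames, frame_len)."""
        assert x.size % self.frame_len == 0, "sketch uses non-overlapping frames only"
        frames = x.reshape(-1, self.frame_len)
        half = self.frame_len // 2
        a, b = frames[:, :half], frames[:, half:]
        for f, g in self.layers:
            a = a + f(b)  # update first half from second half
            b = b + g(a)  # update second half from the new first half
        return np.concatenate([a, b], axis=1)

    def synthesis(self, coeffs):
        """Coefficients -> signal, undoing each coupling step in reverse order."""
        half = self.frame_len // 2
        a, b = coeffs[:, :half], coeffs[:, half:]
        for f, g in reversed(self.layers):
            b = b - g(a)  # a is unchanged here, so g(a) is recomputed exactly
            a = a - f(b)  # b is now the original second half
        return np.concatenate([a, b], axis=1).reshape(-1)


if __name__ == "__main__":
    transform = AdditiveCouplingTransform(frame_len=64)
    x = np.random.default_rng(1).standard_normal(16000)  # 250 frames of 64 samples
    coeffs = transform.analysis(x)
    mask = np.ones_like(coeffs)  # hypothetical mask; an enhancement DNN would estimate it
    y = transform.synthesis(mask * coeffs)
    print("max reconstruction error:", np.max(np.abs(y - x)))
```

In an enhancement system, a mask would be applied to the output of `analysis` before `synthesis`; with an all-ones mask the input is recovered up to floating-point error. A practical learned transform would train the network weights and might also use overlapping frames, which this sketch omits for simplicity.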

Authors (5)
  1. Daiki Takeuchi (30 papers)
  2. Kohei Yatabe (39 papers)
  3. Yuma Koizumi (39 papers)
  4. Yasuhiro Oikawa (14 papers)
  5. Noboru Harada (48 papers)
Citations (10)
