Multi-channel Multi-frame ADL-MVDR for Target Speech Separation (2012.13442v2)

Published 24 Dec 2020 in eess.AS and cs.SD

Abstract: Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions; however, conventional neural mask-based MVDR systems still result in relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically unstable during joint training with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The proposed MCMF ADL-MVDR system addresses both linear and nonlinear distortions, and it fully utilizes spatio-temporal cross-correlations. The proposed systems are evaluated using a Mandarin audio-visual corpus and are compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of our proposed systems under different scenarios and across several objective evaluation metrics, including ASR performance.

Authors (7)
  1. Zhuohuang Zhang (7 papers)
  2. Yong Xu (432 papers)
  3. Meng Yu (65 papers)
  4. Shi-Xiong Zhang (48 papers)
  5. Lianwu Chen (14 papers)
  6. Donald S. Williamson (12 papers)
  7. Dong Yu (329 papers)
Citations (26)
