Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning (2003.03927v2)

Published 9 Mar 2020 in eess.AS and cs.SD

Abstract: Hand-crafted spatial features (e.g., the inter-channel phase difference, IPD) play a fundamental role in recent deep-learning-based multi-channel speech separation (MCSS) methods. However, these manually designed spatial features are hard to incorporate into an end-to-end optimized MCSS framework. In this work, we propose an integrated architecture for learning spatial features directly from multi-channel speech waveforms within an end-to-end speech separation framework. In this architecture, time-domain filters spanning the signal channels are trained to perform adaptive spatial filtering. These filters are implemented by a 2-D convolution (conv2d) layer, and their parameters are optimized with a speech separation objective function in a purely data-driven fashion. Furthermore, inspired by the IPD formulation, we design a conv2d kernel to compute inter-channel convolution differences (ICDs), which are expected to provide spatial cues that help distinguish directional sources. Evaluation results on the simulated multi-channel reverberant WSJ0 2-mix dataset demonstrate that the proposed ICD-based MCSS model improves the overall signal-to-distortion ratio (SDR) by 10.4% over the IPD-based MCSS model.
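The abstract contrasts a hand-crafted IPD feature with learned multi-channel conv2d filters and ICDs. The NumPy sketch below illustrates the three ideas under clearly stated assumptions. The STFT settings (512-sample Hann window, hop 256), the filter count and length, and the random (untrained) filter values are illustrative choices not given in the abstract. The names `frame`, `ipd`, `conv_encoder`, `icd`, and the per-filter `weight` are hypothetical. The ICD is assumed to be a plain (optionally weighted) difference of shared single-channel encoder outputs, which may not match the paper's exact conv2d kernel.

```python
import numpy as np


def frame(x, win_len, hop):
    """Slice a 1-D signal into overlapping frames -> (num_frames, win_len)."""
    n_frames = 1 + (len(x) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def ipd(x_m, x_n, win_len=512, hop=256):
    """Hand-crafted IPD feature: cos(angle(X_m) - angle(X_n)) per T-F bin."""
    win = np.hanning(win_len)
    X_m = np.fft.rfft(frame(x_m, win_len, hop) * win, axis=-1)
    X_n = np.fft.rfft(frame(x_n, win_len, hop) * win, axis=-1)
    return np.cos(np.angle(X_m) - np.angle(X_n))  # (frames, win_len // 2 + 1)


def conv_encoder(x, filters, hop):
    """Learned spatial filtering: filters (N, C, L) span all C channels.

    Equivalent to a conv2d kernel of size (C, L) with stride hop along time:
    out[n, f] = sum_c sum_l filters[n, c, l] * x[c, f * hop + l].
    x: (C, T) -> returns (N, num_frames).
    """
    C, L = filters.shape[1], filters.shape[2]
    frames = np.stack([frame(x[c], L, hop) for c in range(C)])  # (C, F, L)
    return np.einsum("ncl,cfl->nf", filters, frames)


def icd(x_m, x_n, filters, hop, weight=None):
    """Hypothetical ICD: shared single-channel filters (N, L), i.e. a (1, L)
    kernel, applied to each mic, then a per-filter weighted difference."""
    e_m = conv_encoder(x_m[None, :], filters[:, None, :], hop)
    e_n = conv_encoder(x_n[None, :], filters[:, None, :], hop)
    if weight is None:
        weight = np.ones(filters.shape[0])  # assumption: plain difference
    return e_m - weight[:, None] * e_n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(16000)
    x = np.stack([s, np.roll(s, 3)])  # toy 2-mic input: 3-sample delay
    W = rng.standard_normal((64, 2, 32))  # random, untrained filters
    print(ipd(x[0], x[1]).shape)  # (61, 257)
    print(conv_encoder(x, W, hop=16).shape)  # (64, 999)
    print(icd(x[0], x[1], W[:, 0, :], hop=16).shape)  # (64, 999)
```

A filter of shape (C, L) summed over channels behaves like a conv2d kernel whose height equals the number of microphones, so it can learn beamformer-like channel combinations; in an actual model these weights would be optimized against a separation objective rather than drawn at random.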

Authors (8)
  1. Rongzhi Gu (28 papers)
  2. Shi-Xiong Zhang (48 papers)
  3. Lianwu Chen (14 papers)
  4. Yong Xu (432 papers)
  5. Meng Yu (65 papers)
  6. Dan Su (101 papers)
  7. Yuexian Zou (119 papers)
  8. Dong Yu (329 papers)
Citations (57)