Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions (2110.14139v1)

Published 27 Oct 2021 in eess.AS and cs.SD

Abstract: Deep learning based time-domain models, e.g. Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement. However, many experiments on time-domain speech enhancement models are conducted in simulated conditions, and it is not well studied whether the good performance generalizes to real-world scenarios. In this paper, we investigate applying multi-channel Conv-TasNet based speech enhancement to both simulated and real data. Our preliminary experiments show a large gap in ASR performance between the two conditions. Several approaches are applied to close this gap, including integrating multi-channel Conv-TasNet into the beamforming model with various strategies and jointly training the speech enhancement and speech recognition models. Our experiments on the CHiME-4 corpus show that the proposed approaches can greatly reduce the speech recognition performance discrepancy between simulated and real data, while preserving the strong speech enhancement capability of the frontend.

Authors (5)
  1. Wangyou Zhang (35 papers)
  2. Jing Shi (123 papers)
  3. Chenda Li (21 papers)
  4. Shinji Watanabe (416 papers)
  5. Yanmin Qian (97 papers)
Citations (21)
