Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts (2211.06493v2)

Published 11 Nov 2022 in eess.AS, cs.SD, and eess.SP

Abstract: Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs. First, while a larger model improves the SS performance, it also requires a higher computational cost. Second, an SS model that is more optimized for handling overlapped speech is likely to introduce more processing artifacts in non-overlapped-speech regions. In this paper, we address these trade-offs with a sparsely-gated mixture-of-experts (MoE) architecture. Comprehensive evaluation results obtained using both simulated and real meeting recordings show that our proposed sparsely-gated MoE SS model achieves superior separation capabilities with less speech distortion, while involving only a marginal run-time cost increase.
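
The core technique named in the abstract is a sparsely-gated mixture-of-experts layer, in which a lightweight router sends each frame to only a few experts so that capacity grows without a proportional run-time cost. The sketch below illustrates this general idea in PyTorch; all class names, hyperparameters (model width, number of experts, top-k), and the frame-wise routing scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a sparsely-gated mixture-of-experts (MoE) feed-forward layer.
# Module names and hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Routes each frame to its top-k experts; only the selected experts are evaluated."""

    def __init__(self, d_model=256, d_hidden=1024, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router producing per-expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, time, d_model)
        logits = self.gate(x)                                   # (B, T, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)      # keep top-k experts per frame
        weights = F.softmax(weights, dim=-1)                    # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e                  # frames routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: a batch of 8 utterances, 100 frames each, 256-dim features.
moe = SparseMoELayer()
y = moe(torch.randn(8, 100, 256))
print(y.shape)  # torch.Size([8, 100, 256])
```

Because each frame activates only `top_k` of the `num_experts` sub-networks, the added computation over a single dense feed-forward block is modest, which is the trade-off the abstract highlights.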

Authors (6)
  1. Xiaofei Wang (138 papers)
  2. Zhuo Chen (319 papers)
  3. Yu Shi (153 papers)
  4. Jian Wu (314 papers)
  5. Naoyuki Kanda (61 papers)
  6. Takuya Yoshioka (77 papers)
Citations (1)
