Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts (2211.06493v2)
Abstract: Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs. First, while a larger model improves the SS performance, it also incurs a higher computational cost. Second, an SS model that is more heavily optimized for handling overlapped speech is likely to introduce more processing artifacts in non-overlapped-speech regions. In this paper, we address these trade-offs with a sparsely-gated mixture-of-experts (MoE) architecture. Comprehensive evaluation results obtained using both simulated and real meeting recordings show that our proposed sparsely-gated MoE SS model achieves superior separation capability with less speech distortion, while incurring only a marginal increase in run-time cost.
- Xiaofei Wang
- Zhuo Chen
- Yu Shi
- Jian Wu
- Naoyuki Kanda
- Takuya Yoshioka
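The core idea of the abstract, routing each input frame through only a few experts selected by a learned gate, can be illustrated with a minimal sketch of a sparsely-gated MoE layer. This is not the paper's actual model: the expert structure, feature dimension, number of experts, and top-k value below are illustrative assumptions, and a real implementation would dispatch only the routed frames to each expert rather than masking dense expert outputs.

```python
# Minimal sketch of a sparsely-gated mixture-of-experts (MoE) layer in PyTorch.
# All sizes (dim, num_experts, top_k, hidden) are illustrative assumptions,
# not the configuration used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Routes each time frame to its top-k experts via a learned gating network."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2, hidden: int = 256):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # gating network producing per-expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim)
        logits = self.gate(x)                                  # (B, T, num_experts)
        topk_val, topk_idx = logits.topk(self.top_k, dim=-1)   # keep the k largest gate scores
        weights = F.softmax(topk_val, dim=-1)                  # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[..., slot]                          # (B, T) chosen expert index per frame
            w = weights[..., slot].unsqueeze(-1)               # (B, T, 1) gate weight for that expert
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1)                # frames routed to expert e
                if mask.any():
                    # For clarity, every expert is applied densely and then masked;
                    # an efficient implementation would process only the routed frames.
                    out = out + mask * w * expert(x)
        return out


if __name__ == "__main__":
    layer = SparseMoELayer(dim=64)
    frames = torch.randn(2, 100, 64)   # (batch, frames, features)
    print(layer(frames).shape)         # torch.Size([2, 100, 64])
```

Because only `top_k` experts are active per frame, total model capacity can grow with the number of experts while the per-frame computation stays close to that of a single expert, which is the mechanism the abstract relies on to improve separation quality at only a marginal run-time cost increase.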