EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers (2203.17068v2)

Published 31 Mar 2022 in eess.AS and cs.SD

Abstract: In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting. Our proposed framework integrates speaker diarization based on end-to-end neural diarization (EEND) models, speaker counting with encoder-decoder based attractors (EDA), and speech separation using Conv-TasNet. In addition, we propose a multiple 1x1 convolutional layer architecture for estimating the separation masks corresponding to a flexible number of speakers and a fusion technique for refining the separated speech signal with obtained speaker diarization information to improve the joint framework. Experiments using the LibriMix dataset show that our proposed method outperforms the single-task baselines in both diarization and separation metrics for fixed and flexible numbers of speakers and improves speaker counting performance for flexible numbers of speakers. All materials will be open-sourced and reproducible in the ESPnet toolkit.
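
The "multiple 1x1 convolutional layer" idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' ESPnet implementation. It assumes one mask head per candidate speaker count, with the head chosen by the speaker count estimated elsewhere (for example by EDA). The function names `make_mask_heads` and `estimate_masks`, the random weights, and the sigmoid mask nonlinearity are all hypothetical choices made for illustration.

```python
import numpy as np

def make_mask_heads(bottleneck_dim, feat_dim, max_speakers, seed=0):
    """Build one 1x1-conv weight matrix per candidate speaker count (assumed layout)."""
    rng = np.random.default_rng(seed)
    heads = {}
    for c in range(1, max_speakers + 1):
        # A 1x1 convolution over time is a per-frame linear map, so its
        # weights can be stored as a (c * feat_dim, bottleneck_dim) matrix.
        heads[c] = rng.standard_normal((c * feat_dim, bottleneck_dim)) * 0.1
    return heads

def estimate_masks(features, heads, num_speakers):
    """Apply the head matching num_speakers.

    features: array of shape (bottleneck_dim, T)
    returns:  array of shape (num_speakers, feat_dim, T) with values in (0, 1)
    """
    w = heads[num_speakers]              # select the head for this speaker count
    out = w @ features                   # (num_speakers * feat_dim, T)
    feat_dim = w.shape[0] // num_speakers
    masks = 1.0 / (1.0 + np.exp(-out))   # sigmoid nonlinearity (an assumption)
    return masks.reshape(num_speakers, feat_dim, -1)

if __name__ == "__main__":
    heads = make_mask_heads(bottleneck_dim=8, feat_dim=4, max_speakers=3)
    feats = np.ones((8, 10))             # dummy separator output, 10 frames
    for c in (1, 2, 3):
        print(c, estimate_masks(feats, heads, c).shape)
```

Keeping a separate head per speaker count lets the number of output masks vary between recordings while the shared separator features stay the same size.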

Authors (7)
  1. Soumi Maiti (26 papers)
  2. Yushi Ueda (7 papers)
  3. Shinji Watanabe (416 papers)
  4. Chunlei Zhang (40 papers)
  5. Meng Yu (65 papers)
  6. Shi-Xiong Zhang (48 papers)
  7. Yong Xu (432 papers)
Citations (28)
