
Low-Latency Speech Separation Guided Diarization for Telephone Conversations (2204.02306v2)

Published 5 Apr 2022 in eess.AS

Abstract: In this paper, we carry out an analysis on the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection on each estimated speaker signal. In particular, we compare two low-latency speech separation models. Moreover, we show a post-processing algorithm that significantly reduces the false alarm errors of an SSGD pipeline. We perform our experiments on two datasets: Fisher Corpus Part 1 and CALLHOME, evaluating both separation and diarization metrics. Notably, our SSGD DPRNN-based online model achieves 11.1% DER on CALLHOME, comparable with most state-of-the-art end-to-end neural diarization models despite being trained on an order of magnitude less data and having considerably lower latency, i.e., 0.1 vs. 10 seconds. We also show that the separated signals can be readily fed to a speech recognition back-end with performance close to the oracle source signals.
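
The SSGD idea described in the abstract (separate the speakers, then run voice activity detection on each estimated signal) can be illustrated with a minimal sketch. This is not the authors' implementation: the separation model is assumed to have already run, and a simple energy-threshold VAD stands in for whatever VAD the paper uses. The function names (`energy_vad`, `frames_to_segments`, `ssgd_diarize`), the 8 kHz sample rate, the 20 ms frame length, and the -40 dB threshold are all illustrative assumptions, and the false-alarm post-processing step is not included.

```python
import numpy as np


def energy_vad(signal, frame_len=160, threshold_db=-40.0):
    """Stand-in VAD (assumption): mark a frame active if its mean energy
    in dB exceeds threshold_db. Trailing samples that do not fill a whole
    frame are ignored."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    energy_db = 10.0 * np.log10(energy + 1e-12)  # epsilon avoids log(0)
    return energy_db > threshold_db


def frames_to_segments(active, frame_len=160, sample_rate=8000):
    """Convert a boolean frame mask into (start_sec, end_sec) segments."""
    segments = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            start = None
    if start is not None:  # close a segment that runs to the end
        segments.append((start * frame_len / sample_rate,
                         len(active) * frame_len / sample_rate))
    return segments


def ssgd_diarize(separated, frame_len=160, sample_rate=8000,
                 threshold_db=-40.0):
    """separated: array of shape (n_speakers, n_samples), assumed to be
    the output of a speech separation model (not implemented here).
    Returns {speaker_index: [(start_sec, end_sec), ...]}."""
    return {
        spk: frames_to_segments(
            energy_vad(sig, frame_len, threshold_db), frame_len, sample_rate
        )
        for spk, sig in enumerate(separated)
    }


# Toy input: one second at 8 kHz, speaker 0 talks first, speaker 1 second.
toy = np.zeros((2, 8000))
toy[0, :4000] = 0.5
toy[1, 4000:] = 0.5
result = ssgd_diarize(toy)
```

Because each separated stream is processed independently, overlapping speech needs no special handling in this scheme: two streams can simply be active in the same frames.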

Authors (7)
  1. Giovanni Morrone (10 papers)
  2. Samuele Cornell (41 papers)
  3. Desh Raj (32 papers)
  4. Luca Serafini (11 papers)
  5. Enrico Zovato (7 papers)
  6. Alessio Brutti (30 papers)
  7. Stefano Squartini (17 papers)
Citations (9)
