Online End-to-End Neural Diarization with Speaker-Tracing Buffer (2006.02616v2)

Published 4 Jun 2020 in eess.AS and cs.SD

Abstract: This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND). Online diarization inherently suffers from a speaker permutation problem, since speaker labels can be assigned inconsistently across chunks of the recording. To circumvent this inconsistency, we propose a speaker-tracing buffer mechanism that selects several input frames carrying the speaker permutation information from previous chunks and stores them in a buffer. These buffered frames are stacked with the input frames of the current chunk and fed into a self-attention network. Our method ensures consistent diarization outputs across the buffer and the current chunk by checking the correlation between their corresponding outputs. Additionally, we trained SA-EEND with variable chunk sizes to mitigate the mismatch between training and inference introduced by the speaker-tracing buffer mechanism. With online SA-EEND and variable chunk-size training, the method achieved DERs of 12.54% on CALLHOME and 20.77% on CSJ with 1.4 s of actual latency.
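
The core of the method is the interplay between the buffer and the permutation check. Below is a minimal sketch of that loop, assuming a hypothetical `saeend` callable that maps a sequence of acoustic frames to per-frame, per-speaker activity probabilities; the frame-selection heuristic (simple recency here) and the dot-product similarity are stand-ins for the paper's actual selection and correlation criteria.

```python
import itertools
import numpy as np

def stb_online_diarization(chunks, saeend, n_speakers=2, buffer_size=100):
    """Online diarization with a speaker-tracing buffer (illustrative sketch).

    chunks:  iterable of (T_i, F) feature arrays, one per incoming chunk
    saeend:  hypothetical model call, (T, F) -> (T, n_speakers) activity probs
    """
    buf_feats = None  # buffered input frames from previous chunks
    buf_probs = None  # the activities previously decided for those frames
    outputs = []

    for chunk in chunks:
        # Stack buffered frames in front of the current chunk.
        x = chunk if buf_feats is None else np.concatenate([buf_feats, chunk])
        y = saeend(x)  # shape: (len(x), n_speakers)

        if buf_feats is not None:
            # Resolve the permutation ambiguity: choose the speaker ordering
            # whose outputs on the buffered frames agree best with the
            # activities already stored in the buffer.
            n_buf = len(buf_feats)
            best_perm = max(
                itertools.permutations(range(n_speakers)),
                key=lambda p: float(np.sum(buf_probs * y[:n_buf, p])),
            )
            y = y[:, best_perm]
            outputs.append(y[n_buf:])  # emit only the new chunk's frames
        else:
            outputs.append(y)

        # Refresh the buffer. The paper selects frames that carry speaker
        # permutation information; a simple recency heuristic stands in here.
        start = max(0, len(x) - buffer_size)
        buf_feats, buf_probs = x[start:], y[start:]

    return np.concatenate(outputs)
```

For the two-speaker case there are only two possible label orderings, so the brute-force search over permutations is cheap; for larger speaker counts a linear-assignment solver (e.g., the Hungarian algorithm) would replace the exhaustive loop.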

Authors (5)
  1. Yawen Xue (10 papers)
  2. Shota Horiguchi (45 papers)
  3. Yusuke Fujita (37 papers)
  4. Shinji Watanabe (416 papers)
  5. Kenji Nagamatsu (19 papers)
Citations (44)
