
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints (2403.14268v1)

Published 21 Mar 2024 in eess.AS and cs.SD

Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It handles a flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that guides the Transformer encoders at the lower layers of the EEND-EDA model, enhancing the effect of the self-attention modules using speaker activity information. Results on the public Mini LibriSpeech dataset demonstrate the effectiveness of the approach, reducing the Diarization Error Rate from 30.95% to 28.17%. We will release the source code on GitHub to enable further research and reproducibility.
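The core idea, an auxiliary loss that steers lower-layer self-attention toward speaker-activity structure, can be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' released implementation: the function name, the affinity-matrix construction from frame-level speaker labels, and the cross-entropy formulation are all assumptions for the sake of the sketch.

```python
import torch

def attention_guidance_loss(attn_weights, speaker_activity, eps=1e-8):
    """Hypothetical attention-guidance auxiliary loss.

    attn_weights:     (B, heads, T, T) softmax self-attention maps taken
                      from a lower Transformer encoder layer.
    speaker_activity: (B, T, S) binary frame-level speaker activity labels.

    Encourages frames that share at least one active speaker to attend
    to one another.
    """
    act = speaker_activity.float()                      # (B, T, S)
    # Target affinity: 1 where two frames share an active speaker.
    affinity = (act @ act.transpose(1, 2)) > 0          # (B, T, T)
    target = affinity.float()
    # Normalize rows so the target is a valid attention distribution.
    target = target / target.sum(dim=-1, keepdim=True).clamp_min(eps)
    target = target.unsqueeze(1)                        # (B, 1, T, T), broadcast over heads
    # Cross-entropy between target and predicted attention distributions.
    loss = -(target * (attn_weights + eps).log()).sum(dim=-1)
    return loss.mean()
```

In training, such a term would typically be added to the main diarization objective with a small weight, e.g. `total = diar_loss + 0.1 * attention_guidance_loss(attn, labels)`; the weight value here is illustrative.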

Authors (3)
  1. PeiYing Lee (1 paper)
  2. HauYun Guo (1 paper)
  3. Berlin Chen (53 papers)
