
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints (2403.14268v1)

Published 21 Mar 2024 in eess.AS and cs.SD

Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It handles a flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that guides the Transformer encoders at the lower layers of the EEND-EDA model, enhancing the effect of the self-attention modules using speaker activity information. Results on the public Mini LibriSpeech dataset demonstrate the effectiveness of the approach, reducing the Diarization Error Rate from 30.95% to 28.17%. We will release the source code on GitHub to enable further research and reproducibility.
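The core idea, an auxiliary loss that steers lower-layer self-attention toward speaker-activity structure, can be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' released implementation: the function name, the affinity-matrix construction from frame-level speaker labels, and the cross-entropy formulation are all assumptions for the sake of the sketch.

```python
import torch

def attention_guidance_loss(attn_weights, speaker_activity, eps=1e-8):
    """Hypothetical attention-guidance auxiliary loss.

    attn_weights:     (B, heads, T, T) softmax self-attention maps taken
                      from a lower Transformer encoder layer.
    speaker_activity: (B, T, S) binary frame-level speaker activity labels.

    Encourages frames that share at least one active speaker to attend
    to one another.
    """
    act = speaker_activity.float()                      # (B, T, S)
    # Target affinity: 1 where two frames share an active speaker.
    affinity = (act @ act.transpose(1, 2)) > 0          # (B, T, T)
    target = affinity.float()
    # Normalize rows so the target is a valid attention distribution.
    target = target / target.sum(dim=-1, keepdim=True).clamp_min(eps)
    target = target.unsqueeze(1)                        # (B, 1, T, T), broadcast over heads
    # Cross-entropy between target and predicted attention distributions.
    loss = -(target * (attn_weights + eps).log()).sum(dim=-1)
    return loss.mean()
```

In training, such a term would typically be added to the main diarization objective with a small weight, e.g. `total = diar_loss + 0.1 * attention_guidance_loss(attn, labels)`; the weight value here is illustrative.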

Authors (3)
  1. PeiYing Lee (1 paper)
  2. HauYun Guo (1 paper)
  3. Berlin Chen (53 papers)
