BER: Balanced Error Rate For Speaker Diarization (2211.04304v1)

Published 8 Nov 2022 in cs.SD, cs.CL, and eess.AS

Abstract: DER is the primary metric to evaluate diarization performance while facing a dilemma: the errors in short utterances or segments tend to be overwhelmed by longer ones. Short segments, e.g., yes' orno,' still have semantic information. Besides, DER overlooks errors in less-talked speakers. Although JER balances speaker errors, it still suffers from the same dilemma. Considering all those aspects, duration error, segment error, and speaker-weighted error constituting a complete diarization evaluation, we propose a Balanced Error Rate (BER) to evaluate speaker diarization. First, we propose a segment-level error rate (SER) via connected sub-graphs and adaptive IoU threshold to get accurate segment matching. Second, to evaluate diarization in a unified way, we adopt a speaker-specific harmonic mean between duration and segment, followed by a speaker-weighted average. Third, we analyze our metric via the modularized system, EEND, and the multi-modal method on real datasets. SER and BER are publicly available at https://github.com/X-LANCE/BER.

Citations (2)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

BER: Balanced Error Rate For Speaker Diarization (2211.04304v1)

Summary

Related Papers