Enrollment-based prefixing to reduce error propagation across utterance groups
Determine whether prefixing with enrollment utterances, rather than with buffer frames from previously recognized chunks, reduces error propagation across utterance groups during SURT’s speaker label reconciliation and improves session-level cpWER.
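To make the target metric concrete, the following is a minimal sketch of session-level cpWER (concatenated minimum-permutation WER), assuming its usual definition: each speaker's utterances are concatenated, and the hypothesis-to-reference speaker mapping with the fewest word errors is scored. The function names, speaker labels, and toy transcripts are illustrative assumptions and do not come from the SURT codebase.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def cp_wer(ref_by_speaker, hyp_by_speaker):
    """cpWER for one session; inputs map a speaker label to a list of utterances."""
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]
    # Pad the shorter side with empty transcripts so speaker counts match.
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))
    hyps += [[]] * (n - len(hyps))
    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")
    # Brute-force search over speaker mappings (fine for a handful of speakers).
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


# Toy session with two utterance groups (hypothetical data).
reference = {"A": ["hello there", "how are you"],
             "B": ["fine thanks", "very well"]}
# Every word is correct and the labels are consistent across both groups.
consistent = {"spk0": ["hello there", "how are you"],
              "spk1": ["fine thanks", "very well"]}
# Every word is correct, but the labels are swapped in the second group.
swapped = {"spk0": ["hello there", "very well"],
           "spk1": ["fine thanks", "how are you"]}

cpwer_consistent = cp_wer(reference, consistent)
cpwer_swapped = cp_wer(reference, swapped)
```

In this toy case the consistent labeling scores 0, while the hypothesis whose labels flip in the second group scores 4/9 even though every word is recognized correctly. This is the kind of session-level penalty that a speaker attribution error carried across utterance groups would incur.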
References
We conjecture that when enrollment utterances are not used, speaker attribution errors in earlier chunks can adversely impact performance on current chunk, since the buffer frames are used to guide the relative order.
— Listening to Multi-talker Conversations: Modular and End-to-end Perspectives
(2402.08932 - Raj, 14 Feb 2024) in Chapter 7 (Speaker Attribution in the SURT Framework), Section “Utterance-group evaluation on AMI”