Enrollment-based prefixing to reduce error propagation across utterance groups

Determine whether using enrollment utterances for speaker prefixing, instead of previously recognized frames, reduces error propagation across utterance groups during SURT’s speaker label reconciliation and improves session-level cpWER (concatenated minimum-permutation word error rate).

Background

SURT performs speaker attribution with labels that are relative within each utterance group, and reconciles labels across groups via speaker prefixing. The authors hypothesize that prefixing with previously recognized frames can carry speaker-tagging errors from earlier groups into later ones, whereas prefixing with enrollment utterances mitigates this.

Establishing this effect would justify enrollment-based prefixing for robust session-level speaker attribution.
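
The hypothesized mechanism can be illustrated with a toy simulation. This is a minimal sketch under loud assumptions, not SURT's implementation: voices are plain strings, the "model" (`label_group`) simply matches each segment against the voices stored in the prefix, the buffer keeps the most recent segment attributed to each label, and a single tagging mistake is injected by hand through `forced_errors`. All names here (`label_group`, `run_session`, `forced_errors`) are hypothetical.

```python
def label_group(group, prefix):
    """Toy stand-in for the model: tag each segment with the lowest prefix
    label whose stored voice matches; unseen voices get a fresh label."""
    prefix = dict(prefix)  # work on a copy so the caller's prefix is untouched
    labels = []
    for voice in group:
        matches = sorted(lab for lab, v in prefix.items() if v == voice)
        if matches:
            labels.append(matches[0])
        else:
            new_label = max(prefix, default=-1) + 1
            prefix[new_label] = voice
            labels.append(new_label)
    return labels


def run_session(groups, reference, forced_errors, enrollment=None):
    """Count mislabeled segments over a session.

    With enrollment=None the prefix is a buffer of previously recognized
    segments (assumption: latest segment per predicted label). Otherwise
    the fixed enrollment mapping label -> voice is used for every group.
    """
    buffer = {}
    mistakes = 0
    for g, group in enumerate(groups):
        prefix = enrollment if enrollment is not None else buffer
        labels = label_group(group, prefix)
        for s, voice in enumerate(group):
            # Inject a hand-picked tagging error, if one is set for this segment.
            labels[s] = forced_errors.get((g, s), labels[s])
            mistakes += labels[s] != reference[voice]
            # The buffer stores whatever voice was *attributed* to the label.
            buffer[labels[s]] = voice
    return mistakes


groups = [["A", "B"]] * 4           # four utterance groups, two speakers each
reference = {"A": 0, "B": 1}        # intended absolute labels
forced_errors = {(1, 1): 0}         # speaker B mis-tagged as label 0 in group 1

print(run_session(groups, reference, forced_errors))                       # 5
print(run_session(groups, reference, forced_errors, {0: "A", 1: "B"}))     # 1
```

In this toy setting the single injected mistake corrupts the buffer, so both speakers are mislabeled in every later group (5 mistakes in total), while the enrollment prefix confines the damage to the one injected mistake. Whether the same effect holds for SURT on real sessions, and how much it moves cpWER, is exactly what remains to be established.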

References

We conjecture that when enrollment utterances are not used, speaker attribution errors in earlier chunks can adversely impact performance on current chunk, since the buffer frames are used to guide the relative order.

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives (2402.08932 - Raj, 14 Feb 2024) in Chapter 7 (Speaker Attribution in the SURT Framework), Section “Utterance-group evaluation on AMI”