Auxiliary CTC loss for improved alignment under heavy overlap

Determine whether adding an auxiliary CTC objective on the SURT encoder improves alignment between segments and audio during training and enhances modeling accuracy in highly overlapped speech conditions, thereby reducing word error rates.

Background

In ablations, adding an auxiliary CTC loss to SURT’s encoder improved performance, particularly in high-overlap conditions (e.g., OV30/OV40 in LibriCSS). The authors hypothesize that the CTC objective helps align segments to the corresponding audio during training.

Testing this conjecture would clarify the mechanism by which CTC regularization benefits overlapped-speech modeling and guide loss design.
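
To make the idea of an auxiliary CTC objective concrete, the following is a minimal sketch in plain Python, not the SURT implementation. It computes the CTC negative log-likelihood with the standard forward algorithm, which sums over all frame-level alignments of a label sequence, and then adds it to a main loss with a weight. The function names `ctc_nll` and `combined_loss` and the parameter `ctc_weight` are hypothetical, chosen for illustration; the source does not state the weighting used.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)), staying in log space."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a > b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def ctc_nll(log_probs, target, blank=0):
    """CTC negative log-likelihood of `target` given per-frame log-probs.

    log_probs: list of T frames, each a list of log-probabilities per symbol.
    target:    list of non-blank symbol ids.
    """
    # Interleave blanks around the labels: [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for sym in target:
        ext += [sym, blank]
    n_states = len(ext)

    # alpha[s] is the log-probability of all partial alignments ending in state s.
    alpha = [NEG_INF] * n_states
    alpha[0] = log_probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * n_states
        for s in range(n_states):
            total = alpha[s]  # stay in the same state
            if s >= 1:
                total = log_add(total, alpha[s - 1])  # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total = log_add(total, alpha[s - 2])
            if total != NEG_INF:
                new[s] = total + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end in the last label or the trailing blank.
    final = alpha[-1]
    if n_states > 1:
        final = log_add(final, alpha[-2])
    return -final


def combined_loss(main_loss, ctc_loss, ctc_weight):
    """Main loss plus weighted auxiliary CTC term (ctc_weight is an assumption)."""
    return main_loss + ctc_weight * ctc_loss
```

As a check on the sketch: with two frames that are uniform over {blank, label 1} and target `[1]`, three alignments are valid (label then blank, blank then label, label repeated), each with probability 0.25, so `ctc_nll` returns -ln 0.75. In a real system the same term would be computed on encoder outputs by a library CTC routine and added to the transducer loss during training.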

References

"We conjecture that an auxiliary CTC objective may be useful in aligning the segments to the corresponding audio during training, resulting in better modeling for high overlap sessions."

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives (2402.08932 - Raj, 14 Feb 2024) in Chapter 6 (Streaming Unmixing and Recognition Transducers), Section “Effect of auxiliary objectives”