Auxiliary CTC loss for improved alignment under heavy overlap
Determine whether adding an auxiliary CTC (connectionist temporal classification) objective on the SURT (Streaming Unmixing and Recognition Transducer) encoder improves the alignment between segments and their corresponding audio during training, and whether that better alignment lowers word error rates on highly overlapped speech.
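To make the idea of an auxiliary objective concrete, here is a minimal sketch in plain Python. It assumes the auxiliary term is added to the main transducer loss as a weighted sum, which is common practice but is not specified in the source. The names `ctc_neg_log_likelihood`, `combined_loss`, and `ctc_weight` are hypothetical, and the CTC term is the textbook forward algorithm rather than the SURT implementation.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC negative log-likelihood via the forward algorithm.

    log_probs: list of T frames, each a list of V log-probabilities
               (illustrative stand-in for encoder outputs).
    target:    list of label ids, none equal to `blank`.
    """
    # Extended label sequence: blank, y1, blank, y2, ..., blank
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])     # advance by one position
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end in the final blank or the final label.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def combined_loss(transducer_loss, ctc_loss, ctc_weight=0.2):
    """Assumed multi-task form: main loss plus a weighted auxiliary CTC term.

    The default weight is an arbitrary illustrative value, not taken from the source.
    """
    return transducer_loss + ctc_weight * ctc_loss
```

In this sketch the CTC term sums over all monotonic frame-to-label alignments, which is the property the conjecture relies on: it gives the encoder a direct signal tying each segment's labels to frames of the audio.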
References
We conjecture that an auxiliary CTC objective may be useful in aligning the segments to the corresponding audio during training, resulting in better modeling for high overlap sessions.
— Listening to Multi-talker Conversations: Modular and End-to-end Perspectives
(2402.08932 - Raj, 14 Feb 2024) in Chapter 6 (Streaming Unmixing and Recognition Transducers), Section “Effect of auxiliary objectives”