Deeper dual-path LSTM masking networks for reverberant unmixing
Determine whether increasing the depth of the dual-path LSTM (DP-LSTM) masking network in SURT (Streaming Unmixing and Recognition Transducer), for example to 6 DP-LSTM layers, improves unmixing of reverberant features and reduces word error rates (WERs) in replayed far-field conditions.
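Since the question is ultimately decided by comparing WERs between a shallower and a deeper masking network, a minimal sketch of the metric may be useful. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words); the function name `word_error_rate` and the toy transcripts are hypothetical and are not taken from the source, which may use a different scoring setup for multi-talker recordings.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical toy transcripts, for illustration only (not experimental data).
reference = "the meeting starts at noon"
shallow_output = "the meeting start at"
deep_output = "the meeting starts at noon"

shallow_wer = word_error_rate(reference, shallow_output)
deep_wer = word_error_rate(reference, deep_output)
```

In an actual study, the comparison would be made over a full evaluation set with both models decoded under identical replayed far-field conditions, so that any WER difference can be attributed to the depth of the masking network.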
References
We conjecture that the larger masking network (6 DP-LSTM layers) may be better suited for unmixing reverberant features, leading to improved WERs.
— Listening to Multi-talker Conversations: Modular and End-to-end Perspectives
(2402.08932 - Raj, 14 Feb 2024) in Chapter 6 (Streaming Unmixing and Recognition Transducers), Section “Results on LibriCSS”