Deeper dual-path LSTM masking networks for reverberant unmixing

Determine whether increasing the depth of the dual-path LSTM (DP-LSTM) masking network in SURT (Streaming Unmixing and Recognition Transducer) improves unmixing of reverberant features and reduces word error rates (WERs) in replayed far-field conditions.

Background

A SURT model with a larger masking network (6 DP-LSTM layers) yielded larger WER gains on the replayed LibriCSS condition than on the anechoic condition. The author conjectures that the deeper network is better suited to unmixing reverberant features.
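
The comparison of gains across conditions can be made concrete with a small helper. This is a minimal sketch and not taken from the source: it assumes "gain" means relative WER reduction of the deeper model over the shallower one (the source does not state which measure it uses), and the function names `relative_wer_reduction` and `depth_helps_more_in_reverb` are hypothetical. The WER values in the usage lines are placeholders, not reported results.

```python
def relative_wer_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Fractional WER reduction of the candidate relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - candidate_wer) / baseline_wer


def depth_helps_more_in_reverb(anechoic, replayed):
    """Compare the gain from a deeper masking network across two conditions.

    Each argument is a (shallow_wer, deep_wer) pair for one condition.
    Returns (replayed_gain_is_larger, anechoic_gain, replayed_gain).
    """
    gain_anechoic = relative_wer_reduction(*anechoic)
    gain_replayed = relative_wer_reduction(*replayed)
    return gain_replayed > gain_anechoic, gain_anechoic, gain_replayed


# Placeholder WERs (percent) for illustration only; NOT from the source.
larger_in_reverb, gain_anechoic, gain_replayed = depth_helps_more_in_reverb(
    anechoic=(10.0, 9.5),
    replayed=(20.0, 17.0),
)
print(larger_in_reverb, round(gain_anechoic, 3), round(gain_replayed, 3))
```

A larger relative gain on the replayed condition would be consistent with the conjecture but would not by itself confirm it, since other differences between the two conditions could also contribute.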

Testing this conjecture would indicate how much model capacity to allocate to the masking module in far-field reverberant scenarios.

References

We conjecture that the larger masking network (6 DP-LSTM layers) may be better suited for unmixing reverberant features, leading to improved WERs.

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives (2402.08932 - Raj, 14 Feb 2024) in Chapter 6 (Streaming Unmixing and Recognition Transducers), Section “Results on LibriCSS”