Deeper dual-path LSTM masking networks for reverberant unmixing

Determine whether increasing the depth of the dual-path LSTM masking network in SURT improves unmixing of reverberant features and reduces word error rates in replayed far-field conditions.

Background

On LibriCSS, SURT with a larger masking network (6 DP-LSTM layers, per the passage quoted under References) yielded larger gains on the replayed condition than on the anechoic condition. The authors hypothesize that the deeper network is better at handling reverberation during unmixing.

Testing this conjecture would indicate how much capacity to allocate to the masking module in far-field, reverberant scenarios.
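One way to make the comparison concrete is to express the benefit of depth as a relative WER reduction per condition and then compare the two conditions. The sketch below is only an illustration under that assumption: the function names are hypothetical and not taken from the cited work, and no WER values from the source are embedded in it.

```python
from typing import Tuple


def relative_wer_reduction(wer_shallow: float, wer_deep: float) -> float:
    """Relative WER reduction, in percent, when moving from the shallower
    to the deeper masking network (hypothetical helper)."""
    if wer_shallow <= 0:
        raise ValueError("wer_shallow must be positive")
    return 100.0 * (wer_shallow - wer_deep) / wer_shallow


def depth_gain_gap(anechoic: Tuple[float, float],
                   replayed: Tuple[float, float]) -> float:
    """Relative gain on the replayed condition minus relative gain on the
    anechoic condition, in percentage points.

    Each argument is a (wer_shallow, wer_deep) pair for that condition.
    """
    return relative_wer_reduction(*replayed) - relative_wer_reduction(*anechoic)
```

A positive gap would be consistent with the conjecture but would not establish it on its own, since the deeper network also adds capacity that could help for reasons unrelated to reverberation.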

References

We conjecture that the larger masking network (6 DP-LSTM layers) may be better suited for unmixing reverberant features, leading to improved WERs.

— Listening to Multi-talker Conversations: Modular and End-to-end Perspectives (2402.08932 - Raj, 14 Feb 2024) in Chapter 6 (Streaming Unmixing and Recognition Transducers), Section “Results on LibriCSS”
