Papers
Topics
Authors
Recent
Search
2000 character limit reached

WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Peservation

Published 27 Jun 2025 in eess.AS and cs.SD | (2506.22001v1)

Abstract: Current multi-channel speech enhancement systems mainly adopt single-output architecture, which face significant challenges in preserving spatio-temporal signal integrity during multiple-input multiple-output (MIMO) processing. To address this limitation, we propose a novel neural network, termed WTFormer, for MIMO speech enhancement that leverages the multi-resolution characteristics of wavelet transform and multi-dimensional collaborative attention to effectively capture globally distributed spatial features, while using Conformer for time-frequency modeling. A multi task loss strategy accompanying MUSIC algorithm is further proposed for optimization training to protect spatial information to the greatest extent. Experimental results on the LibriSpeech dataset show that WTFormer can achieve comparable denoising performance to advanced systems while preserving more spatial information with only 0.98M parameters.

Authors (3)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.