Efficient Monaural Speech Enhancement using Spectrum Attention Fusion (2308.02263v1)

Published 4 Aug 2023 in cs.SD, cs.CL, and eess.AS

Abstract: Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels. Transformer-based models have recently outperformed RNN and CNN models in speech enhancement; however, they are considerably more computationally expensive and require much more high-quality training data, which is hard to come by. In this paper, we present an improvement for speech enhancement models that maintains the expressiveness of self-attention while significantly reducing model complexity, which we have termed Spectrum Attention Fusion. We carefully construct a convolutional module to replace several self-attention layers in a speech Transformer, allowing the model to fuse spectral features more efficiently. Our proposed model achieves comparable or better results than SOTA models with significantly fewer parameters (0.58M) on the Voice Bank + DEMAND dataset.
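The core idea in the abstract, replacing self-attention layers with a convolutional module that fuses spectral features across time, can be illustrated with a minimal sketch. This is not the authors' Spectrum Attention Fusion module (the paper's architecture is not reproduced here); it is a generic example of how a fixed 1-D convolution along the time axis can mix neighboring spectrogram frames, the kind of local fusion a convolution performs where self-attention would compute a learned global mixture. The function name and kernel values are illustrative.

```python
import numpy as np

def conv_spectrum_fusion(spec, kernel):
    """Illustrative sketch: fuse spectrogram frames with a 1-D
    convolution along the time axis instead of self-attention.

    spec:   (T, F) magnitude spectrogram (T frames, F frequency bins)
    kernel: (K,) convolution weights shared across frequency bins
    """
    T, F = spec.shape
    K = len(kernel)
    pad = K // 2
    # edge-pad in time so the output keeps T frames
    padded = np.pad(spec, ((pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(spec)
    for t in range(T):
        # weighted sum over a local window of frames -- a fixed,
        # local analogue of an attention-weighted mixture
        out[t] = kernel @ padded[t:t + K]
    return out

# toy example: smooth a 10-frame, 4-bin spectrogram with a 3-tap kernel
spec = np.random.rand(10, 4)
kernel = np.array([0.25, 0.5, 0.25])
fused = conv_spectrum_fusion(spec, kernel)
print(fused.shape)  # (10, 4)
```

In a real model the kernel weights would be learned and typically applied as depthwise/grouped convolutions inside the Transformer block, trading attention's global receptive field and quadratic cost for a linear-cost local one.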

Authors (6)
  1. Jinyu Long (1 paper)
  2. Jetic Gū (3 papers)
  3. Binhao Bai (1 paper)
  4. Zhibo Yang (43 papers)
  5. Ping Wei (26 papers)
  6. Junli Li (24 papers)
