Mamba in Speech: Towards an Alternative to Self-Attention (2405.12609v5)

Published 21 May 2024 in eess.AS and cs.SD

Abstract: The Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the computational complexity of the multi-head self-attention mechanism in the Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba has demonstrated its effectiveness in natural language processing and computer vision tasks, but its superiority has rarely been investigated in speech signal processing. This paper explores solutions for applying Mamba to speech processing by discussing two typical speech processing tasks: speech recognition, which requires semantic and sequential information, and speech enhancement, which focuses primarily on sequential patterns. The experimental results show the superiority of bidirectional Mamba (BiMamba) over vanilla Mamba for speech processing. Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in the Transformer and its derivatives, particularly for the semantic-aware task. The crucial techniques for transferring Mamba to speech are then summarized through ablation studies and the discussion section to offer insights for future research.
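The abstract positions BiMamba as a drop-in replacement for the self-attention module: a unidirectional (causal) Mamba layer is applied to the sequence in both time directions and the two outputs are merged. The abstract does not specify the fusion scheme or whether the two directions share parameters, so the sketch below is a minimal, assumed illustration in PyTorch; `mamba_layer_cls` stands in for any causal sequence layer mapping `(batch, length, d_model)` to `(batch, length, d_model)` (e.g., a Mamba block), and the additive fusion plus LayerNorm is an assumption, not the paper's exact design.

```python
import torch
import torch.nn as nn


class BiMambaBlock(nn.Module):
    """Bidirectional wrapper around a unidirectional sequence layer (sketch).

    A forward branch sees the sequence in natural time order; a backward
    branch sees the time-reversed sequence. Outputs are re-aligned and
    fused (here: addition + LayerNorm, an assumed choice).
    """

    def __init__(self, d_model: int, mamba_layer_cls):
        super().__init__()
        self.fwd = mamba_layer_cls(d_model)   # processes t = 1 .. T
        self.bwd = mamba_layer_cls(d_model)   # processes t = T .. 1
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, d_model)
        y_fwd = self.fwd(x)
        y_bwd = self.bwd(torch.flip(x, dims=[1]))
        y_bwd = torch.flip(y_bwd, dims=[1])   # restore original time order
        return self.norm(y_fwd + y_bwd)
```

Used in place of a multi-head self-attention sub-layer, this keeps the block's input/output shape unchanged while giving each time step access to both past and future context, which the abstract identifies as the key ingredient for the semantic-aware (speech recognition) task.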

Authors (9)
  1. Xiangyu Zhang (328 papers)
  2. Qiquan Zhang (20 papers)
  3. Hexin Liu (35 papers)
  4. Tianyi Xiao (4 papers)
  5. Xinyuan Qian (30 papers)
  6. Beena Ahmed (14 papers)
  7. Eliathamby Ambikairajah (11 papers)
  8. Haizhou Li (285 papers)
  9. Julien Epps (15 papers)
Citations (28)