Exploring the Capability of Mamba in Speech Applications (2406.16808v1)

Published 24 Jun 2024 in cs.SD and eess.AS

Abstract: This paper explores the capability of Mamba, a recently proposed architecture based on state space models (SSMs), as a competitive alternative to Transformer-based models. In the speech domain, well-designed Transformer-based models, such as the Conformer and E-Branchformer, have become the de facto standards. Extensive evaluations have demonstrated the effectiveness of these Transformer-based models across a wide range of speech tasks. In contrast, the evaluation of SSMs has been limited to a few tasks, such as automatic speech recognition (ASR) and speech synthesis. In this paper, we compared Mamba with state-of-the-art Transformer variants for various speech applications, including ASR, text-to-speech, spoken language understanding, and speech summarization. Experimental evaluations revealed that Mamba achieves comparable or better performance than Transformer-based models, and demonstrated its efficiency in long-form speech processing.

Authors (3)

Koichi Miyazaki
Yoshiki Masuyama
Masato Murata
