Multi-View Spectrogram Transformer for Respiratory Sound Classification (2311.09655v3)

Published 16 Nov 2023 in cs.SD, cs.CV, and eess.AS

Abstract: Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into different sized patches, representing the multi-view acoustic elements of a respiratory sound. These patches and positional embeddings are then fed into transformer encoders to extract the attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features to highlight the best one in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.

References (25)

Citations (3)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/AudioAndSpeech/status/1796513209106518043

Multi-View Spectrogram Transformer for Respiratory Sound Classification (2311.09655v3)

Summary

Related Papers

Tweets