
Multi-View Spectrogram Transformer for Respiratory Sound Classification (2311.09655v3)

Published 16 Nov 2023 in cs.SD, cs.CV, and eess.AS

Abstract: Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into different sized patches, representing the multi-view acoustic elements of a respiratory sound. These patches and positional embeddings are then fed into transformer encoders to extract the attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features to highlight the best one in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.
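The pipeline the abstract describes (multi-sized patches, per-view encoding, gated fusion) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the patch sizes, the 32-dimensional feature size, the random linear projection with mean pooling that stands in for each transformer encoder, and the softmax form of the gate are all assumptions made for the example.

```python
import numpy as np

def split_into_patches(spec, patch_h, patch_w):
    """Cut a (freq, time) spectrogram into non-overlapping, flattened patches."""
    n_f, n_t = spec.shape[0] // patch_h, spec.shape[1] // patch_w
    cropped = spec[: n_f * patch_h, : n_t * patch_w]
    patches = cropped.reshape(n_f, patch_h, n_t, patch_w).transpose(0, 2, 1, 3)
    return patches.reshape(n_f * n_t, patch_h * patch_w)

def gated_fusion(view_features, gate_weights):
    """Weigh per-view feature vectors with a softmax gate and sum them.

    Assumption: the gate is a softmax over one learned score per view;
    the paper's exact gating formula is not given in the abstract.
    """
    feats = np.stack(view_features)           # (n_views, dim)
    logits = feats @ gate_weights             # one score per view
    logits = logits - logits.max()            # numerical stability
    gates = np.exp(logits) / np.exp(logits).sum()
    return gates @ feats, gates

rng = np.random.default_rng(0)
mel_spec = rng.standard_normal((64, 128))     # hypothetical (mel bins, frames)
patch_sizes = [(16, 16), (32, 8), (8, 32)]    # hypothetical views
dim = 32                                      # hypothetical feature size

view_features = []
for patch_h, patch_w in patch_sizes:
    patches = split_into_patches(mel_spec, patch_h, patch_w)
    # Stand-in for a transformer encoder: linear projection, then mean pooling.
    projection = rng.standard_normal((patch_h * patch_w, dim)) * 0.01
    view_features.append((patches @ projection).mean(axis=0))

fused, gates = gated_fusion(view_features, rng.standard_normal(dim))
```

Here a tall patch such as 32x8 covers a wide frequency range over a short time span, while a wide patch such as 8x32 does the opposite, which is one way to read "different views of time-frequency characteristics". The gate values sum to one, so the fused vector is a convex combination of the per-view features.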
