Multi-View Spectrogram Transformer for Respiratory Sound Classification (2311.09655v3)
Abstract: Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image and overlook its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into patches of different sizes, each size representing one view of the acoustic elements of a respiratory sound. These patches, together with positional embeddings, are then fed into transformer encoders, which extract the attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features and highlight the most informative view in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.
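The multi-view patching and gated fusion described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the patch sizes, the feature dimension, the random projection matrices, and the softmax form of the gate are all assumptions made for illustration, and mean pooling followed by a linear projection stands in for the transformer encoders.

```python
import numpy as np

def split_into_patches(spec, patch_h, patch_w):
    """Split a (freq, time) spectrogram into non-overlapping, flattened patches."""
    n_f, n_t = spec.shape
    # Crop so both axes divide evenly (an assumption; padding would also work).
    n_f = (n_f // patch_h) * patch_h
    n_t = (n_t // patch_w) * patch_w
    grid = spec[:n_f, :n_t].reshape(n_f // patch_h, patch_h, n_t // patch_w, patch_w)
    # Reorder to (row of patches, column of patches, patch_h, patch_w).
    grid = grid.transpose(0, 2, 1, 3)
    return grid.reshape(-1, patch_h * patch_w)

def gated_fusion(view_features, gate_weights):
    """Fuse per-view feature vectors with softmax gates (assumed gate form).

    view_features: (n_views, dim); gate_weights: (n_views * dim, n_views).
    """
    logits = view_features.reshape(-1) @ gate_weights
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()
    fused = (gates[:, None] * view_features).sum(axis=0)
    return fused, gates

rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 256))        # stand-in mel-spectrogram (mel bins, frames)
patch_sizes = [(16, 16), (32, 8), (8, 32)]    # hypothetical views: square, frequency-long, time-long
dim = 64                                      # hypothetical shared feature dimension

view_features = []
for patch_h, patch_w in patch_sizes:
    patches = split_into_patches(spec, patch_h, patch_w)
    # Stand-in for a transformer encoder: mean-pool patches, then project linearly.
    projection = rng.standard_normal((patch_h * patch_w, dim)) / np.sqrt(patch_h * patch_w)
    view_features.append(patches.mean(axis=0) @ projection)
view_features = np.stack(view_features)       # (n_views, dim)

gate_weights = rng.standard_normal((len(patch_sizes) * dim, len(patch_sizes))) * 0.1
fused, gates = gated_fusion(view_features, gate_weights)
print(fused.shape, gates)
```

Tall patches span a wide frequency range over a short time window, while wide patches do the reverse, which is one way to read "different views of time-frequency characteristics". In a trained model the gate weights would be learned, so the gates would adapt to each input rather than being fixed by random initialization as they are here.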