Fused Audio Instance and Representation for Respiratory Disease Detection
Abstract: Audio-based classification of body sounds has long been studied as an aid to diagnosing respiratory diseases. Most research centers on cough as the main biomarker, but other body sounds also have the potential to reveal respiratory diseases. Recent studies on COVID-19 have shown that breath and speech sounds, in addition to cough, correlate with the disease. Our study proposes Fused Audio Instance and Representation (FAIR) as a method for respiratory disease detection. FAIR constructs a joint feature vector from various body sounds, each represented in both waveform and spectrogram form. We conducted experiments on the use case of COVID-19 detection by combining waveform and spectrogram representations of body sounds. Our findings show that using self-attention to combine features extracted from cough, breath, and speech sounds leads to the best performance, with an Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. Compared to models trained solely on spectrograms or solely on waveforms, models that use both representations achieve a higher AUC score, indicating that combining the two representations enriches the extracted features.
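The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature extractors, the attention configuration, or how the joint vector is assembled. The sketch assumes one feature vector per (sound, representation) pair, uses random numbers in place of extracted features, applies single-head scaled dot-product self-attention with random projection matrices, and concatenates the attended vectors. All names (`DIM`, `fuse`, `rand_matrix`) are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the token list."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def fuse(tokens, w_q, w_k, w_v):
    """Attend across all tokens, then concatenate into one joint vector.

    Concatenation is an assumption made for this sketch.
    """
    attended, weights = self_attention(tokens, w_q, w_k, w_v)
    joint = [x for row in attended for x in row]
    return joint, weights


rng = random.Random(0)
DIM = 4  # hypothetical feature size, chosen only to keep the example small
SOUNDS = ("cough", "breath", "speech")
FORMS = ("waveform", "spectrogram")


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# One placeholder feature vector per (sound, representation) pair: 6 tokens.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in SOUNDS for _ in FORMS]
# Random projections stand in for learned query/key/value weights.
w_q, w_k, w_v = (rand_matrix(DIM, DIM) for _ in range(3))

joint, weights = fuse(tokens, w_q, w_k, w_v)
print(len(joint))  # 24 = 3 sounds x 2 representations x DIM
```

Each row of `weights` shows how strongly one (sound, representation) token attends to the other five, which is the mechanism that lets features from different sounds and representations inform one another before classification.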