Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model (2310.13010v1)
Abstract: We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders. We combine this classifier with a Universal Speech Model (USM) that is trained, without supervision, on 12 million hours of diverse audio recordings. Our model compresses long sequences into a small set of class-specific latent representations, and a factorized projection is used to predict different attributes of the disordered input speech. The benefit of this approach is that it lets us model different regions of the input for different classes while remaining data efficient. We evaluated the proposed model extensively on a curated corpus from the Mayo Clinic. Our model achieves an average accuracy of 83.1%, outperforming standard transformer (80.9%) and Perceiver (81.8%) models. With limited task-specific data, we find that pretraining is important and, surprisingly, that pretraining on the unrelated automatic speech recognition (ASR) task is also beneficial. Encodings from the middle layers provide a mix of acoustic and phonetic information and achieve the best prediction results, outperforming final-layer encodings alone (83.1% vs. 79.6%). The results are promising and, with further refinements, may help clinicians detect speech abnormalities without needing access to highly specialized speech-language pathologists.
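The central mechanism described above, a small set of class-specific latents that each cross-attend over a long encoder sequence, can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. It assumes one latent vector and one linear head per class (a simple stand-in for the factorized projection, whose exact form the abstract does not specify), single-head attention with the input frames used directly as keys and values, and an independent sigmoid per class (treating each attribute as a yes/no label). The helper `middle_layer` is hypothetical and only stands in for choosing an intermediate encoder layer. All function names are illustrative, and random vectors replace real USM encodings.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, frames):
    """One latent query attends over all T input frames (single head).
    Assumption for brevity: frames act directly as keys and values; a real
    Perceiver learns separate query/key/value projections."""
    d = len(query)
    scores = [dot(query, f) / math.sqrt(d) for f in frames]
    weights = softmax(scores)
    pooled = [sum(w * f[j] for w, f in zip(weights, frames)) for j in range(d)]
    return pooled, weights


def middle_layer(layer_outputs):
    """Hypothetical helper: pick the encoder layer halfway up the stack."""
    return layer_outputs[len(layer_outputs) // 2]


def predict_attributes(frames, class_latents, class_heads, class_biases):
    """Compress a T x d sequence into one pooled vector per class, then score
    each class with its own linear head followed by a sigmoid."""
    probs = []
    for latent, head, bias in zip(class_latents, class_heads, class_biases):
        pooled, _ = cross_attend(latent, frames)
        probs.append(sigmoid(dot(head, pooled) + bias))
    return probs


if __name__ == "__main__":
    random.seed(0)
    d, T, n_layers, n_classes = 8, 200, 12, 3

    def rand_vec():
        return [random.gauss(0.0, 1.0) for _ in range(d)]

    # Stand-in for per-layer encoder outputs (random numbers, not USM features).
    layers = [[rand_vec() for _ in range(T)] for _ in range(n_layers)]
    frames = middle_layer(layers)

    latents = [rand_vec() for _ in range(n_classes)]  # one latent query per class
    heads = [rand_vec() for _ in range(n_classes)]    # one linear head per class
    biases = [0.0] * n_classes

    probs = predict_attributes(frames, latents, heads, biases)
    print([round(p, 3) for p in probs])
```

Because each class owns its own query, the attention weights returned by `cross_attend` can differ from class to class, which is one way a model can attend to different regions of the input for different classes. With a fixed number of latents, the cost of this step grows linearly with the sequence length T rather than quadratically as in full self-attention over the input.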