Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition (2402.18923v1)
Abstract: Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose extending a large-scale speech recognition model for inappropriate pause detection in dysarthric speech. To this end, we present a task design, a labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. First, we treat pause detection as speech recognition, using an automatic speech recognition (ASR) model to convert speech into text with pause tags. Second, following the newly designed task, we label pause locations at the text level along with their appropriateness; we collaborate with speech-language pathologists to establish labeling criteria, ensuring high-quality annotated data. Finally, we extend the ASR model with an inappropriate pause prediction layer for end-to-end inappropriate pause detection. Moreover, we propose a task-tailored metric that evaluates inappropriate pause detection independently of ASR performance. Our experiments show that the proposed method detects inappropriate pauses in dysarthric speech better than baselines (inappropriate pause error rate: 14.47%).
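The abstract only outlines the architecture, so the snippet below is a minimal PyTorch sketch of one way to attach an inappropriate pause prediction layer to the decoder of a large-scale ASR model (e.g., a Whisper-style encoder-decoder). All names here (`InappropriatePauseHead`, `joint_loss`, `pause_mask`) and the joint-loss formulation are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch, not the paper's code: a pause-appropriateness head on
# top of an ASR decoder, trained jointly with the usual token cross-entropy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InappropriatePauseHead(nn.Module):
    """Binary classifier applied to ASR decoder hidden states.

    Intended to score each output position (in particular, positions that
    decode a pause tag) as an appropriate or inappropriate pause.
    """
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, 2)  # appropriate vs. inappropriate

    def forward(self, decoder_hidden: torch.Tensor) -> torch.Tensor:
        # decoder_hidden: (batch, seq_len, hidden_dim) from the ASR decoder
        return self.classifier(decoder_hidden)      # (batch, seq_len, 2)

def joint_loss(asr_logits, pause_logits, target_tokens, pause_labels, pause_mask,
               pause_weight: float = 1.0):
    """ASR token cross-entropy plus pause-appropriateness cross-entropy.

    The pause term is evaluated only at positions where the reference
    transcript contains a pause tag (pause_mask is True there).
    """
    asr_loss = F.cross_entropy(
        asr_logits.transpose(1, 2), target_tokens, ignore_index=-100
    )
    pause_loss = F.cross_entropy(pause_logits[pause_mask], pause_labels[pause_mask])
    return asr_loss + pause_weight * pause_loss
```

In this sketch the transcript vocabulary is assumed to include a pause tag token, so a single decoder pass yields both the text-with-pause-tags hypothesis and, via the extra head, an appropriateness decision at each predicted pause.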