
Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification? (2401.05551v1)

Published 10 Jan 2024 in cs.CL, cs.SD, and eess.AS

Abstract: Objectives: We aimed to investigate how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy, specifically in the "Cookie Theft" picture description task. We aimed to assess whether imperfect ASR-generated transcripts could provide valuable information for distinguishing between language samples from cognitively healthy individuals and those with Alzheimer's disease (AD). Methods: We conducted experiments using various ASR models, refining their transcripts with post-editing techniques. Both these imperfect ASR transcripts and manually transcribed ones were used as inputs for the downstream dementia classification. We conducted comprehensive error analysis to compare model performance and assess ASR-generated transcript effectiveness in dementia classification. Results: Imperfect ASR-generated transcripts surprisingly outperformed manual transcription for distinguishing between individuals with AD and those without in the "Cookie Theft" task. These ASR-based models surpassed the previous state-of-the-art approach, indicating that ASR errors may contain valuable cues related to dementia. The synergy between ASR and classification models improved overall accuracy in dementia classification. Conclusion: Imperfect ASR transcripts effectively capture linguistic anomalies linked to dementia, improving accuracy in classification tasks. This synergy between ASR and classification models underscores ASR's potential as a valuable tool in assessing cognitive impairment and related clinical applications.
