Learning Co-Speech Gesture for Multimodal Aphasia Type Detection (2310.11710v2)
Abstract: Aphasia, a language disorder resulting from brain damage, requires accurate identification of specific aphasia types, such as Broca's and Wernicke's aphasia, for effective treatment. However, little attention has been paid to developing methods to detect different types of aphasia. Recognizing the importance of analyzing co-speech gestures for distinguishing aphasia types, we propose a multimodal graph neural network for aphasia type detection using speech and the corresponding gesture patterns. By learning the correlation between the speech and gesture modalities for each aphasia type, our model generates textual representations that are sensitive to gesture information, leading to accurate aphasia type detection. Extensive experiments show that our approach outperforms existing methods, achieving state-of-the-art results (F1 84.2%). We also show that gesture features outperform acoustic features, highlighting the significance of gesture expression in detecting aphasia types. We release our code for reproducibility.
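The abstract describes producing textual representations that are sensitive to gesture information by relating the speech and gesture modalities in a graph. The sketch below illustrates that idea under stated assumptions and is not the authors' model. It assumes that words and gesture segments already carry start/end timestamps and feature vectors of the same dimension, links each word to the gesture segments that overlap it in time, and replaces each word vector with a blend of itself and the mean of its linked gesture vectors (a single, non-learned mean-aggregation step). The names `Segment`, `build_edges`, `gesture_aware_words`, and the mixing weight `alpha` are hypothetical; a trained graph neural network would learn projections and aggregation weights instead of using a fixed blend.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """Hypothetical time-stamped unit (a word or a gesture segment) with a feature vector."""
    start: float  # seconds
    end: float    # seconds
    vec: List[float]


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two time intervals share a positive-length overlap."""
    return min(a.end, b.end) > max(a.start, b.start)


def build_edges(words: List[Segment], gestures: List[Segment]) -> Dict[int, List[int]]:
    """Bipartite graph: map each word index to the gesture indices that overlap it in time."""
    return {i: [j for j, g in enumerate(gestures) if overlaps(w, g)]
            for i, w in enumerate(words)}


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def gesture_aware_words(words: List[Segment], gestures: List[Segment],
                        alpha: float = 0.5) -> List[List[float]]:
    """One mean-aggregation step from gesture nodes to word nodes.

    Assumption: word and gesture vectors share a dimension; a real model would
    first project both modalities into a common space.
    """
    edges = build_edges(words, gestures)
    out = []
    for i, w in enumerate(words):
        neighbours = edges[i]
        if not neighbours:
            # No co-occurring gesture: keep the text vector unchanged.
            out.append(list(w.vec))
            continue
        g = mean([gestures[j].vec for j in neighbours])
        out.append([(1 - alpha) * x + alpha * y for x, y in zip(w.vec, g)])
    return out


words = [Segment(0.0, 0.4, [1.0, 0.0]),
         Segment(0.4, 0.9, [0.0, 1.0]),
         Segment(1.2, 1.5, [1.0, 1.0])]
gestures = [Segment(0.3, 0.6, [0.0, 0.0]),
            Segment(0.5, 1.0, [2.0, 2.0])]
print(gesture_aware_words(words, gestures))  # [[0.5, 0.0], [0.5, 1.0], [1.0, 1.0]]
```

A word with no temporally overlapping gesture keeps its original vector, which is one simple way to handle speech produced without an accompanying gesture.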