Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data (2307.02514v1)
Abstract: Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. Because AD impairs patients' ability to understand and produce language, their speech can serve as an indicator of the disease. This study investigates several methods for detecting AD using patients' speech and transcript data from the DementiaBank Pitt database. The proposed approach combines pre-trained LLMs with a Graph Neural Network (GNN): a graph is constructed from the speech transcript, and the GNN extracts features from it for AD detection. Data augmentation techniques, including synonym replacement and a GPT-based augmenter, among others, were used to address the small dataset size. Audio data was also introduced, with the WavLM model used to extract audio features, which were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and contrasting the synthesized audio with the original audio. We conducted extensive experiments and analysis on these methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.
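The contrastive step mentioned at the end of the abstract can be illustrated with a small sketch. The abstract does not state which loss the authors used, so the InfoNCE-style objective, the temperature value, the toy two-dimensional embeddings, and the helper names `cosine` and `info_nce` below are all assumptions made for illustration. The idea is only that the embedding of an original recording should be closer to the embedding of the audio synthesized from its own transcript than to audio synthesized from other transcripts.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(original, synthesized, temperature=0.1):
    """Mean InfoNCE-style loss (assumed objective, not confirmed by the paper).

    original[i] and synthesized[i] form a positive pair; every other
    synthesized embedding in the batch acts as a negative for original[i].
    """
    losses = []
    for i, anchor in enumerate(original):
        logits = [cosine(anchor, s) / temperature for s in synthesized]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings standing in for audio features (hypothetical values).
original = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # synthesized audio matches its source
swapped = [[0.1, 0.9], [0.9, 0.1]]   # synthesized audio matches the wrong source

loss_aligned = info_nce(original, aligned)
loss_swapped = info_nce(original, swapped)
```

In this toy setup the loss is small when each synthesized embedding lies near its own original and large when the pairs are swapped, which is the behavior a contrastive objective of this kind is meant to reward.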
Authors: Hongmin Cai, Xiaoke Huang, Zhengliang Liu, Wenxiong Liao, Haixing Dai, Zihao Wu, Dajiang Zhu, Hui Ren, Quanzheng Li, Tianming Liu, Xiang Li