A multi-modal approach for identifying schizophrenia using cross-modal attention (2309.15136v3)
Abstract: This study investigates how different modalities of human communication can be used to distinguish healthy controls from subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system that uses audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio, respectively, and were then used to compute high-level coordination features that serve as the inputs to the video and audio modalities. Context-independent text embeddings extracted from transcriptions of speech serve as the input to the text modality. The multi-modal system fuses a segment-to-session-level classifier for the audio and video modalities with a text model based on a Hierarchical Attention Network (HAN), using cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in weighted average F1 score.
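The cross-modal attention mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; it is a generic scaled dot-product attention in which one modality (e.g., text segments) provides the queries and another modality (e.g., audio/video coordination features) provides the keys and values. All dimensions and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats):
    """Scaled dot-product attention where one modality attends to another.

    query_feats:   (T_q, d) array, e.g. text segment embeddings
    context_feats: (T_c, d) array, e.g. audio/video coordination features
    Returns a (T_q, d) context-enriched representation of the queries.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (T_q, T_c)
    weights = softmax(scores, axis=-1)                   # attend over context steps
    return weights @ context_feats                       # weighted sum of context

# Toy example with made-up sizes: 5 text segments, 8 audio segments, 16-dim features
rng = np.random.default_rng(0)
text_seg = rng.standard_normal((5, 16))
audio_seg = rng.standard_normal((8, 16))
fused = cross_modal_attention(text_seg, audio_seg)
print(fused.shape)  # (5, 16)
```

In a fused system like the one described, such context-enriched representations would typically be concatenated with (or gated against) the original query-modality features before the session-level classification layer.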
Authors: Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik, Carol Espy-Wilson