A multi-modal approach for identifying schizophrenia using cross-modal attention (2309.15136v3)

Published 26 Sep 2023 in eess.SP, cs.MM, cs.SD, eess.AS, and eess.IV

Abstract: This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectively, which were then used to compute high-level coordination features that served as the inputs to the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system is developed by fusing a segment-to-session-level classifier for video and audio modalities with a text model based on a Hierarchical Attention Network (HAN) with cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in the weighted average F1 score.
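
The fusion described above relies on cross-modal attention, where one modality supplies the queries and another supplies the keys and values. As a rough illustration only (not the authors' implementation; the shapes, modality names, and single-head form here are assumptions), generic scaled dot-product cross-attention can be sketched as:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one
    modality (e.g. text tokens) and keys/values from another
    (e.g. audio or video segment features)."""
    d_k = queries.shape[-1]
    scores = queries @ keys.swapaxes(-1, -2) / np.sqrt(d_k)  # (n_q, n_kv)
    weights = softmax(scores, axis=-1)                        # rows sum to 1
    return weights @ values, weights

# Hypothetical shapes: 5 text tokens attending over 8 audio segments,
# both projected to a shared 16-dimensional space.
rng = np.random.default_rng(0)
text_feats = rng.standard_normal((5, 16))
audio_feats = rng.standard_normal((8, 16))
attended, w = cross_modal_attention(text_feats, audio_feats, audio_feats)
```

Each text token thus receives a convex combination of the audio segment features, which is one common way to let one modality condition on another before session-level classification.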

Authors (4)
  1. Gowtham Premananth (6 papers)
  2. Yashish M. Siriwardena (12 papers)
  3. Philip Resnik (20 papers)
  4. Carol Espy-Wilson (34 papers)
Citations (3)