A multi-modal approach for identifying schizophrenia using cross-modal attention (2309.15136v3)
Abstract: This study investigates how different modalities of human communication can be used to distinguish healthy controls from subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system that uses audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio, respectively, and were then used to compute high-level coordination features that serve as the inputs to the video and audio modalities. Context-independent text embeddings extracted from transcriptions of speech serve as the input to the text modality. The multi-modal system fuses a segment-to-session-level classifier for the audio and video modalities with a text model based on a Hierarchical Attention Network (HAN), using cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in weighted average F1 score.
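The cross-modal attention mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; it is a generic scaled dot-product attention in which one modality (e.g., text segments) provides the queries and another modality (e.g., audio/video coordination features) provides the keys and values. All dimensions and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats):
    """Scaled dot-product attention where one modality attends to another.

    query_feats:   (T_q, d) array, e.g. text segment embeddings
    context_feats: (T_c, d) array, e.g. audio/video coordination features
    Returns a (T_q, d) context-enriched representation of the queries.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (T_q, T_c)
    weights = softmax(scores, axis=-1)                   # attend over context steps
    return weights @ context_feats                       # weighted sum of context

# Toy example with made-up sizes: 5 text segments, 8 audio segments, 16-dim features
rng = np.random.default_rng(0)
text_seg = rng.standard_normal((5, 16))
audio_seg = rng.standard_normal((8, 16))
fused = cross_modal_attention(text_seg, audio_seg)
print(fused.shape)  # (5, 16)
```

In a fused system like the one described, such context-enriched representations would typically be concatenated with (or gated against) the original query-modality features before the session-level classification layer.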
Authors: Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik, Carol Espy-Wilson