Modality-Agnostic fMRI Decoding of Vision and Language (2403.11771v1)
Abstract: Previous studies have shown that it is possible to map brain activation data of subjects viewing images onto the feature representation space of not only vision models (modality-specific decoding) but also language models (cross-modal decoding). In this work, we introduce and use a new large-scale fMRI dataset (~8,500 trials per subject) of people watching both images and text descriptions of such images. This novel dataset enables the development of modality-agnostic decoders: a single decoder that can predict which stimulus a subject is seeing, irrespective of the modality (image or text) in which the stimulus is presented. We train and evaluate such decoders to map brain signals onto stimulus representations from a wide range of publicly available vision, language, and multimodal (vision+language) models. Our findings reveal that (1) modality-agnostic decoders perform as well as (and sometimes even better than) modality-specific decoders; (2) modality-agnostic decoders mapping brain data onto representations from unimodal models perform as well as decoders relying on multimodal representations; and (3) while language and low-level visual (occipital) brain regions are best at decoding text and image stimuli, respectively, high-level visual (temporal) regions perform well on both stimulus types.
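To make the decoding setup concrete, here is a minimal sketch of a modality-agnostic decoder with retrieval-style evaluation. It is not the authors' exact pipeline: the ridge regression, the CLIP-sized embedding dimension, the train/test split, the regularization strength, and all variable names are illustrative assumptions; only the general idea (one linear decoder trained on pooled image and text trials, scored by nearest-neighbor retrieval of the true stimulus embedding) follows the abstract.

```python
# Illustrative sketch only: shapes, alpha, and the retrieval metric are
# assumptions, not the paper's reported configuration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
fmri = rng.standard_normal((8500, 5000))       # placeholder brain data: (trials, voxels),
                                               # image and text trials pooled together
embeddings = rng.standard_normal((8500, 512))  # placeholder stimulus features from a
                                               # pretrained model (e.g. CLIP)

X_train, X_test, Y_train, Y_test = train_test_split(
    fmri, embeddings, test_size=0.1, random_state=0
)

# One decoder for all trials, regardless of stimulus modality.
decoder = Ridge(alpha=1e4).fit(X_train, Y_train)
Y_pred = decoder.predict(X_test)

# Retrieval evaluation: a test trial counts as correct if its predicted
# embedding is closer (cosine similarity) to the true stimulus embedding
# than to the embeddings of every other test stimulus.
def normalize(a):
    return a / np.linalg.norm(a, axis=1, keepdims=True)

sims = normalize(Y_pred) @ normalize(Y_test).T   # (n_test, n_test) similarity matrix
ranks = (sims >= np.diag(sims)[:, None]).sum(1)  # rank of the true stimulus per trial
print(f"top-1 retrieval accuracy: {(ranks == 1).mean():.3f}")
```

Because the decoder targets a shared embedding space rather than raw pixels or tokens, the same weights can be probed on image-only or text-only test trials to compare modality-specific and modality-agnostic performance.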
Authors: Mitja Nikolaus, Milad Mozafari, Nicholas Asher, Leila Reddy, Rufin VanRullen