Do sentence captions recruit the same neural populations as viewing natural scenes?

Determine whether reading the sentence captions for the Natural Scenes Dataset images recruits the same neural populations that are engaged when viewing those images, specifically the cortical regions that searchlight representational similarity analysis identified as encoding visually evoked responses during natural scene viewing.

Background

The study shows that similarity judgments for images and for their corresponding sentence captions converge behaviorally, and that both predict similar patterns of visually evoked fMRI activity in mid- and high-level visual cortex. However, the fMRI data analyzed cover only visual scene viewing; no fMRI was collected while the same participants read the captions for the same stimuli. Thus, although linguistic similarity judgments align with brain activity elicited by viewing images, it remains unresolved whether reading the captions elicits activity in the same cortical populations identified during visual scene viewing.

The authors note related work suggesting cross-modal decoding between vision and language and partial convergence of representations, but emphasize that directly testing sentence-reading fMRI with these specific stimuli and analyses is necessary to address whether the same neural populations are recruited.
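As a rough illustration of the comparison such a test would involve, the sketch below computes a representational dissimilarity matrix (RDM) for each condition within a single region and correlates the two. Everything here is an assumption made for illustration: the data are synthetic, the function names (`rdm_upper`, `rsa_similarity`) are hypothetical, and correlation distance with a Spearman comparison is a common RSA choice rather than the authors' confirmed pipeline.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def rdm_upper(patterns):
    """Upper triangle of a correlation-distance RDM (1 - Pearson r).

    `patterns` holds one voxel-response vector per stimulus.
    """
    return [1.0 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def rsa_similarity(patterns_a, patterns_b):
    """Spearman correlation between the RDMs of two conditions."""
    return pearson(ranks(rdm_upper(patterns_a)), ranks(rdm_upper(patterns_b)))


# Synthetic stand-in for one searchlight: 8 stimuli x 20 voxels (assumed sizes).
rng = random.Random(0)
n_stimuli, n_voxels = 8, 20
latent = [[rng.gauss(0, 1) for _ in range(n_voxels)] for _ in range(n_stimuli)]


def noisy(patterns, sd=0.1):
    """Copy of `patterns` with independent Gaussian noise added to each voxel."""
    return [[v + rng.gauss(0, sd) for v in row] for row in patterns]


viewing = noisy(latent)                    # hypothetical scene-viewing responses
reading_shared = noisy(latent)             # reading responses with the same structure
reading_unrelated = [[rng.gauss(0, 1) for _ in range(n_voxels)]
                     for _ in range(n_stimuli)]  # reading responses with no shared structure

shared = rsa_similarity(viewing, reading_shared)
unrelated = rsa_similarity(viewing, reading_unrelated)
print(f"shared structure: {shared:.2f}, unrelated: {unrelated:.2f}")
```

In a real analysis the two pattern sets would come from scene-viewing and sentence-reading fMRI runs for the same stimuli, and the comparison would be repeated across searchlights. Note that matching RDMs in a region indicate a similar representational geometry there, which is weaker evidence than showing that the very same neural populations respond in both conditions.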

References

This leaves open the question of whether reading the captions would recruit the same neural populations that we identified in fMRI evoked by viewing the natural scene images.

— Representations in vision and language converge in a shared, multidimensional space of perceived similarities (2507.21871 - Simkova et al., 29 Jul 2025) in Discussion (paragraph beginning “In this study, we do not test how behavioural representational geometries align with neural activity during sentence reading.”)
