Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex: A Formal Overview
The paper "Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex" presents BraInCLAM, a framework that combines meta-learning with in-context learning to predict voxelwise neural responses in higher visual cortex from limited fMRI data. The work addresses a central obstacle in visual neuroscience: individual differences in cortical organization, most evident in the fine-grained semantic selectivity of higher visual cortex. The authors adopt a transformer architecture, exploiting its flexibility with variable-length input to generalize across subjects and stimuli even when data are sparse.
Methodology and Results
BraInCLAM is designed to predict neural responses for new subjects or stimuli without finetuning, relying instead on few-shot learning strategies. During training, the model is explicitly optimized for in-context learning, adapting to a new subject from a minimal set of example stimulus-response pairs, much as humans adapt to new tasks from a few demonstrations. It outperformed existing voxelwise encoding frameworks, particularly in low-data regimes, demonstrating efficient use of limited data.
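The in-context prediction step can be pictured as attention over a support set: the model receives a handful of (stimulus embedding, voxel response) pairs from a new subject and predicts the response to a query stimulus without any weight updates. The sketch below, a single attention read-out in plain NumPy, is a minimal illustration of this idea rather than the paper's actual architecture; the function name, arguments, and dimensions are hypothetical.

```python
import numpy as np

def in_context_predict(support_feats, support_resps, query_feat):
    """Predict a voxel response for a query stimulus by attending over
    a small support set of (stimulus embedding, response) pairs.

    support_feats : (k, d) stimulus embeddings for k support examples
    support_resps : (k,)   measured voxel responses for those examples
    query_feat    : (d,)   embedding of the query stimulus
    """
    d = query_feat.shape[0]
    # Scaled dot-product attention scores between query and support stimuli.
    scores = support_feats @ query_feat / np.sqrt(d)
    # Numerically stable softmax over the support set.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Prediction is the attention-weighted average of support responses.
    return float(weights @ support_resps)
```

In this toy read-out, enlarging the support set gives the query more relevant examples to attend over, a loose analogue of the test-time scaling behavior the paper reports for larger in-context support sets.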
- Outperformance in Low-Data Regimes: BraInCLAM achieved higher prediction accuracy than alternative voxelwise models on novel visual stimuli, indicating stronger generalization from limited data. Its test-time scaling was also robust: performance improved steadily as the in-context support set grew. This makes the model well suited to real-world settings where extensive data collection is infeasible.
- Cross-Dataset Generalization: BraInCLAM also generalized effectively to a new fMRI dataset collected with different subjects and apparatus, performing reliably despite variations in data acquisition settings.
- Interpretability and Semantic Relevance: Because the model attends to semantically relevant stimuli, its predictions are interpretable, offering insight into how regions of higher visual cortex respond to specific categories of visual stimuli.
- Natural Language Query Mapping: An intriguing feature of the framework is zero-shot mapping from natural language queries to voxel selectivity. This enables a more granular and interpretable functional mapping of visual cortex, potentially informing further research on linguistic processing and neural representation.
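One way to realize zero-shot language-to-voxel mapping is to embed the text query in the same representational space as the visual stimuli (e.g., with a CLIP-style joint image-text encoder) and score it against each voxel's learned read-out direction. The sketch below assumes a simple linear per-voxel read-out; the weight matrix, function name, and scoring rule are hypothetical assumptions for illustration, and the paper's actual mechanism may differ.

```python
import numpy as np

def voxel_selectivity_for_query(voxel_weights, query_embedding):
    """Score each voxel's selectivity for a natural-language query.

    voxel_weights   : (v, d) learned per-voxel read-out vectors mapping
                      stimulus embeddings to predicted responses
    query_embedding : (d,)   text query embedded in the shared
                      image-text feature space (CLIP-style, assumed)

    Returns a (v,) array of cosine similarities between each voxel's
    read-out vector and the query embedding, interpretable as a
    zero-shot selectivity map over voxels.
    """
    q = query_embedding / np.linalg.norm(query_embedding)
    w = voxel_weights / np.linalg.norm(voxel_weights, axis=1, keepdims=True)
    return w @ q
```

A query such as "images of faces" would then highlight voxels whose read-out directions align with the face-related region of the shared feature space, yielding a selectivity map without any face-labeled fMRI data.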
Implications and Future Directions
The BraInCLAM framework offers several critical implications and potential future developments:
- Scalability and Practical Applications: Because the model adapts dynamically to new subjects and stimuli without extensive data collection, it could substantially reduce the cost and time of fMRI investigations, particularly in clinical settings, and support personalized models of brain function across broader populations.
- Theoretical Insights into Neural Processing: By combining meta-learning principles with in-context learning, BraInCLAM provides a tool for identifying shared functional motifs across human subjects and for probing the hierarchical organization of the human visual cortex.
- Advancements in AI-Neuroscience Interactions: The successful application of transformer models in this domain may open new routes for studying human neural processes with advanced AI models, helping bridge the gap between artificial intelligence and biological process modeling.
- Further Research Directions: The paper opens avenues for applying BraInCLAM to other sensory modalities and cognitive domains, given the adaptability of transformer architectures to diverse task formats. It also encourages extending the framework to more dynamic stimuli such as video sequences.
To conclude, "Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex" marks a compelling advance in computational neuroscience, applying contemporary AI models to long-standing questions about brain function. Its findings stand to enhance both theoretical understanding and practical approaches to studying the human brain's intricate visual processing mechanisms.