Foundation Models for Neuroimaging
- Foundation models for neuroimaging are large-scale, pre-trained architectures that extract universal features from diverse brain imaging modalities using self-supervised and contrastive learning techniques.
- They utilize advanced methods like masked autoencoding, graph representations, and transformers to enable rapid adaptation to tasks such as diagnosis, segmentation, and anomaly detection.
- These models improve clinical and research outcomes by offering high diagnostic accuracy, interpretable anatomical mappings, and efficient transfer learning with minimal fine-tuning.
Foundation models for neuroimaging are a new class of large-scale, pre-trained deep learning models designed to extract universal, generalizable representations from diverse brain imaging modalities and tasks. These models capitalize on massive, heterogeneous datasets, often using self-supervision, multimodal fusion, and architectural advances to support a wide range of neuroimaging applications—including diagnosis, segmentation, anomaly detection, age prediction, and neuroscientific modeling—across clinical and research settings.
1. Concept and Paradigm of Foundation Models in Neuroimaging
Foundation models (FMs) in neuroimaging leverage principles similar to those that have driven advances in natural language processing and computer vision. These models are typically pre-trained on extensive datasets—often spanning thousands to hundreds of thousands of subjects and covering multiple imaging modalities (e.g., MRI, fMRI, EEG, PET). Their main goals are to produce universal embeddings capable of rapid adaptation to downstream neuroimaging tasks, to support transfer and zero/few-shot learning scenarios, and to serve as computational tools for simulating or interpreting human brain function.
Key characteristics include:
- Large-scale pre-training: Models are trained on heterogeneously sourced imaging data, including varied pathologies, acquisition protocols, and subject demographics (Ghamizi et al., 16 Jun 2025, Lyu et al., 23 Sep 2025).
- Self-supervision and contrastive/masked learning: Typical training objectives include contrastive losses (e.g., SimCLR's NT-Xent), masked autoencoding, and reconstruction-based learning (Kaczmarek et al., 12 Sep 2025, Cox et al., 14 Jun 2024, Wang et al., 9 Aug 2025); a minimal contrastive-loss sketch follows this list.
- Universal backbone architectures: Vision transformers, convolutional neural networks, graph neural networks, and Mixture-of-Experts transformers tailored for 2D, 3D, or even 4D neuroimaging data (Wang et al., 11 Jun 2025, Wei et al., 31 May 2025, Chen et al., 29 Sep 2025, Dong et al., 29 Sep 2025).
- Multimodal and task-agnostic design: Some models are explicitly designed to fuse information from multiple imaging types (structural, functional, EEG) and can support multiple clinical or neuroscientific tasks in a single model (Dong et al., 29 Sep 2025, Liu et al., 30 Aug 2025).
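To make the contrastive pre-training objective concrete, below is a minimal sketch of an NT-Xent (SimCLR-style) loss over two augmented views of a batch of scans. The toy batch size, embedding dimension, and function name are illustrative assumptions, not code from any cited model.

```python
# Minimal NT-Xent (normalized temperature-scaled cross-entropy) loss, as used
# in SimCLR-style contrastive pre-training. Sketch only: the batch and
# embedding dimensions are toys, not any cited model's configuration.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two augmented views of the same N scans."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit norm
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    n = z1.shape[0]
    sim.fill_diagonal_(float("-inf"))                    # a sample cannot match itself
    # The positive for row i is its other view: i+n (first half) or i-n (second half).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets.to(z.device))

# Toy usage: embeddings of two augmentations of a batch of 8 volumes.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent(z1, z2)
```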
2. Core Methodologies and Architectural Innovations
The methods underpinning neuroimaging foundation models are driven by several technical and architectural innovations:
- Self-supervised and Masked Learning: Masked autoencoders (MAE), local masked reconstruction, and masked prediction of signals or patches are common, e.g., 3D MAE for MRI (Cox et al., 14 Jun 2024, Wang et al., 9 Aug 2025) and dual-domain masked reconstruction in EEG (Chen et al., 29 Sep 2025); see the masking sketch after this list.
- Contrastive Learning: Contrastive objectives such as NT-Xent (SimCLR) and InfoNCE align augmented views, modalities, or subject presentations in a shared latent space (Kaczmarek et al., 12 Sep 2025, Lu et al., 2022).
- Graph-based Representations: Some models process data as graphs (e.g., nodes = brain regions, edges = functional or structural connectivity) and use graph contrastive learning or masked graph autoencoding (Wei et al., 31 May 2025).
- Mixture-of-Experts (MoE) and Prompt Tuning: To improve scaling and task adaptation, models such as Uni-NTFM use MoE transformers (a toy gated layer is sketched after this list), while others employ parameter-efficient prompt-based adaptation strategies for rapid transfer and generalization (Chen et al., 29 Sep 2025, Wei et al., 31 May 2025, Wang et al., 11 Jun 2025).
- Multimodal Fusion: Models such as Brain Harmony unify MRI and fMRI into compact 1D token spaces, using modality-specialized encoders and joint cross-modal transformers (Dong et al., 29 Sep 2025).
- Distributionally Robust Optimization: Designed for clinical robustness, these strategies (e.g., Group-DRO) mitigate institutional or class imbalance, improving invariance to real-world data heterogeneity (Bhattacharya et al., 18 Sep 2025).
- Hierarchical and Modular Designs: Architectures such as Prima use hierarchical transformers to process each imaging sequence separately and then aggregate across a patient's full study, enabling both local and global reasoning (Lyu et al., 23 Sep 2025).
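To illustrate the masked-autoencoding objective from the first item above, here is a minimal MAE-style sketch over patchified 3D volumes: a high fraction of patch tokens is masked, only the visible tokens are encoded, and the reconstruction loss is computed on masked positions. All shapes, the tiny encoder, and the linear decoder head are illustrative assumptions, not any cited model's architecture.

```python
# MAE-style pre-training sketch on 3D patch tokens: mask most tokens, encode
# the visible ones, and penalize reconstruction error on masked patches only.
import torch
import torch.nn as nn

def random_mask(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """tokens: (B, L, D) patch embeddings. Keep a random subset, mask the rest."""
    B, L, D = tokens.shape
    n_keep = int(L * (1 - mask_ratio))
    ids_keep = torch.rand(B, L).argsort(dim=1)[:, :n_keep]   # random kept indices
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, L, dtype=torch.bool)
    mask.scatter_(1, ids_keep, False)                        # True where masked
    return visible, ids_keep, mask

B, L, D = 2, 216, 128                                        # e.g. a 6x6x6 grid of 3D patches
patches = torch.randn(B, L, D)                               # patchified MRI volume (toy)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D, nhead=8, batch_first=True), num_layers=2)
mask_token = nn.Parameter(torch.zeros(1, 1, D))              # learned placeholder token
decoder = nn.Linear(D, D)                                    # stand-in for a real decoder

visible, ids_keep, mask = random_mask(patches)
latent = encoder(visible)                                    # encode only the visible ~25%
# Re-assemble a full-length sequence: mask token everywhere, latent at kept slots.
full = mask_token.expand(B, L, D).scatter(
    1, ids_keep.unsqueeze(-1).expand(-1, -1, D), latent)
recon = decoder(full)
loss = ((recon - patches) ** 2)[mask].mean()                 # loss on masked patches only
```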
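Similarly, the MoE routing mentioned above can be sketched as a token-level gated feed-forward layer. The expert count, hidden sizes, and top-1 gating scheme are illustrative assumptions, not Uni-NTFM's actual design.

```python
# Minimal top-1 gated Mixture-of-Experts feed-forward layer: a router assigns
# each token to one expert, and the expert output is scaled by the gate weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, dim: int = 128, hidden: int = 512, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)        # router: token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, L, D = x.shape
        flat = x.reshape(-1, D)                      # route each token independently
        probs = F.softmax(self.gate(flat), dim=-1)   # (B*L, n_experts)
        top_p, top_i = probs.max(dim=-1)             # top-1 routing decision
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            sel = top_i == e                         # tokens routed to expert e
            if sel.any():
                out[sel] = top_p[sel, None] * expert(flat[sel])
        return out.reshape(B, L, D)

tokens = torch.randn(2, 16, 128)                     # (batch, sequence, dim) toy EEG tokens
y = MoEFeedForward()(tokens)
```

Only the selected expert runs per token, which is what lets MoE models grow total parameter count without a proportional increase in per-token compute.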
3. Applications in Diagnosis, Prediction, and Neuroscience
Foundation models for neuroimaging have demonstrated applicability in a wide spectrum of scenarios:
| Application | Representative Models | Technical Highlights |
|---|---|---|
| Brain age and cognition | NeuroVNN, AnatCL | Covariance NNs, anatomical contrastive learning |
| Segmentation and anatomy | BrainSegFounder, BrainFM | 3D transformers, multi-task U-Nets, mix-up |
| Anomaly detection | Unsupervised generative models | VAEs, GANs, diffusion for pseudo-healthy reconstruction |
| Graph-based connectivity | BrainGFM | Graph contrastive learning, masked autoencoding |
| Clinical diagnosis | Prima, NeuroRAD-FM | Hierarchical vision transformers, DRO, Grad-CAM |
| Multimodal integration | Brain Harmony | Fusion of sMRI/fMRI, geometric harmonics |
| EEG decoding | Uni-NTFM, EEG-FMs | Decoupled time/frequency representations, MoE |
Examples:
- Brain age and cognition: NeuroVNN uses covariance-based graph convolutions and a scale-free design for transfer across parcellation schemes, offering strong anatomical interpretability (Sihag et al., 12 Feb 2024); AnatCL supplements contrastive learning with regional anatomical metrics, improving diagnosis of neurodegeneration and prediction of clinical scores (Barbano et al., 7 Aug 2024).
- Multimodal fusion: Brain Harmony compresses both sMRI and fMRI into unified 1D tokens, successfully integrating structure and function, with pre-alignment using geometric harmonics and adaptive patching for fMRI with heterogeneous TRs (Dong et al., 29 Sep 2025).
- Clinical functionality: Prima processes full real-world MRI studies and radiology reports, achieving a mean diagnostic AUROC of 0.92 and providing explainable, fair, and generalizable reasoning across health-system-scale patient populations (Lyu et al., 23 Sep 2025).
- Unsupervised anomaly detection: VAEs, GANs, and diffusion models trained on healthy brains detect pathologies as deviations from learned representations, offering interpretable pseudo-healthy reconstructions that are especially valuable when annotated data are scarce (Mahé et al., 16 Oct 2025); a minimal residual-based scoring sketch follows this list.
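As referenced in the last example, residual-based anomaly scoring can be sketched as follows. Here `model` stands in for any trained pseudo-healthy reconstructor (VAE, GAN, or diffusion), and the threshold value is an illustrative assumption.

```python
# Sketch of unsupervised anomaly scoring with a generative model trained on
# healthy brains: the anomaly map is the voxelwise residual between the input
# and its "pseudo-healthy" reconstruction.
import torch

@torch.no_grad()
def anomaly_map(model, volume: torch.Tensor, threshold: float = 0.3):
    """volume: (1, 1, D, H, W) intensity-normalized scan."""
    pseudo_healthy = model(volume)                   # reconstruction of a healthy-looking scan
    residual = (volume - pseudo_healthy).abs()       # large where pathology deviates
    return residual, residual > threshold            # continuous map + binary mask

# Toy usage with a placeholder "model" (real use: a trained VAE/GAN/diffusion reconstructor).
volume = torch.rand(1, 1, 32, 32, 32)
res, mask = anomaly_map(lambda v: torch.zeros_like(v) + v.mean(), volume)
```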
4. Generalization, Adaptation, and Evaluation
Foundation models are engineered for adaptability and transfer across modalities, tasks, and clinical settings:
- Transfer learning and fine-tuning: Pretrained foundation models serve as backbones and are either fully fine-tuned or adapted via parameter-efficient methods such as LoRA or prompt tuning (updating roughly 3% or less of total parameters), enabling sample-efficient downstream learning (Ghamizi et al., 16 Jun 2025, Wang et al., 11 Jun 2025); see the LoRA sketch after this list.
- Meta-learning and prompt strategies: Graph and language prompts (adapted via meta-learning) allow models such as BrainGFM to support zero/few-shot generalization to new disorders, parcellations, or task settings (Wei et al., 31 May 2025).
- Domain robustness and bias: Approaches such as DRO (Bhattacharya et al., 18 Sep 2025) and equalized-odds frameworks (Lyu et al., 23 Sep 2025) explicitly address site and demographic heterogeneity, targeting equitable performance as measured by metrics such as TPR disparity (a toy computation follows this list).
- Benchmarking: To date, most evaluation relies on classic machine learning metrics (Dice, AUROC, MAE). A noted gap is the lack of consistent, clinically relevant, standardized benchmarks with human expert validation, particularly for assessing interpretability and impact on diagnostic workflow (Ghamizi et al., 16 Jun 2025, Mahé et al., 16 Oct 2025).
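The LoRA sketch referenced above: a frozen pretrained linear layer augmented with trainable low-rank matrices, so only a small fraction of parameters is updated. Rank, scaling, and layer dimensions are illustrative assumptions.

```python
# Minimal LoRA adapter: the base weight is frozen and only the low-rank
# factors A and B are trained, keeping the updated parameter count small.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no-op at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.1%}")  # ~2% at rank 8
```

At rank 8 on a 768-dimensional layer, the trainable fraction is about 2%, consistent with the roughly-3%-or-less figures cited above.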
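And the TPR-disparity metric referenced under domain robustness, sketched on toy arrays: under equalized odds, the true-positive rate should be similar across demographic or site subgroups, so the max-minus-min gap across groups is a natural summary. The group labels and arrays below are toy data.

```python
# TPR disparity across subgroups: the gap in sensitivity between the
# best- and worst-served group; 0 means equal sensitivity everywhere.
import numpy as np

def tpr_disparity(y_true: np.ndarray, y_pred: np.ndarray, groups: np.ndarray) -> float:
    """Max minus min true-positive rate across groups."""
    tprs = []
    for g in np.unique(groups):
        pos = (groups == g) & (y_true == 1)          # positives in this group
        if pos.any():
            tprs.append((y_pred[pos] == 1).mean())
    return float(max(tprs) - min(tprs))

y_true = np.array([1, 1, 0, 1, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(tpr_disparity(y_true, y_pred, groups))         # 1.0 - 0.667 = 0.33
```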
5. Interpretability, Biological Plausibility, and Anatomical Mapping
Interpretability and alignment with neuroanatomical and neuroscientific priors are increasingly prioritized:
- Built-in interpretability: Models such as NeuroVNN and AnatCL are designed to maintain a bijective mapping from learned representations to anatomical regions, enabling projection of model outputs or feature attributions onto cortical or subcortical maps (Sihag et al., 12 Feb 2024, Barbano et al., 7 Aug 2024).
- Attribution methods: Grad-CAM is used to visualize model focus on tumor and peri-tumoral regions, validating clinical relevance (Bhattacharya et al., 18 Sep 2025); a minimal Grad-CAM sketch follows this list. Other models use LIME or intrinsic hub tokens to align model decisions with meaningful neurobiological substrates (Lyu et al., 23 Sep 2025, Dong et al., 29 Sep 2025).
- Pseudo-healthy reconstructions: In unsupervised generative models, counterfactual reconstructions provide interpretability, as differences between the generated healthy image and the input highlight anomalies in a manner consistent with radiological reasoning (Mahé et al., 16 Oct 2025).
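The Grad-CAM sketch referenced above, assuming a small 3D CNN classifier as a stand-in for the cited models: the final conv block's feature maps are weighted by their spatially pooled gradients and upsampled to the input grid.

```python
# Minimal Grad-CAM for a 3D CNN: hook the target conv layer, pool its
# gradients per channel, and weight the activations to localize evidence.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 8, 3, padding=1, stride=2), nn.ReLU(),   # target conv block
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 2))

acts, grads = {}, {}
target = model[2]                                          # the last conv layer
target.register_forward_hook(lambda m, i, o: acts.update(v=o))
target.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.rand(1, 1, 16, 16, 16)                           # toy normalized MRI volume
logits = model(x)
logits[0, logits.argmax()].backward()                      # gradient of the predicted class

weights = grads["v"].mean(dim=(2, 3, 4), keepdim=True)     # pool gradients per channel
cam = F.relu((weights * acts["v"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="trilinear")  # back to input resolution
cam = cam / (cam.max() + 1e-8)                             # normalized saliency volume
```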
6. Limitations, Open Challenges, and Prospects
Despite their promise, foundation models for neuroimaging are in an early phase of development:
- Data diversity and representativeness: Imbalances remain in modality coverage (e.g., limited PET, less emphasis on mental health and rare pathologies) and some datasets may suffer from duplication or non-uniform labeling (Ghamizi et al., 16 Jun 2025).
- Evaluation bottlenecks: Lack of standardized, clinically meaningful evaluation protocols hinders assessment of real-world utility and comparability.
- Scaling and architecture: Computational costs for scaling to billions of parameters (noted in EEG foundation models) are substantial, raising barriers to entry (Chen et al., 29 Sep 2025).
- Interpretability and trust: The black-box nature of deep neural models remains a cross-cutting limitation. The trend is toward incorporating more biological priors, region-specific mapping, or explanations via visualizations.
- Ethics, bias, and privacy: Further work is required in federated learning, de-biasing strategies, and audits of model outcomes to support safe deployment, especially across diverse populations and health systems.
7. Future Directions and Integration
Several research avenues are highlighted for advancing the field:
- Multimodal, cross-task, and multi-center integration: Further fusion of data across imaging types, clinical texts, and genomic or behavioral data is expected (Wei et al., 31 May 2025, Dong et al., 29 Sep 2025).
- Anatomy-aware and domain-adaptive modeling: Incorporation of symmetry, connectome constraints, atlas guidance, and design of domain-agnostic models to boost transferability (Mahé et al., 16 Oct 2025, Liu et al., 30 Aug 2025).
- Efficient and scalable parameter adaptation: Exploration of adapters, prompt-generation, and hybrid architectures for low-resource settings or rapid domain shifts (Ghamizi et al., 16 Jun 2025, Lyu et al., 23 Sep 2025).
- Clinical translation and digital twins: Application to digital twin brain models, improved triage or referral workflows, longitudinal monitoring, and integration with electronic health systems (Lyu et al., 23 Sep 2025, Dong et al., 29 Sep 2025).
- Evaluation and validation: Prospective, preregistered reader studies, clinical deployment pilots, and benchmarking against expert decision-making remain essential (Bhattacharya et al., 18 Sep 2025, Mahé et al., 16 Oct 2025).
In summary, foundation models for neuroimaging constitute a rapidly evolving paradigm wherein large pre-trained architectures, often with self-supervision and multimodal integration, learn representations that generalize across imaging modalities, pathologies, and tasks. These models drive gains in accuracy, robustness, and interpretability, while also uncovering new avenues to link advanced AI systems with neuroscientific understanding and clinical application (Ghamizi et al., 16 Jun 2025, Zhou et al., 1 Mar 2025, Dong et al., 29 Sep 2025, Cox et al., 14 Jun 2024, Lu et al., 2022, Lyu et al., 23 Sep 2025, Mahé et al., 16 Oct 2025).