Electroencephalography Foundation Models

Updated 6 March 2026

EEG-FMs are large, self-supervised neural architectures that learn transferable representations from massive unlabeled EEG data, enabling robust neural decoding and multimodal analysis.
They integrate techniques from NLP and computer vision using CNN–Transformer hybrids to address cross-subject variability and improve tasks such as motor imagery and seizure prediction.
Advanced SSL strategies, including contrastive and masked modeling with hybrid objectives, combined with architectural innovations drive enhanced performance and broad applicability.

Electroencephalography Foundation Models (EEG-FMs) represent a class of large, self-supervised neural architectures trained on extensive unlabeled EEG corpora to learn generic, transferable representations of brain activity. By leveraging modern deep learning paradigms—especially techniques adapted from NLP and computer vision—EEG-FMs have rapidly shifted the landscape of neural decoding, brain-computer interface (BCI) design, and neuroscientific analysis. These models are built to generalize across subjects, devices, paradigms, and clinical tasks, aiming to overcome both the scarcity of labeled data and the massive variability intrinsic to EEG signals (Xiong et al., 25 Aug 2025, Shen et al., 12 Feb 2026, Liu et al., 25 Jan 2026, Li et al., 21 Aug 2025).

1. Taxonomy and Modalities of EEG-FMs

Recent surveys categorize EEG-FMs by their output modalities, reflecting application diversity and representational goals (Li et al., 21 Aug 2025):

Native EEG Decoding: Models that ingest raw multi-channel EEG to classify or regress brain states (motor imagery, sleep stage, seizure prediction). These typically employ CNN or Transformer front-ends with self-supervised objectives, such as contrastive learning or masked signal modeling, to learn subject-invariant features.
EEG-Text Alignment: Architectures projecting EEG into a semantic space aligned with natural language via dual encoders and contrastive objectives, enabling retrieval or generative text description of neural content.
EEG-Vision Alignment: Systems mapping EEG to image embedding spaces (e.g., CLIP), supporting zero-shot image retrieval and, with generative backbones, EEG-driven image synthesis.
EEG-Audio Mapping: Pipelines integrating self-supervised speech models (wav2vec, HuBERT) to decode EEG into spectrograms or phonemes, or to condition neuro-music generation.
Multimodal Fusion: Frameworks learning joint representations across three or more modalities (e.g., EEG + text + vision), utilizing cross-attention or hypergraph adapters.

This functional taxonomy underpins both domain-specific designs and the development of unified architectures with broad scientific and technological relevance.
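As a rough illustration of the retrieval step behind EEG-Vision alignment, the sketch below ranks a small gallery of image embeddings by cosine similarity to an EEG embedding. The function names (`cosine`, `retrieve`) and the hand-written vectors are hypothetical stand-ins for encoder outputs; no particular model's API is implied.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(eeg_embedding, gallery):
    """Rank gallery items (name -> embedding) by similarity to an EEG embedding."""
    scored = [(name, cosine(eeg_embedding, emb)) for name, emb in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy vectors standing in for image-encoder outputs (hypothetical values).
gallery = {
    "cat": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.2],
    "tree": [0.1, 0.0, 1.0],
}
# Toy vector standing in for the EEG encoder output in the shared space.
eeg_embedding = [0.8, 0.2, 0.1]

ranking = retrieve(eeg_embedding, gallery)
best_match = ranking[0][0]
```

Because the candidate set is only compared at inference time, new image classes can in principle be added to the gallery without retraining, which is what makes the retrieval zero-shot.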

2. Core Pre-training Objectives and Self-supervised Strategies

EEG-FMs rely on self-supervised learning (SSL) to exploit massive unlabeled datasets (Shen et al., 12 Feb 2026, Li et al., 21 Aug 2025, Liu et al., 25 Jan 2026). Three central SSL paradigms dominate:

Contrastive SSL: Models contrast positive pairs (augmented views or cross-modal pairs) against negatives:

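$\mathcal{L}_{\text{con}} = -\sum_{i} \log \frac{\exp(\mathrm{sim}(z_i, z^+_i) / \tau)}{\sum_{j} \exp(\mathrm{sim}(z_i, z_j) / \tau)}$

Here $z_i$ is the embedding of sample $i$, $z^+_i$ is the embedding of its positive, $\mathrm{sim}(\cdot,\cdot)$ is a similarity function (commonly cosine similarity), $\tau$ is a temperature, and the sum over $j$ runs over the positive and all negatives for sample $i$.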

Masked Modeling (MAE, MIM): Random patches, channels, or tokens are masked; the network reconstructs missing segments in waveform, spectral, or token space (e.g., EEGFormer, LaBraM). Variants include masked spectrogram and masked codebook prediction.
Generative and Hybrid SSL: Autoregressive modeling (Neuro-GPT), joint masked reconstruction plus contrastive loss (CBraMod), or dual-domain (time + frequency) masked objectives (Uni-NTFM (Chen et al., 29 Sep 2025)).
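The contrastive objective above can be sketched in a few lines. This version assumes cosine similarity and in-batch negatives (the positive for anchor i is `positives[i]`, and every other entry of `positives` serves as a negative). Both are common choices rather than something the cited works prescribe. The name `info_nce` and the toy vectors are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors, positives, tau=0.1):
    """Contrastive loss summed over anchors, using in-batch negatives.

    For anchor i the numerator uses positives[i]; the denominator sums over
    every entry of positives (the positive plus the in-batch negatives).
    """
    total = 0.0
    for i, z in enumerate(anchors):
        logits = [cosine(z, candidate) / tau for candidate in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += -(logits[i] - log_denominator)
    return total

# Toy embeddings (hypothetical): two views that agree vs. views that are swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_views = [[0.9, 0.1], [0.1, 0.9]]
swapped_views = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = info_nce(anchors, aligned_views)
swapped_loss = info_nce(anchors, swapped_views)
```

The loss is small when each anchor is closest to its own positive and grows when the pairing is wrong; lowering `tau` sharpens that contrast.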

The choice of SSL strategy and its physiological alignment (e.g., frequency-band masking) directly impacts downstream transferability and interpretability (Shen et al., 12 Feb 2026, Chen et al., 7 Aug 2025).
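To make the masked-modeling objective concrete, the following sketch masks random time-domain patches of a single channel and scores a reconstruction only on the masked patches. It is a simplified illustration under stated assumptions: patch length, mask ratio, and the helper names are hypothetical, and the "model" is a trivial all-zeros baseline rather than a trained network. Frequency-band masking would apply the same bookkeeping to spectral bins instead of time patches.

```python
import math
import random

def split_into_patches(signal, patch_len):
    """Cut a 1-D signal into non-overlapping patches, dropping any remainder."""
    num_patches = len(signal) // patch_len
    return [signal[k * patch_len:(k + 1) * patch_len] for k in range(num_patches)]

def sample_mask(num_patches, mask_ratio, rng):
    """Choose which patch indices to hide; at least one patch is masked."""
    num_masked = max(1, round(num_patches * mask_ratio))
    return sorted(rng.sample(range(num_patches), num_masked))

def masked_mse(patches, reconstruction, masked_idx):
    """Mean squared error computed only over the masked patches."""
    errors = [
        (x - y) ** 2
        for k in masked_idx
        for x, y in zip(patches[k], reconstruction[k])
    ]
    return sum(errors) / len(errors)

rng = random.Random(0)  # fixed seed so the mask is reproducible
# Toy channel: a 10 Hz sinusoid sampled at 256 Hz for one second.
signal = [math.sin(2 * math.pi * 10 * t / 256) for t in range(256)]
patches = split_into_patches(signal, patch_len=32)
masked_idx = sample_mask(len(patches), mask_ratio=0.5, rng=rng)

# Baseline "reconstruction" that predicts zeros everywhere.
zeros = [[0.0] * 32 for _ in patches]
baseline_loss = masked_mse(patches, zeros, masked_idx)
# A perfect reconstruction gives zero loss on the masked patches.
perfect_loss = masked_mse(patches, patches, masked_idx)
```

Restricting the loss to masked positions is what forces the encoder to infer hidden content from context instead of copying visible input.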

3. Architectural Innovations and Inductive Biases

EEG-FMs exhibit several architectural motifs and adaptations to the unique properties of electrophysiological signals:

CNN–Transformer Hybrids: Early spatio-temporal convolutions (EEGNet, ShallowConvNet) capture local oscillations, while Transformers model global and cross-channel dependencies. The Large Cognition Model (LCM) further splits attention along temporal and spectral axes (Chen et al., 11 Feb 2025).
Graph-enhanced Modules: Models like GEFM integrate Graph Neural Networks to encode inter-electrode spatial connectivity, using geodesic distances on the scalp or learnable graphs (Wang et al., 2024).
Coordinate-based Embeddings: HEAR projects heterogeneous 3D electrode layouts into a common latent space, supporting arbitrary montages and scaling up to 1,132 channels (Chen et al., 14 Oct 2025).
Manifold and Tokenized Representations: MENDR encodes multi-band EEG into symmetric positive definite matrices for geometric interpretability (Chen et al., 7 Aug 2025), while others (CodeBrain) employ decoupled time-frequency tokenization to enhance representational granularity (Ma et al., 10 Jun 2025).
Mixture-of-Experts (MoE), Adapter, and Attention Mechanisms: Uni-NTFM and NeurIPT scale network capacity via MoE Transformers specialized for functionally distinct patterns, and SCOPE fine-tunes pretrained encoders with prototype-guided adapters under label scarcity (Chen et al., 29 Sep 2025, Fang et al., 18 Oct 2025, Ma et al., 19 Feb 2026).

These building blocks serve not only to encode EEG’s multi-scale spatiotemporal structure, but also to facilitate real-world robustness to device, montage, and task variation.
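The manifold-style representation mentioned above rests on symmetric positive definite (SPD) matrices. A minimal sketch of how one such matrix can be obtained from a multi-channel window is given below: a sample covariance with a small diagonal shrinkage term, plus a Cholesky-based check. This is a generic construction, not MENDR's actual pipeline, and the function names and shrinkage value are assumptions.

```python
import math

def covariance(window, shrinkage=1e-3):
    """Channel-by-channel sample covariance of a window (channels x samples).

    A small multiple of the identity is added so the result is strictly
    positive definite even when channels are nearly collinear.
    """
    n_samples = len(window[0])
    means = [sum(ch) / n_samples for ch in window]
    centered = [[x - m for x in ch] for ch, m in zip(window, means)]
    n_channels = len(window)
    cov = [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (n_samples - 1)
            for j in range(n_channels)
        ]
        for i in range(n_channels)
    ]
    for i in range(n_channels):
        cov[i][i] += shrinkage
    return cov

def is_positive_definite(matrix):
    """Try a Cholesky factorization of a symmetric matrix; it succeeds only if SPD."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True

# Toy 3-channel window (hypothetical signals, 64 samples each).
window = [
    [math.sin(0.1 * t) for t in range(64)],
    [math.cos(0.1 * t) for t in range(64)],
    [math.sin(0.1 * t) + 0.5 * math.cos(0.3 * t) for t in range(64)],
]
cov = covariance(window)
spd_ok = is_positive_definite(cov)
```

Computing one such matrix per frequency band would give a multi-band set of SPD descriptors, on which Riemannian distances can then be defined.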

4. Standard Benchmarks, Evaluation Protocols, and Performance

Unified benchmarking efforts (EEG-FM-Bench (Xiong et al., 25 Aug 2025), Brain4FMs (Shen et al., 12 Feb 2026)) and systematic reviews (Liu et al., 25 Jan 2026, Kuruppu et al., 15 Jul 2025) now allow standardized comparisons:

Preprocessing: Channel selection, bandpass filtering, artifact removal, z-scoring or re-referencing, and epoching into fixed-length windows (e.g., 4–30 s windows at sampling rates of 64–512 Hz).
Tasks Covered: Motor imagery, sleep staging, mental workload, emotion, seizure/abnormality/event detection, age/diagnosis regression.
Evaluation Regimes:
- Cross-subject generalization (leave-N-out, LOSO).
- Within-subject/few-shot calibration.
- Zero-shot and linear probing (frozen backbone).
- Full fine-tuning.
Metrics: Balanced accuracy, macro/weighted F1, Cohen's κ, AUROC, AUC-PR, RMSE (regression).
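Two of the preprocessing steps listed above, epoching and z-scoring, can be sketched as follows. The sketch assumes a recording stored as a list of channels, non-overlapping windows, and per-epoch, per-channel standardization; benchmarks differ on these choices, and the helper names and toy signals are hypothetical.

```python
import math
import statistics

def epoch(recording, sfreq, window_s):
    """Cut a channels x samples recording into non-overlapping fixed-length windows.

    Samples left over after the last full window are dropped.
    """
    window_len = int(sfreq * window_s)
    num_epochs = len(recording[0]) // window_len
    return [
        [ch[k * window_len:(k + 1) * window_len] for ch in recording]
        for k in range(num_epochs)
    ]

def zscore(channel):
    """Standardize one channel segment to zero mean and unit variance."""
    mean = statistics.fmean(channel)
    std = statistics.pstdev(channel)
    if std == 0.0:
        return [0.0] * len(channel)  # flat segment: nothing to scale
    return [(x - mean) / std for x in channel]

# Toy recording: 2 channels, 10 s at 64 Hz (hypothetical signals).
sfreq = 64
recording = [
    [math.sin(2 * math.pi * 10 * t / sfreq) + 3.0 for t in range(10 * sfreq)],
    [0.5 * math.cos(2 * math.pi * 6 * t / sfreq) for t in range(10 * sfreq)],
]

epochs = epoch(recording, sfreq=sfreq, window_s=4)  # two full 4 s windows
normalized = [[zscore(ch) for ch in ep] for ep in epochs]
```

Standardizing per epoch removes slow offset and gain drift but also discards absolute amplitude, so per-recording statistics may be preferable when amplitude itself is informative.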

State-of-the-art EEG-FMs (e.g., LaBraM, EEGPT, CodeBrain, BrainWave, Uni-NTFM, NeurIPT, BioSerenity-E1, FoME, MENDR, HEAR) routinely surpass or match specialist models on large benchmarks in balanced accuracy and F1, with generative/MAE objectives demonstrating a performance edge over contrastive SSL for classification tasks (Xiong et al., 25 Aug 2025, Shen et al., 12 Feb 2026, Chen et al., 11 Feb 2025).
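Balanced accuracy and Cohen's κ are preferred over plain accuracy because EEG label distributions are often skewed. The sketch below implements both from their standard definitions and applies them to a toy majority-class predictor; the labels are made up for illustration.

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, label in enumerate(y_true) if label == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy imbalanced labels: 8 background windows, 2 event windows (hypothetical).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # a predictor that always outputs the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

The majority-class predictor reaches 80% plain accuracy here, yet its balanced accuracy is at chance level and its κ is zero, which is why these metrics appear in the benchmark protocols.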

5. Cross-domain and Multimodal Extensions

EEG-FMs now routinely extend beyond unimodal decoding:

EEG-Text and EEG-Vision Alignment: Dual encoders and contrastive objectives enable mapping EEG to semantic or image spaces, as in WaveMind (EEG + CLIP-BERT/ViT + LLM) (Zeng et al., 26 Sep 2025). Retrieval-augmented generation and conversational interpretation are increasingly feasible.
Auditory and Multimodal Fusion: Models integrate EEG with audio (wav2vec alignment), behavioral signals (eye-tracking), and other physiological modalities.
Synthetic Data and Inverse Modeling: Initial steps have been taken toward X→EEG synthesis for data augmentation and digital twin simulation (Li et al., 21 Aug 2025).
Instruction Tuning and LLMs: LLM conditioning with EEG enables open-ended, explainable brain–AI interfaces (WaveMind, NeuroLM, UniMind).

This cross-domain generalization is accelerated by scalable architectures and the unification of representational spaces, but often remains limited by benchmarking heterogeneity and a lack of robust physiological validation protocols.

6. Challenges, Limitations, and Best Practices

Despite rapid advances, EEG-FMs face several persistent obstacles (Li et al., 21 Aug 2025, Shen et al., 12 Feb 2026, Liu et al., 25 Jan 2026):

Generalization and Robustness: Models can overfit individual-level or session-level idiosyncrasies; cross-subject and cross-device transfer remains imperfect. Label-efficient prototypes and adapters (SCOPE) and domain-agnostic encodings are active research directions (Ma et al., 19 Feb 2026).
Interpretability and Validation: Strong generative priors in LLMs and diffusion models risk producing plausible but unfounded outputs. Few models incorporate principled explainability or physiologically grounded saliency.
Benchmarking and Data Scarcity: Absence of universally adopted benchmarks across paradigms inhibits fair comparison and scaling-law discovery. Open-source resources (EEG-FM-Bench, Brain4FMs, WaveMind-Instruct) partially address this gap (Xiong et al., 25 Aug 2025, Shen et al., 12 Feb 2026, Zeng et al., 26 Sep 2025).
Scaling Laws: Larger models do not guarantee superior generalization. Performance saturates at current data scales and with current objectives, unlike the scaling behavior observed in NLP and vision (Liu et al., 25 Jan 2026).
Device Heterogeneity: Spatial encoding strategies (coordinate-based embeddings, channel unification, topological representations) increasingly mitigate hardware variability (Chen et al., 14 Oct 2025, Chen et al., 29 Sep 2025, Fang et al., 18 Oct 2025).
Few-shot and Zero-shot Performance: High-performing models still require substantial target data for full generalization, and linear probing is reported to be insufficient in many contexts.
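Claims about cross-subject transfer depend on splits that keep subjects disjoint between training and test data. A minimal leave-one-subject-out (LOSO) splitter is sketched below; the function name and the toy subject labels are illustrative, and real benchmarks may use leave-N-out or fixed subject partitions instead.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    Every sample from the held-out subject goes to the test set, so no
    subject contributes to both sides of a fold.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# One subject label per EEG window (hypothetical recording layout).
subject_ids = ["s1", "s1", "s2", "s2", "s2", "s3"]
folds = list(loso_splits(subject_ids))
```

Splitting by window rather than by subject would leak subject-specific signatures into the test set and overstate generalization.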

Best practices identified include favoring generative/MAE pretraining for classification tasks, integrating multimodal corpora, leveraging explicit spatial inductive biases, masking in the frequency domain to capture oscillatory biomarkers, and using tokenizers with hierarchical quantization cautiously (Shen et al., 12 Feb 2026, Chen et al., 14 Oct 2025, Chen et al., 29 Sep 2025).

7. Future Directions

Ongoing research and proposed advancements in EEG-FMs emphasize:

Neurophysiologically-Inspired Encodings: Incorporate connectivity graphs, biologically-motivated attention, and region-wise pooling (IILP, 3D encoding) (Fang et al., 18 Oct 2025, Chen et al., 7 Aug 2025).
Hierarchical and Multiscale Modeling: Decoupled path encoding (time, frequency, region) and MoE Transformers (Uni-NTFM, NeurIPT) promise increased scalability (Chen et al., 29 Sep 2025).
Federated and Continual Learning: Distributed training to handle private clinical corpora and adapt to novel paradigms without catastrophic forgetting.
Open Benchmarking and Model Cards: Shared leaderboards, zero/few-shot split protocols, and standard reporting are essential for reproducibility and transparency (Xiong et al., 25 Aug 2025, Kuruppu et al., 15 Jul 2025).
Explainable AI and Visual Analytics: Manifold-based representations (SPD, Riemannian geometry), saliency mapping, and analytical visualization support clinical and neuroscientific trust (Chen et al., 7 Aug 2025).
Task- and Domain-Adaptation: Lightweight adapters, prototype-guided fine-tuning, and continual pretraining to address label scarcity and BCI calibration constraints (Ma et al., 19 Feb 2026).

EEG-FMs stand at the frontier of AI-driven neural signal analysis, with foundational architectures now enabling cross-paradigm decoding, multimodal semantic alignment, robust generalization across hardware and population, and increasing transparency—while significant challenges in evaluation, clinical translation, and neuroscientific validity remain. Continuous innovation in model design, open benchmarks, and interdisciplinary collaboration is required to realize their full potential (Li et al., 21 Aug 2025, Shen et al., 12 Feb 2026, Liu et al., 25 Jan 2026, Fang et al., 18 Oct 2025, Xiong et al., 25 Aug 2025).