
Subject-Agnostic Brain Decoding

Updated 12 November 2025
  • Subject-agnostic brain decoding is the process of inferring cognitive and perceptual states from neuroimaging data using models trained on pooled, diverse datasets without subject-specific tuning.
  • It employs unified representation learning techniques—such as functional alignment, tokenization, and adversarial feature disentanglement—to overcome inter-subject anatomical and functional heterogeneity.
  • This approach enables scalable brain-computer interfaces and rapid neurofeedback applications, enhancing diagnostics and clinical interventions across diverse populations.

Subject-agnostic brain decoding is defined as the inference of cognitive or perceptual states from neuroimaging or electrophysiological measurements in previously unseen subjects, using models trained with data pooled across many individuals without subject-specific fine-tuning. The central challenge is overcoming inter-subject physiological, anatomical, and cognitive heterogeneity so that a single model generalizes robustly. Recent advances combine functional alignment, subject-invariant representation learning, mixture-of-experts architectures, domain-adversarial modules, and cross-modal contrastive objectives. Applications span fMRI, EEG/MEG, sEEG/ECoG, vision, language, and multimodal decoding. Below, major conceptual and methodological axes are elucidated.

1. Sources of Inter-Subject Variability and the Subject-Agnostic Decoding Problem

Inter-subject variability in brain decoding arises from structural anatomical differences (e.g., folding, parcellation, voxel counts), functional diversity (individual strategies, baseline activity), and task-specific response heterogeneity. Anatomical alignment via surface templates (fsaverage5/7, 32k_fs_LR, MSMAll) yields only coarse vertex-wise correspondence; functional responses remain idiosyncratic, limiting naive pooling or transfer. The subject-agnostic challenge is to learn models where the decoder’s performance on a held-out subject (never seen during training) is comparable to its within-subject accuracy, without subject-specific retraining.
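The held-out-subject criterion above is operationalized by leave-one-subject-out (LOSO) evaluation. A minimal numpy sketch of the split logic (the helper `loso_splits` is illustrative, not from any cited paper):

```python
import numpy as np

def loso_splits(subject_ids):
    """Yield (train_idx, test_idx) pairs, holding out one subject's samples at a time."""
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test_mask = subject_ids == held_out
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy run: six samples recorded from three subjects.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(loso_splits(subjects))
```

A decoder trained on each `train_idx` and scored on the corresponding `test_idx` never sees the held-out subject, which is exactly the subject-agnostic generalization being measured.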

Functional alignment, exemplified by the Fused Unbalanced Gromov–Wasserstein (FUGW) optimal transport (Thual et al., 2023), addresses feature correspondences by minimizing a hybrid loss: $$\mathcal{L}(P) = (1-\alpha)\sum_{ij} \|X^{\mathrm{out}}_i - X^{\mathrm{ref}}_j\|_2^2\, P_{ij} + \alpha\sum_{ijkl} |D^{\mathrm{out}}_{ik} - D^{\mathrm{ref}}_{jl}|^2\, P_{ij} P_{kl} + \rho\left[\mathrm{KL}(P_{\#1}\,\|\,\mu^{\mathrm{out}}) + \mathrm{KL}(P_{\#2}\,\|\,\mu^{\mathrm{ref}})\right] + \varepsilon\, \mathrm{H}(P)$$ where $P$ is a soft correspondence matrix between vertices, $X^{\mathrm{out}}, X^{\mathrm{ref}}$ are functional time series, and $D^{\mathrm{out}}, D^{\mathrm{ref}}$ encode anatomical distances.
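To make the four terms of the FUGW objective concrete, the following numpy sketch evaluates them for a toy coupling (random data, hypothetical sizes; a real solver would optimize $P$ iteratively rather than just evaluate the loss):

```python
import numpy as np

rng = np.random.default_rng(0)
n_out, n_ref, t = 4, 5, 10           # vertices (out/ref) and time points
X_out, X_ref = rng.normal(size=(n_out, t)), rng.normal(size=(n_ref, t))
D_out, D_ref = rng.random((n_out, n_out)), rng.random((n_ref, n_ref))
mu_out, mu_ref = np.full(n_out, 1 / n_out), np.full(n_ref, 1 / n_ref)
P = np.outer(mu_out, mu_ref)         # trivial product coupling as a starting point
alpha, rho, eps = 0.5, 1.0, 1e-3

# Wasserstein term: functional (time-series) mismatch weighted by P.
C = ((X_out[:, None, :] - X_ref[None, :, :]) ** 2).sum(-1)
wass = (1 - alpha) * (C * P).sum()

# Gromov-Wasserstein term: distortion of anatomical distances, G[i,k,j,l].
G = (D_out[:, :, None, None] - D_ref[None, None, :, :]) ** 2
gw = alpha * np.einsum("ikjl,ij,kl->", G, P, P)

# Marginal relaxation: KL of the coupling's marginals to the target measures.
def kl(p, q):
    return (p * np.log(p / q)).sum() - p.sum() + q.sum()

marg = rho * (kl(P.sum(1), mu_out) + kl(P.sum(0), mu_ref))

# Entropy term, as written in the loss above.
H = -(P * np.log(P)).sum()
loss = wass + gw + marg + eps * H
```

Because the product coupling has exact marginals, the KL relaxation term vanishes here; an unbalanced solver lets it grow when vertices have no good match.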

A plausible implication is that large-scale functional alignment enables the transfer of decoding pipelines and sharper scaling laws for subject-agnostic decoding.

2. Unified Representation Learning: Architectures and Preprocessing

Recent subject-agnostic frameworks project all subjects’ neuroimaging data into unified spaces—2D cortical flatmaps (Qian et al., 2023, Wang et al., 31 Oct 2025), MNI or MSMAll surfaces (Dahan et al., 27 Jan 2025), region-aggregated graphs (Wu et al., 30 May 2025, Wu et al., 6 Aug 2025), or tokenizer-Perceiver pipelines (Xia et al., 10 Apr 2024). Preprocessing typically consists of:

  • Anatomical surface projection to fsaverage or equivalent
  • ROI selection (visual, language, or whole-brain), often retaining only 8–40k vertices
  • Frame-wise normalization, temporal compensation for BOLD lag
  • Tokenization via 1D/2D convolution, positional encoding, or patching (icosahedral, grid, or graph)

E.g., in fMRI-PTE (Qian et al., 2023), 3D volumes $V^{(s)}(x,y,z,t)$ for each subject undergo surface mapping $S^{(s)}(v,t) = \Phi(V^{(s)}(\cdot, t))$, normalization, and 2D charting $I^{(s)}_{i,j}(t) = \bar S^{(s)}(\Psi^{-1}(i,j), t)$.
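The normalization-plus-charting step can be sketched in numpy. All sizes and the inverse chart `psi_inv` (mapping each flatmap pixel to a vertex index, or -1 for background) are hypothetical stand-ins for the real surface projection:

```python
import numpy as np

rng = np.random.default_rng(1)
n_vertices, n_frames, H, W = 100, 8, 16, 16
S = rng.normal(size=(n_vertices, n_frames))          # surface-mapped signal S(v, t)

# Frame-wise normalization: zero mean / unit variance across vertices per frame.
S_bar = (S - S.mean(0)) / (S.std(0) + 1e-6)

# Hypothetical inverse chart Psi^{-1}: each flatmap pixel indexes a vertex
# (or -1 where the chart has no underlying cortex).
psi_inv = rng.integers(-1, n_vertices, size=(H, W))
I = np.zeros((H, W, n_frames))
valid = psi_inv >= 0
I[valid] = S_bar[psi_inv[valid]]                     # I_{i,j}(t) = S_bar(Psi^{-1}(i,j), t)
```

The resulting image stack `I` is what 2D tokenizers or convolutional encoders consume downstream.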

This suggests unified spatial or token bases are a core requisite for scalable, subject-agnostic decoding.

3. Subject-Invariant Feature Extraction and Disentanglement

Architectures designed for subject-agnosticity encode subject-invariant and subject-specific features via explicit decomposition, adversarial objectives, or bottleneck layers. Disentanglement is central in ZEBRA (Wang et al., 31 Oct 2025), which decomposes fMRI features $E = E^{\mathrm{inv}} + E^{\mathrm{spe}}$, applies a Gradient Reversal Layer (GRL) to suppress subject identity in $E^{\mathrm{inv}}$, and aligns semantic features ($F^{\mathrm{sem}}$) only to universal (CLIP) image/text spaces.
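The GRL mechanism is simple enough to show with a toy scalar example (in practice it is a custom autograd op, e.g. in PyTorch; this pure-Python sketch only illustrates the forward-identity / backward-negation semantics):

```python
def grl_forward(x):
    """Identity map: the subject classifier sees the features unchanged."""
    return x

def grl_backward(grad_output, lam=1.0):
    """Backward pass: negate (and scale) the gradient reaching the encoder."""
    return -lam * grad_output

# Toy check with scalar feature e and subject-classifier loss L(z) = 0.5 * z**2.
e = 2.0
z = grl_forward(e)            # forward: z == e
dL_dz = z                     # dL/dz for L = 0.5 * z**2
dL_de = grl_backward(dL_dz)   # gradient reversal: encoder receives -dL/dz
```

Because the encoder's gradient is negated, a descent step on the encoder *increases* the subject classifier's loss, pushing subject-identifiable information out of the invariant features.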

Similarly, MindLLM (Qiu et al., 18 Feb 2025) leverages neuroscience-informed cross-attention keyed only on spatial priors and parcellation embeddings, excluding voxel intensity from keys, thereby enforcing subject-invariant semantic extraction.

In MoE-based frameworks (Wills Aligner (Bao et al., 20 Apr 2024); Neuro-MoBRE (Wu et al., 6 Aug 2025)), each brain region or subject is handled by specialized, low-rank experts or region-specific gating, but the overall feature space is unified via alignment and semantic loss.

A plausible implication is that adversarial regularization and mixture-of-experts routing facilitate the filtering of idiosyncratic signals, dramatically improving transfer across brains.

4. Training Strategies: Cross-Subject Supervision, Functional Alignment, Contrastive Losses

Subject-agnostic designs rely on cross-subject data utilization, contrastive or reconstruction-based supervision, and functional domain adaptation. Notable approaches include:

  • Contrastive loss between brain and multimodal (CLIP, VideoMAE, wav2vec, BERT) embeddings, typically: $$\mathcal{L}_{\mathrm{CLIP}} = -\log\frac{\exp(\mathrm{sim}(F_I^i, F_V^i)/\tau)}{\sum_j \exp(\mathrm{sim}(F_I^i, F_V^j)/\tau)} + \dots$$
  • BiMixCo or InfoNCE joint loss, aligning corresponding tokens or time-series across subjects and stimuli (Lu et al., 4 Nov 2025)
  • Fourier-based supervision, which matches amplitude/phase spectra between new and pretrained subject fMRI signals (Jiang et al., 24 May 2024)
  • Region-masked autoencoder pretraining, masked functional/region tokens recovered from learned prototypes (Wu et al., 30 May 2025, Wu et al., 6 Aug 2025)
  • Domain-adversarial minimax objectives to strip out subject identity (Wang et al., 31 Oct 2025)
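The symmetric InfoNCE objective from the first bullet can be sketched in numpy (embedding sizes and the batch are illustrative; real pipelines use learned brain encoders and frozen CLIP-style stimulus encoders):

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=1, keepdims=True)        # numerical stability
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def info_nce(F_brain, F_stim, tau=0.07):
    """Symmetric InfoNCE over L2-normalized embedding rows; matched pairs on the diagonal."""
    a = F_brain / np.linalg.norm(F_brain, axis=1, keepdims=True)
    b = F_stim / np.linalg.norm(F_stim, axis=1, keepdims=True)
    logits = a @ b.T / tau                      # sim(F_I^i, F_V^j) / tau
    return -0.5 * (np.diag(log_softmax(logits)).mean()
                   + np.diag(log_softmax(logits.T)).mean())

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 16))
matched = info_nce(F, F)                        # aligned brain/stimulus pairs
mismatched = info_nce(F, np.roll(F, 1, axis=0))  # every pair deliberately wrong
```

Correctly matched pairs sit on the diagonal and yield a much lower loss than mismatched ones, which is the signal that pulls brain embeddings toward their stimulus embeddings.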

Batch sampling and pooling strategies wash out subject-specific signals, enabling scaling laws where increasing the number of pooled subjects monotonically improves unseen-subject accuracy (Kong et al., 18 Oct 2024, Dahan et al., 27 Jan 2025).

5. Quantitative Performance and Scaling Laws

Empirical studies demonstrate large accuracy gains in leave-one-subject-out and zero-shot protocols. Representative metrics and results:

| Framework | Paradigm | Task / metric | Unseen-subject result | Baseline |
|---|---|---|---|---|
| SIM (Dahan et al., 27 Jan 2025) | fMRI + video + audio | Movie clip retrieval, top-1 (%) | 76.8 | 15.6 |
| fMRI-PTE (Qian et al., 2023) | fMRI-to-image | AlexNet ident. (7→1), top-1 (%) | 72.82 | 70.39 |
| ZEBRA (Wang et al., 31 Oct 2025) | fMRI-to-image | PixCorr retrieval (zero-shot) | 0.131 | 0.074 |
| Wills Aligner (Bao et al., 20 Apr 2024) | fMRI-to-image | mAP (ViT + MoBE) | 0.424 | 0.258 |
| MindLLM (Qiu et al., 18 Feb 2025) | fMRI-to-text | BLEU-1 caption | 61.75 | 59.44 |
| Neuro-MoBRE (Wu et al., 6 Aug 2025) | sEEG phoneme | Articulation, top-1 (%) | 28 | 4.3 (chance) |
| VCFlow (Lu et al., 4 Nov 2025) | fMRI-to-video | Top-50 semantics (%) | 14.0 | 11.6 |

Scaling laws (Kong et al., 18 Oct 2024) show that top-1 accuracy $A(n)$ for unseen subjects rises approximately linearly with the number of training subjects $n$, from roughly 2% at $n=1$ to 45% at $n=167$. This scaling is largely architecture-independent: MLP, CNN, and Transformer decoders all scale similarly.
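As a purely illustrative extrapolation of the quoted trend (a two-point linear fit, not a reconstruction of the paper's full curve), the scaling relation can be sketched as:

```python
import numpy as np

# Endpoints quoted above: ~2% top-1 at n=1 subject, ~45% at n=167 subjects.
n = np.array([1.0, 167.0])
acc = np.array([2.0, 45.0])
slope, intercept = np.polyfit(n, acc, 1)

def predicted_accuracy(k):
    """Linear extrapolation of unseen-subject top-1 accuracy (illustrative only)."""
    return slope * k + intercept
```

Under this linear reading, each additional pooled training subject contributes roughly a quarter of a percentage point of unseen-subject accuracy.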

A plausible implication is that with increasing pooled datasets (>100 subjects), true foundation decoders approaching within-subject performance are feasible.

6. Applications and Clinical Implications

Subject-agnostic decoding underlies core applications in:

  • fMRI/EEG/sEEG-based BCIs: robust decoding of vision, language, and motor intentions in previously unseen individuals (Wu et al., 30 May 2025, Zhang et al., 17 Mar 2024)
  • Rapid neurofeedback/diagnosis: VCFlow (Lu et al., 4 Nov 2025) reconstructs 8 min of video stimuli in 10 s per subject with only ~7% accuracy loss, eliminating subject-specific retraining. This suggests clinical scalability for bedside monitoring, rehabilitation, and cognitive assessment.
  • Multimodal spatial grounding and retrieval: UMBRAE (Xia et al., 10 Apr 2024) supports captioning, object localization, and image/text retrieval from brain embeddings.
  • Cross-task, cross-subject adaptation: Neuro-MoBRE (Wu et al., 6 Aug 2025) achieves generalization not only across individuals but multiple cognitive and clinical tasks (language, seizure).
  • Adapter-based few-shot deployment: MindShot (Jiang et al., 24 May 2024) demonstrates that lightweight, HRF-based adapters plus spectral alignment suffice for class-level generalization with minimal new data.

7. Limitations, Challenges, and Future Directions

While subject-agnostic brain decoding delivers substantial advances, notable limitations persist:

  • Current frameworks rely on tens to hundreds of subjects scanned under the same acquisition protocol; inter-scanner and inter-dataset transfer remains an open problem (Bao et al., 20 Apr 2024).
  • Functional alignment requires shared stimuli or parallel acquisition, limiting transfer to truly arbitrary subjects (Thual et al., 2023).
  • While decoding accuracy approaches within-subject benchmarks in large pooled datasets, semantic fidelity (e.g. rare object detection, paraphrase precision) lags behind.
  • Adapter-based and disentanglement approaches are not yet universally scalable to imaging modalities outside vision and language.
  • Clinical translation requires integration with higher-temporal-resolution modalities (EEG/MEG), richer spatial acquisition, and population-diverse cohorts (Zhang et al., 17 Mar 2024).
  • Meta-learning, dynamic expert addition/removal, and joint functional-anatomical alignment are active research directions (Bao et al., 20 Apr 2024).

A plausible implication is that large-scale, multimodal, functionally-aligned, and contrastively supervised models—augmented by small adapters or expert routers—will underpin future clinical-grade, subject-agnostic decoders, with minimal data requirements and robust population-wide deployment.
