NSD-Imagery: Mental fMRI Benchmark

Updated 3 March 2026

NSD-Imagery is a benchmark resource linking high-resolution 7T fMRI data with explicit mental imagery tasks to study the decoupling between perception and internal visual experiences.
It features controlled tasks for vision, imagery, and attention, capturing critical differences in signal-to-noise and receptive field properties between perceived and imagined stimuli.
Benchmarking reveals that linear, multimodal decoders generalize better to imagery than complex end-to-end models, offering actionable insights for improving BCI designs and clinical neuroimaging methods.

NSD-Imagery is a benchmark resource of human 7T fMRI paired with explicit mental imagery, developed to address the decoupling between externally driven and internally generated visual experiences in neural decoding research. It extends the experimental framework of the Natural Scenes Dataset (NSD), which is the largest existing set of fMRI responses to natural scene viewing, by introducing systematic measurements of brain activity when participants imagine specific stimuli rather than perceive them visually. NSD-Imagery provides a foundation for evaluating the generalization of state-of-the-art fMRI-to-image decoding models to mental image reconstruction, illuminating core challenges in cross-modal brain–computer interface (BCI) design, clinical neuroimaging, and computational cognitive neuroscience (Kneeland et al., 7 Jun 2025).

1. Dataset Protocol and Structure

The NSD-Imagery dataset consists of 7T fMRI data collected from the same eight subjects as the main NSD. The protocol matches the technical parameters of NSD: 1.8 mm isotropic voxels, TR = 1.6 s, and GLMsingle-based single-trial analysis within subject-specific ROI parcellations (“nsdgeneral” masks). Each subject performed both vision and imagery tasks, with 576 imagery trials per participant.

The experiment crosses three task modalities (vision, imagery, attention-detection) with three stimulus domains (simple shapes, natural scenes, conceptual word cues). Each trial begins with a one-letter cue linked to a specific learned stimulus, followed by 3 s of imagery or viewing, and then fixation. During imagery trials, subjects imagine the cued target inside an 8.4° × 8.4° visual frame and rate its vividness. Control (attention-detection) trials are included but not used for benchmarking. All responses are preprocessed and released in BIDS format; the release tracks individual trials, β-weight timecourses, NIfTI volumes, and ROI masks (Kneeland et al., 7 Jun 2025).
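To make the role of the ROI masks concrete, the following is a minimal sketch of how single-trial β estimates might be restricted to a mask such as “nsdgeneral” before decoding. The array shapes, the random data, and the helper name `extract_roi_betas` are illustrative assumptions; the actual file layout and loading routines of the release are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes and synthetic values, chosen only for illustration.
vol_shape = (4, 5, 6)
n_trials = 8
betas = rng.standard_normal(vol_shape + (n_trials,))  # (x, y, z, trial)
roi_mask = rng.random(vol_shape) > 0.7                # boolean ROI mask


def extract_roi_betas(betas, mask):
    """Return a (n_trials, n_voxels) matrix of betas inside a boolean mask."""
    if betas.shape[:3] != mask.shape:
        raise ValueError("mask and beta volume must share spatial dimensions")
    # Boolean indexing over the three spatial axes gives (n_voxels, n_trials).
    return betas[mask].T


X = extract_roi_betas(betas, roi_mask)
print(X.shape)  # (n_trials, number of voxels inside the mask)
```

The resulting trial-by-voxel matrix is the usual input format for the linear decoders discussed later.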

2. Relationship to NSD and Signal Properties

NSD-Imagery was explicitly constructed to complement the NSD-core dataset, which contains over 10,000 unique visually presented images per subject with high SNR. While NSD measures visually evoked responses to retinal stimuli, NSD-Imagery captures internally generated imagery following symbolic cues. The resultant data exhibit:

Lower signal-to-noise ratios, particularly in early visual areas (V1–V4).
Broader population receptive fields and lower spatial frequency sensitivity in imagery responses versus vision.
Identical anatomical voxel coverage but fundamentally altered underlying neural encoding.

These properties make NSD-Imagery an essential testbed for cross-decoding, where vision-trained models must generalize to “out-of-distribution” neural codes that underlie mental imagery rather than direct perception (Kneeland et al., 7 Jun 2025).

3. Model Benchmarking and Cross-Decoding

NSD-Imagery is used to benchmark five open-source visual decoding architectures previously trained on NSD:

MindEye1: Two-stage CLIP-guided diffusion with Bayesian ridge multimodal decoding.
MindEye2: SDXL unCLIP-based model driven by ViT-bigG embeddings.
Brain Diffuser: CLIP-vision and CLIP-text decoded via linear ridge regression and diffusion sampling.
iCNN: VGG-19 feature decoding with ridge regression and DCNN-GAN generative model.
Takagi et al.: Multi-input text+image conditioned two-stage diffusion.

All models are trained only on NSD vision data (no mental imagery training), then evaluated without retraining on NSD-Imagery held-out trials. Inputs to each model are fMRI βs; targets are image reconstructions or feature predictions (Kneeland et al., 7 Jun 2025).
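The cross-decoding protocol can be illustrated with a toy simulation: fit a linear decoder on "vision" data, then score it unchanged on "imagery" data with a weaker signal. This is a sketch under stated assumptions, not any of the benchmarked pipelines. The synthetic encoding model, the `gain` and `noise_sd` values, and the closed-form ridge solver are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_inputs = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_inputs), X.T @ Y)


def mean_feature_corr(Y_true, Y_pred):
    """Average Pearson r across feature dimensions."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(np.mean(num / den))


rng = np.random.default_rng(0)
n_train, n_test, n_vox, n_feat = 400, 100, 50, 8
encoding = rng.standard_normal((n_feat, n_vox))  # toy feature-to-voxel map


def simulate(n, gain, noise_sd):
    """Synthetic betas = gain * features @ encoding + Gaussian noise."""
    feats = rng.standard_normal((n, n_feat))
    betas = gain * feats @ encoding + noise_sd * rng.standard_normal((n, n_vox))
    return betas, feats


# Vision: full-strength signal. Imagery: same code, reduced gain (lower SNR).
X_vis_train, Y_vis_train = simulate(n_train, gain=1.0, noise_sd=1.0)
X_vis_test, Y_vis_test = simulate(n_test, gain=1.0, noise_sd=1.0)
X_img_test, Y_img_test = simulate(n_test, gain=0.2, noise_sd=1.0)

# Train on vision only, then evaluate on both domains without retraining.
W = fit_ridge(X_vis_train, Y_vis_train, alpha=10.0)
vision_score = mean_feature_corr(Y_vis_test, X_vis_test @ W)
imagery_score = mean_feature_corr(Y_img_test, X_img_test @ W)
print(f"vision r = {vision_score:.2f}, imagery r = {imagery_score:.2f}")
```

In this toy setting the imagery score falls below the vision score purely because of the reduced signal gain; real imagery responses also differ in receptive field properties, which the simulation does not model.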

Cross-decoding demonstrates that performance on vision is only loosely predictive of performance on mental imagery. In particular:

All models experience a large drop in pixelwise and feature-based reconstruction metrics when shifting from vision to imagery.
The degradation is most pronounced in early visual cortex; representations in higher-order ROIs are more robust.
Linear decoders (Brain Diffuser, iCNN) and multimodal embeddings (especially those combining image and text) generalize better to the imagery domain than complex end-to-end models (MindEye2), which overfit to vision-driven distributions (Kneeland et al., 7 Jun 2025).

4. Evaluation Metrics

Multiple metrics quantify the quality of imagery decoding:

Pixel correlation (PixCorr): Pearson’s correlation of pixel intensities between reconstructed and ground truth.
SSIM: Structural similarity index.
2-way identification (2WC): Forced-choice identification based on feature similarity in various deep feature spaces (AlexNet, InceptionV3, CLIP ViT-L/14).
High-level feature distance: Euclidean distance in EfficientNet-B and SwAV spaces (lower is better).
Brain correlation: For each ROI, Pearson’s $r$ between measured and predicted β responses.

All methods show consistent degradation in imagery versus vision on these benchmarks, with differences most apparent in early versus high-level ROIs. Linear, multimodal decoders exhibit superior robustness (Kneeland et al., 7 Jun 2025).
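Two of the metrics above can be sketched in a few lines of NumPy. The pixel correlation follows directly from its definition. The 2-way identification shown here is one common formulation (a reconstruction "wins" a comparison when its features correlate more with its own target than with a distractor target) and is an assumption; the benchmark's exact implementation may differ. The function names are illustrative.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two arrays, flattened to vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))


def pixcorr(recon, target):
    """Pixel correlation between a reconstruction and its ground truth."""
    return pearson_r(recon, target)


def two_way_identification(recon_feats, true_feats):
    """Fraction of (item, distractor) pairs where the reconstruction's
    features correlate more with its own target than with the distractor.
    Inputs are (n_items, n_features) arrays; chance level is 0.5."""
    r = recon_feats - recon_feats.mean(axis=1, keepdims=True)
    t = true_feats - true_feats.mean(axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    corr = r @ t.T                 # corr[i, j]: reconstruction i vs target j
    own = np.diag(corr)[:, None]   # correlation with the correct target
    n = corr.shape[0]
    wins = (own > corr).sum()      # diagonal never counts: own > own is False
    return float(wins / (n * (n - 1)))


rng = np.random.default_rng(0)
target = rng.random((16, 16))
noisy = target + 0.5 * rng.standard_normal(target.shape)
print("PixCorr:", pixcorr(noisy, target))

feats = rng.standard_normal((10, 64))
noisy_feats = feats + 0.5 * rng.standard_normal(feats.shape)
print("2-way identification:", two_way_identification(noisy_feats, feats))
```

The same `pearson_r` helper could be applied per ROI to measured and predicted β responses to obtain a brain correlation score.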

5. Architectural Insights and Generalization

A fundamental result is that decoder architecture substantially determines generalization to mental imagery:

Linear ridge regression models, especially when decoding to both image and text embeddings, are less susceptible to overfitting and retain more vision-domain performance when transferred to mental imagery.
End-to-end architectures that leverage deep, vision-specific priors often learn fine-grained details that do not transfer, resulting in poor imagery decoding.
Text guidance and contrastive representations appear to stabilize decoding under low-SNR conditions typical of internally generated imagery.

A plausible implication is that hybrid architectures, pairing robust linear decoders with flexible, learned generative priors, may offer improved performance in decoding both visually evoked and internally generated brain activity (Kneeland et al., 7 Jun 2025).

6. Applications and Significance

NSD-Imagery has immediate implications for brain–computer interfaces, clinical neuroimaging, and computational models of cognition:

Enables benchmarking and development of practical fMRI decoders for real-world, imagery-driven scenarios, including assistive communication for locked-in patients and assessment of covert consciousness.
Supports diagnostic explorations by revealing SNR and receptive field properties unique to mental imagery.
Demonstrates the necessity of collecting and leveraging mental imagery datasets: training exclusively on vision is insufficient for high-fidelity mental image decoding.

Future recommendations include scaling up mental imagery data collection, pursuing multi-subject pre-training, and developing feedback-optimized, real-time BCI pipelines (Kneeland et al., 7 Jun 2025).

7. Integration with OOD and Synthetic Benchmarks

The NSD-Imagery resource fits into a broader effort to benchmark neural coding and model robustness under distributional shift. Related datasets such as NSD-synthetic provide controlled OOD visual stimuli (artificial textures, geometric patterns, low-level feature manipulations) to probe generalization limits of neural encoding models (Gifford et al., 8 Mar 2025). Together, these resources define a standard for evaluating the transfer and robustness of neural decoding architectures in both externally and internally generated visual experience.

References

Kneeland et al. (2025). NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery.

Gifford et al. (2025). A 7T fMRI dataset of synthetic images for out-of-distribution modeling of vision.