NSD-synthetic: OOD Visual Neuroscience Dataset
- NSD-synthetic is a visual neuroscience dataset that provides ultra-high field 7T fMRI responses to synthetic images for out-of-distribution evaluation.
- It systematically manipulates visual features across multiple classes (e.g., noise, words, spirals) to probe the boundaries of neural processing.
- The dataset employs advanced preprocessing and GLM pipelines to benchmark neural encoding models, comparing task- and self-supervised networks.
NSD-synthetic is a visual neuroscience dataset comprising ultra-high field 7T functional MRI (fMRI) responses from human participants to a controlled battery of synthetic images. Designed explicitly for out-of-distribution (OOD) evaluation, the dataset augments the Natural Scenes Dataset (NSD-core) by introducing image classes and stimulus features not present in naturalistic visual exposure. NSD-synthetic enables the benchmarking of computational models of visual processing on neural data that systematically diverge from conventional datasets, facilitating robust model selection and advancing theoretical frameworks of human vision (Gifford et al., 8 Mar 2025).
1. Dataset Composition and Stimulus Design
NSD-synthetic utilizes the eight subjects (subj01–subj08) from NSD-core, each undergoing an additional high-field 7T session comprising eight functional runs (428 s per run, 268 TRs, TR = 1.6 s; alternating fixation and one-back tasks). Each subject completed 744 stimulus trials per session, with 80 (≈ 10.8%) "one-back" repeat trials for behavioral monitoring.
Synthetic Image Set
- Number of images: 284 distinct synthetic stimuli.
- Hierarchical taxonomy: 8 high-level classes subdivided into 71 subclasses, each represented by 4 images.
| Image Class | Example Subclasses | Manipulated Features |
|---|---|---|
| Noise | White noise, Pink noise | Spatial frequency spectrum |
| Natural scenes | Scene structure, natural content | |
| Manipulated scenes | Upside-down, Line-drawing | Inversion, low-level abstraction |
| Contrast modulation | 100%, 50%, 10%, 6%, 4% | Contrast level |
| Phase-coherence | 75%, 50%, 25%, 0% phase randomness | Phase coherence level |
| Single words | Positions, lengths | Eccentricity, word length |
| Spiral gratings | SF levels, phase shifts | Micropattern orientation, log-polar |
| Chromatic noise | 16 hues, pink noise | Hue angle (subject-calibrated), achromatic |
Feature dimensions are manipulated across low-level (contrast, hue, spatial frequency, phase coherence), mid-level (oriented spirals, log-polar patterns), and high-level (scene structure, orthographic word content) factors. The design systematically probes representation boundaries of visual cortical processing under deviations from naturalistic image distributions.
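The counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers stated in the text. It suggests that the quoted run duration of 428 s is a rounded value, since 268 TRs at 1.6 s come to 428.8 s.

```python
# Design constants quoted in the text above.
N_SUBCLASSES = 71
IMAGES_PER_SUBCLASS = 4
TRIALS_PER_SESSION = 744
ONE_BACK_TRIALS = 80
TRS_PER_RUN = 268
TR_MS = 1600  # repetition time in milliseconds (integer avoids float noise)

n_images = N_SUBCLASSES * IMAGES_PER_SUBCLASS
one_back_pct = 100 * ONE_BACK_TRIALS / TRIALS_PER_SESSION
run_seconds = TRS_PER_RUN * TR_MS / 1000

print(n_images)                # 284
print(round(one_back_pct, 2))  # 10.75
print(run_seconds)             # 428.8
```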
2. fMRI Acquisition and Preprocessing Pipeline
Scanning was performed on the Siemens Magnetom 7 T platform (Center for Magnetic Resonance Research, University of Minnesota) equipped with a 32-channel receive, single-channel transmit RF coil (Nova Medical). EPI data acquisition employed 1.8 mm isotropic voxels, 84 slices, TR = 1600 ms, TE = 22 ms, multi-band factor = 3, and standard NSD-core protocol flip angles. Dual-echo field maps (2.2×2.2×3.6 mm, TE₁ = 8.16 ms, TE₂ = 9.18 ms) enabled geometric distortion correction.
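As a worked illustration of how a dual-echo field map supports distortion correction, the phase difference between the two echoes maps to an off-resonance frequency via Δf = Δφ / (2π·ΔTE). The sketch below applies this standard relation to the echo times quoted above. The function name and the sign convention are assumptions for illustration and are not taken from the NSD processing code.

```python
import math

TE1_MS = 8.16  # first echo time, from the text
TE2_MS = 9.18  # second echo time, from the text

def phase_to_hz(delta_phase_rad, te1_ms=TE1_MS, te2_ms=TE2_MS):
    """Convert an inter-echo phase difference (radians) to off-resonance in Hz."""
    delta_te_s = (te2_ms - te1_ms) / 1000.0
    return delta_phase_rad / (2.0 * math.pi * delta_te_s)

delta_te_ms = TE2_MS - TE1_MS
# The largest offset measurable without phase wrapping corresponds to a phase of pi.
max_unwrapped_hz = phase_to_hz(math.pi)

print(round(delta_te_ms, 2))       # 1.02
print(round(max_unwrapped_hz, 1))  # 490.2
```

With an echo spacing of about 1.02 ms, off-resonance values up to roughly ±490 Hz can be represented before the phase wraps.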
Preprocessing closely followed NSD-core:
- FreeSurfer anatomical reconstruction → fsaverage surface resampling.
- EPI corrections: slice timing, within/across-session motion correction, fieldmap unwarping, gradient nonlinearity correction.
- Two resampled outputs for subsequent GLM analysis: a 1.8 mm preparation at 1.333 s temporal resolution and a 1 mm preparation at 1 s temporal resolution.
- Single-trial GLM ("GLMsingle"): per-vertex HRF selection from a library of candidate HRFs, GLMdenoise with data-derived noise regressors [Charest et al. 2018], and ridge regression regularization [Rokem & Kay 2020].
- Regions-of-interest (ROIs): V1, V2, V3, hV4 (retinotopy), PPA, VWFA (category localizers), with additional ROIs (EBA, FFA) available on request.
3. Dataset Structure, BIDS Compatibility, and Data Access
The dataset adheres to BIDS-compatible directory conventions:
- /sub-<ID>/anat/: T1w anatomical images, transforms.
- /sub-<ID>/func/: Run-wise BOLD images (NIfTI), events.tsv trial logs, fieldmaps.
- /derivatives/glm/: Single-trial beta weights (% signal change, per run/subject).
- /derivatives/roi/: Per-subject ROI masks (fsaverage).
Metadata for each run, including stimulus onset, duration, image ID/subclass, one-back flags, and behavioral/eye-tracking QC, is provided as events.tsv. Image stimuli (.png/.mat), subclass labels, and generation parameters are accessible in a dedicated folder. Demographic information (age, gender), behavioral results (% correct, d′), and eye-tracking quality control are included.
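A minimal sketch of how one run's trial log might be located and parsed is given below. The file-naming pattern, the task label `nsdsynthetic`, and the column names (`image_id`, `subclass`, `one_back`) are hypothetical, chosen to mirror the fields listed above. The actual names should be taken from the data manual. The table contents are made up for illustration.

```python
import csv
import io
from pathlib import PurePosixPath

def events_path(subject, run, task="nsdsynthetic"):
    """Build a BIDS-style path to one run's events file (naming is assumed)."""
    name = f"sub-{subject}_task-{task}_run-{run:02d}_events.tsv"
    return PurePosixPath(f"sub-{subject}") / "func" / name

def read_events(handle):
    """Parse a tab-separated events table into a list of typed dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "onset": float(row["onset"]),
            "duration": float(row["duration"]),
            "image_id": int(row["image_id"]),
            "subclass": row["subclass"],
            "one_back": row["one_back"] == "1",
        })
    return rows

# Tiny made-up table standing in for a real events.tsv file.
EXAMPLE_TSV = (
    "onset\tduration\timage_id\tsubclass\tone_back\n"
    "0.0\t2.0\t17\twhite_noise\t0\n"
    "4.0\t2.0\t17\twhite_noise\t1\n"
    "8.0\t2.0\t203\tspiral\t0\n"
)

events = read_events(io.StringIO(EXAMPLE_TSV))
# Drop one-back repeats, which serve behavioral monitoring.
unique_trials = [e for e in events if not e["one_back"]]

print(events_path("01", 3))
print(len(events), len(unique_trials))  # 3 2
```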
Public repositories:
- Dataset: http://naturalscenesdataset.org
- Data manual: https://cvnlab.slite.page/p/CT9Fwl4_hc/NSD-Data-Manual
- Data generation/processing code: https://github.com/cvnlab/nsddatapaper/, https://github.com/cvnlab/nsdcode/
- NSD-synthetic analysis scripts: https://github.com/gifale95/NSD-synthetic
4. Out-of-Distribution (OOD) Characterization
The experimental intent of NSD-synthetic is to enable rigorous OOD evaluation. The training ("in-distribution", ID) set comprises >70,000 real photographs from NSD-core. The OOD set consists of the 284 synthetic stimuli that systematically manipulate non-natural visual dimensions.
Empirical Verification
Multidimensional scaling (MDS) of concatenated single-trial fMRI responses shows that synthetic trials form a statistically distinct response cluster relative to NSD-core, controlling for session effects. Within the synthetic cluster, further subdivision by image class (e.g., noise, spirals, words) is observed.
A quantitative measure of distributional separation, the ratio of the distance between the two cluster centroids in MDS space to the within-cluster spread, indicated an ~80% greater between-cluster distance than within-cluster spread.
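One way such a separation ratio could be computed is sketched below. The definition of within-cluster spread used here (mean Euclidean distance of points to their own centroid, averaged over the two clusters) is an assumption, since the exact definition is not given in this text. The coordinates are made up and stand in for MDS embeddings.

```python
import math

def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def mean_spread(points, center):
    """Average Euclidean distance of points from their own centroid."""
    return sum(math.dist(p, center) for p in points) / len(points)

def separation_ratio(cluster_a, cluster_b):
    """Between-centroid distance divided by the mean within-cluster spread."""
    mu_a, mu_b = centroid(cluster_a), centroid(cluster_b)
    between = math.dist(mu_a, mu_b)
    within = 0.5 * (mean_spread(cluster_a, mu_a) + mean_spread(cluster_b, mu_b))
    return between / within

# Made-up 2-D coordinates standing in for MDS embeddings of the two trial sets.
core_like = [(0.0, 1.0), (0.0, -1.0), (1.0, 0.0), (-1.0, 0.0)]
synthetic_like = [(3.0, 1.0), (3.0, -1.0), (4.0, 0.0), (2.0, 0.0)]

print(separation_ratio(core_like, synthetic_like))  # 3.0
```

Under this definition, a ratio above 1 means the centroids are farther apart than the typical spread within a cluster.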
5. Model Benchmarking and Neural Encoding Evaluation
Encoding Frameworks
Linear ridge regression was used to map visual feature embeddings to vertexwise fMRI response. Four pretrained networks were employed for feature extraction:
- AlexNet: Task-supervised CNN
- ResNet-50: Task-supervised CNN
- MoCo: Self-supervised CNN (ResNet-50 backbone)
- vit_b_32: Task-supervised Vision Transformer
Pipeline: Center crop and resize to 224×224 → transform (NSD-synthetic) → standard ImageNet normalization → feature extraction at each network sublayer → PCA (250 components) → ridge regression.
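The last two stages of this pipeline (PCA followed by ridge regression) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code. Random arrays stand in for network activations and vertex responses, the number of components is reduced from 250 to 32 to suit the toy data, and the regularization strength `alpha=1.0` is an arbitrary placeholder.

```python
import numpy as np

def fit_pca(features, n_components):
    """Return the feature mean and top principal axes (components x features)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(features, mean, axes):
    """Project features onto principal axes learned from the training set."""
    return (features - mean) @ axes.T

def fit_ridge(x, y, alpha):
    """Closed-form ridge weights with an unpenalized intercept."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    gram = xc.T @ xc + alpha * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vert = 200, 40, 64, 10
# Random stand-ins for network activations and vertex responses.
true_map = rng.normal(size=(n_feat, n_vert))
feat_train = rng.normal(size=(n_train, n_feat))
feat_test = rng.normal(size=(n_test, n_feat))
resp_train = feat_train @ true_map + 0.1 * rng.normal(size=(n_train, n_vert))

mean, axes = fit_pca(feat_train, n_components=32)  # the text uses 250 components
weights, intercept = fit_ridge(apply_pca(feat_train, mean, axes), resp_train, alpha=1.0)
pred_test = apply_pca(feat_test, mean, axes) @ weights + intercept

print(pred_test.shape)  # (40, 10)
```

The PCA axes and regression weights are estimated on training images only and then reused unchanged on test images, so the same fitted model can be applied to any held-out set.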
Evaluation Metrics
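- Pearson correlation across test images: $r$ between predicted and measured responses, computed per vertex.
- Explained variance (setting negative $r$ to 0): $r^2$.
- Normalized explained variance: $r^2$ expressed relative to the vertex's noise ceiling.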
- Group summaries: Mean ± SEM, restricted to vertices with ncsnr > 0.3.
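The metrics listed above can be computed per vertex along the following lines. This sketch assumes that the normalization uses a noise ceiling derived from ncsnr with the formula NC = 100 · ncsnr² / (ncsnr² + 1/n), where n is the number of trials averaged per image. That formula and the choice n = 4 are assumptions to verify against the NSD data manual. All arrays are toy values.

```python
import numpy as np

def pearson_per_vertex(pred, meas):
    """Pearson r across images (rows), computed independently per vertex (column)."""
    pc = pred - pred.mean(axis=0)
    mc = meas - meas.mean(axis=0)
    denom = np.sqrt((pc ** 2).sum(axis=0) * (mc ** 2).sum(axis=0))
    return (pc * mc).sum(axis=0) / denom

def explained_variance(r):
    """Squared correlation in percent, with negative correlations set to 0."""
    return 100.0 * np.clip(r, 0.0, None) ** 2

def noise_ceiling(ncsnr, n_repeats):
    """Noise ceiling in percent from ncsnr (formula assumed, see lead-in)."""
    return 100.0 * ncsnr ** 2 / (ncsnr ** 2 + 1.0 / n_repeats)

# Toy data: 4 test images (rows) by 3 vertices (columns).
meas = np.array([[-1.0, -1.0, -1.0],
                 [0.0, 0.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [0.0, 0.0, 0.0]])
pred = np.array([[-1.0, 1.0, -1.0],
                 [1.0, 0.0, 0.0],
                 [1.0, -1.0, 1.0],
                 [-1.0, 0.0, 0.0]])
ncsnr = np.array([0.5, 0.5, 0.2])

r = pearson_per_vertex(pred, meas)
ev = explained_variance(r)
normalized = 100.0 * ev / noise_ceiling(ncsnr, n_repeats=4)
mask = ncsnr > 0.3  # vertex inclusion threshold from the text

print(np.round(r, 3).tolist())                    # [0.707, -1.0, 1.0]
print(np.round(ev, 1).tolist())                   # [50.0, 0.0, 100.0]
print(float(np.round(normalized[mask].mean(), 1)))  # 50.0
```

The third vertex is excluded by the ncsnr threshold, so its high score does not enter the group summary.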
In-distribution vs. Out-of-distribution Performance
On early visual cortex (V1–V3) vertices:
- ID test set (NSD-core, n=284): Mean
- OOD test set (NSD-synthetic, n=284): Mean
- ID–OOD difference: (paired across vertices)
- Statistical test (AlexNet, V1 vertices): , , Cohen’s
Model Comparison Reveals OOD-Sensitive Differences
NSD-synthetic enables OOD model comparison not apparent in ID regimes.
- vit_b_32 vs. AlexNet:
  - ID: vit_b_32 marginally outperforms in ventral areas (peak ), underperforms in early areas (peak ).
  - OOD: vit_b_32 outperforms AlexNet across all visual areas (peak ).
  - OOD difference: , .
- ResNet-50 vs. MoCo:
  - ID: ResNet-50 favored in higher-level ROIs (peak ), MoCo favored in early ROIs (peak ).
  - OOD: MoCo outperforms throughout the hierarchy (peak ).
  - OOD difference (V1, MoCo – ResNet-50): , , Cohen’s .
OOD Generalization Procedure
- Train encoding models on 9,000 subject-unique NSD-core images.
- Evaluate using identical regression weights on: (a) held-out NSD-core (ID), (b) NSD-synthetic (OOD).
- Use as the primary metric.
- Map contrasts onto cortical surfaces with FDR correction ().
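The FDR step in the procedure above could be implemented with the Benjamini-Hochberg procedure, sketched below. The text does not name the specific FDR method, so the choice of Benjamini-Hochberg is an assumption, as is the level `q=0.05`. The p-values are made up.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests declared significant at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds q * k / m for the k-th smallest p-value.
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        last = np.nonzero(passed)[0].max()
        significant[order[: last + 1]] = True
    return significant

# Made-up per-vertex p-values for a model contrast.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(p_vals, q=0.05).tolist())
# [True, True, False, False, False, False]
```

In a surface analysis, the resulting mask would select the vertices at which a model contrast is displayed.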
A plausible implication is that self-supervised training (here, MoCo relative to a task-supervised ResNet-50 with the same backbone) can yield better models of biological vision in OOD regimes, a distinction that traditional ID benchmarks do not reveal.
6. Context and Research Applications
NSD-synthetic enables systematic OOD generalization benchmarking for NeuroAI, complementing existing large-scale datasets restricted to natural images. It provides a standardized, rigorously annotated fMRI resource for investigating model-to-brain generalization under controlled stimulus perturbations. Public access to data, preprocessing pipelines, ROI definitions, and code ensures replicability and extensibility for evaluating new models and hypotheses regarding human visual processing (Gifford et al., 8 Mar 2025).
By foregrounding OOD differentiation—absent in ID-only paradigms—NSD-synthetic advances both the empirical assessment of neural network models and the formulation of computational theories with improved explanatory power for human vision.