
NSD-synthetic: OOD Visual Neuroscience Dataset

Updated 3 March 2026
  • NSD-synthetic is a visual neuroscience dataset that provides ultra-high field 7T fMRI responses to synthetic images for out-of-distribution evaluation.
  • It systematically manipulates visual features across multiple classes (e.g., noise, words, spirals) to probe the boundaries of neural processing.
  • The dataset employs advanced preprocessing and GLM pipelines to benchmark neural encoding models, comparing task- and self-supervised networks.

NSD-synthetic is a visual neuroscience dataset comprising ultra-high field 7T functional MRI (fMRI) responses from human participants to a controlled battery of synthetic images. Designed explicitly for out-of-distribution (OOD) evaluation, the dataset augments the Natural Scenes Dataset (NSD-core) by introducing image classes and stimulus features not present in naturalistic visual exposure. NSD-synthetic enables the benchmarking of computational models of visual processing on neural data that systematically diverge from conventional datasets, facilitating robust model selection and advancing theoretical frameworks of human vision (Gifford et al., 8 Mar 2025).

1. Dataset Composition and Stimulus Design

NSD-synthetic utilizes the eight subjects (subj01–subj08) from NSD-core, each undergoing an additional high-field 7T session comprising eight functional runs (428 s per run, 268 TRs, TR = 1.6 s; alternating fixation and one-back tasks). Each subject completed 744 stimulus trials per session, with 80 (≈ 10.8%) "one-back" repeat trials for behavioral monitoring.

Synthetic Image Set

  • Number of images: 284 distinct synthetic stimuli.
  • Hierarchical taxonomy: 8 high-level classes subdivided into 71 subclasses, each represented by 4 images.
| Image Class | Example Subclasses | Manipulated Features |
|---|---|---|
| Noise | White noise, Pink noise | Spatial frequency spectrum |
| Natural scenes | | Scene structure, natural content |
| Manipulated scenes | Upside-down, Line-drawing | Inversion, low-level abstraction |
| Contrast modulation | 100%, 50%, 10%, 6%, 4% | |
| Phase-coherence | 75%, 50%, 25%, 0% phase randomness | |
| Single words | Positions, lengths | Eccentricity, word length |
| Spiral gratings | SF levels, phase shifts | Micropattern orientation, log-polar |
| Chromatic noise | 16 hues, pink noise | Hue angle (subject-calibrated), achromatic |

Feature dimensions are manipulated across low-level (contrast, hue, spatial frequency, phase coherence), mid-level (oriented spirals, log-polar patterns), and high-level (scene structure, orthographic word content) factors. The design systematically probes representation boundaries of visual cortical processing under deviations from naturalistic image distributions.
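The design figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text (nothing is read from the dataset); the constant names are illustrative. Note that 268 TRs at 1.6 s give 428.8 s, which the text quotes as 428 s.

```python
# Figures quoted in the text; the variable names are illustrative only.
TR_S = 1.6
TRS_PER_RUN = 268
TRIALS_PER_SESSION = 744
ONE_BACK_TRIALS = 80
N_SUBCLASSES = 71
IMAGES_PER_SUBCLASS = 4

run_duration_s = TRS_PER_RUN * TR_S
one_back_fraction = ONE_BACK_TRIALS / TRIALS_PER_SESSION
n_images = N_SUBCLASSES * IMAGES_PER_SUBCLASS

print(f"run duration: {run_duration_s:.1f} s")
print(f"one-back share: {one_back_fraction:.1%}")
print(f"distinct images: {n_images}")
```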

2. fMRI Acquisition and Preprocessing Pipeline

Scanning was performed on the Siemens Magnetom 7 T platform (Center for Magnetic Resonance Research, University of Minnesota) equipped with a 32-channel receive, single-channel transmit RF coil (Nova Medical). EPI data acquisition employed 1.8 mm isotropic voxels, 84 slices, TR = 1600 ms, TE = 22 ms, multi-band factor = 3, and standard NSD-core protocol flip angles. Dual-echo field maps (2.2×2.2×3.6 mm, TE₁ = 8.16 ms, TE₂ = 9.18 ms) enabled geometric distortion correction.

Preprocessing closely followed NSD-core:

  1. FreeSurfer anatomical reconstruction → fsaverage surface resampling.
  2. EPI corrections: slice timing, within/across-session motion correction, fieldmap unwarping, gradient nonlinearity correction.
  3. Two resampled outputs for subsequent GLM analysis: 1.8 mm spatial resolution at 1.333 s temporal resolution, and 1 mm spatial resolution at 1 s temporal resolution.
  4. Single-trial GLM ("GLMsingle") via per-vertex HRF library fit, GLMdenoise (autosourced noise regressors [Charest et al. 2018]), and ridge regression regularization [Rokem & Kay 2020].
  5. Regions-of-interest (ROIs): V1, V2, V3, hV4 (retinotopy), PPA, VWFA (category localizers), with additional ROIs (EBA, FFA) available on request.

3. Dataset Structure, BIDS Compatibility, and Data Access

The dataset adheres to BIDS-compatible directory conventions:

  • /sub-<ID>/anat/: T1w anatomical images, transforms.
  • /sub-<ID>/func/: Runwise bold images (Nifti), events.tsv trial logs, fieldmaps.
  • /derivatives/glm/: Single-trial beta weights (% signal change, per run/subject).
  • /derivatives/roi/: Per-subject ROI masks (fsaverage).

Metadata for each run, including stimulus onset, duration, image ID/subclass, one-back flags, and behavioral/eye-tracking QC, is provided as events.tsv. Image stimuli (.png/.mat), subclass labels, and generation parameters are accessible in a dedicated folder. Demographic information (age, gender), behavioral results (% correct, d′), and eye-tracking quality control are included.
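As a rough illustration of how the per-run trial logs might be consumed, the sketch below parses a tab-separated events file with the Python standard library. The column names (`onset`, `duration`, `image_id`, `subclass`, `one_back`) and the example rows are hypothetical; the actual files may use different headers and values.

```python
import csv
import io

# Hypothetical events.tsv content; real column names and values may differ.
EXAMPLE_EVENTS_TSV = (
    "onset\tduration\timage_id\tsubclass\tone_back\n"
    "0.0\t2.0\t1\twhite_noise\t0\n"
    "4.0\t2.0\t2\tpink_noise\t0\n"
    "8.0\t2.0\t2\tpink_noise\t1\n"
)

def read_events(handle):
    """Parse a tab-separated events file into a list of typed dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "onset": float(row["onset"]),
            "duration": float(row["duration"]),
            "image_id": int(row["image_id"]),
            "subclass": row["subclass"],
            "one_back": row["one_back"] == "1",
        })
    return rows

events = read_events(io.StringIO(EXAMPLE_EVENTS_TSV))
# Separate flagged one-back repeats from the remaining trials.
non_repeat_trials = [e for e in events if not e["one_back"]]
print(len(events), len(non_repeat_trials))
```

For a real run, the same function could be given an open file handle for the corresponding events.tsv in place of the in-memory example.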

Public repositories:

4. Out-of-Distribution (OOD) Characterization

The experimental intent of NSD-synthetic is to enable rigorous OOD evaluation. The training ("in-distribution", ID) set comprises >70,000 real photographs from NSD-core. The OOD set consists of the 284 synthetic stimuli that systematically manipulate non-natural visual dimensions.

Empirical Verification

Multidimensional scaling (MDS) of concatenated single-trial fMRI responses demonstrates synthetic trials form a statistically distinct response cluster relative to NSD-core, controlling for session effects. Within the synthetic cluster, further subdivision by image class (e.g., noise, spirals, words) is observed.
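The text does not say which MDS variant was used. As one possible reading, the sketch below implements classical (Torgerson) MDS with NumPy and applies it to random stand-in data; the arrays `core_like` and `synth_like` are synthetic placeholders, not fMRI responses.

```python
import numpy as np

def classical_mds(responses, n_components=2):
    """Embed the rows of `responses` with classical (Torgerson) MDS."""
    # Squared Euclidean distances between all pairs of trials.
    sq_norms = np.sum(responses ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * responses @ responses.T
    d2 = np.maximum(d2, 0.0)
    # Double centering turns squared distances into a Gram matrix.
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # Keep the leading eigenvectors, scaled by the square root of their eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale

rng = np.random.default_rng(0)
core_like = rng.normal(0.0, 1.0, size=(40, 50))   # placeholder for NSD-core trials
synth_like = rng.normal(1.5, 1.0, size=(40, 50))  # placeholder for synthetic trials
embedding = classical_mds(np.vstack([core_like, synth_like]))
print(embedding.shape)
```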

A quantitative measure of distributional separation, the ratio

$$\delta = \frac{\text{mean}\,\Vert y_i - \mu_{\text{core}}\Vert}{\text{mean}\,\Vert y_j - \mu_{\text{synth}}\Vert}$$

with $\mu_{\text{core}}$ and $\mu_{\text{synth}}$ as the respective cluster centroids in MDS space, yielded $\delta \approx 1.8$, denoting an ~80% greater between-cluster distance than within-cluster spread.
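The text does not specify which trials the indices $i$ and $j$ run over. A minimal sketch, assuming both index the synthetic trials (so the numerator is their mean distance to the NSD-core centroid and the denominator their mean distance to their own centroid), could look like this; the function name and toy coordinates are illustrative.

```python
import numpy as np

def separation_ratio(core_xy, synth_xy):
    """Ratio of between-cluster distance to within-cluster spread.

    Assumption: both means are taken over the synthetic trials.
    """
    mu_core = core_xy.mean(axis=0)
    mu_synth = synth_xy.mean(axis=0)
    between = np.linalg.norm(synth_xy - mu_core, axis=1).mean()
    within = np.linalg.norm(synth_xy - mu_synth, axis=1).mean()
    return between / within

# Toy 2-D coordinates standing in for an MDS embedding.
core_xy = np.array([[-1.0, 0.0], [1.0, 0.0]])
synth_xy = np.array([[3.0, 1.0], [3.0, -1.0]])
delta = separation_ratio(core_xy, synth_xy)
print(round(delta, 3))
```

Under this reading, a value of 1.8 means the synthetic trials sit on average 1.8 times farther from the NSD-core centroid than from their own centroid.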

5. Model Benchmarking and Neural Encoding Evaluation

Encoding Frameworks

Linear ridge regression was used to map visual feature embeddings to vertexwise fMRI response. Four pretrained networks were employed for feature extraction:

  • AlexNet: Task-supervised CNN
  • ResNet-50: Task-supervised CNN
  • MoCo: Self-supervised CNN (ResNet-50 backbone)
  • vit_b_32: Task-supervised Vision Transformer

Pipeline: Center crop and resize 224×224 → $\sqrt{\text{RGB}}$ transform (NSD-synthetic) → standard ImageNet normalization → feature extraction at each network sublayer → PCA (250 components) → ridge regression.
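The image-side steps of this pipeline can be sketched with NumPy alone. The version below is an assumption-laden illustration: it scales pixels to [0, 1] before the square root, omits the resize to 224×224 (which would need an image library), and uses the commonly published ImageNet channel statistics. The function names are hypothetical.

```python
import numpy as np

# Widely used ImageNet channel statistics (RGB order, pixels scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def center_crop(image):
    """Crop an (H, W, 3) array to its central square."""
    h, w = image.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return image[top:top + side, left:left + side]

def preprocess(image_uint8, apply_sqrt):
    """Center crop, optional square-root transform, ImageNet normalization.

    Resizing to 224x224 is intentionally left out of this sketch.
    """
    pixels = center_crop(image_uint8).astype(np.float64) / 255.0
    if apply_sqrt:  # applied to NSD-synthetic images only, per the pipeline
        pixels = np.sqrt(pixels)
    return (pixels - IMAGENET_MEAN) / IMAGENET_STD

rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
print(preprocess(fake_image, apply_sqrt=True).shape)
```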

Evaluation Metrics

  • Pearson correlation across test images:

$$r = \frac{\mathrm{cov}(y, \hat{y})}{\sigma_y\,\sigma_{\hat{y}}}$$

  • Explained variance (setting $r < 0$ to 0; the final equality is exact only when $\hat{y}$ is the least-squares linear fit of $y$):

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = r^2$$

  • Normalized explained variance:

$$R^2_{\text{norm}} = \frac{R^2}{R^2_{\text{ceiling}}}, \quad R^2_{\text{norm}} \in [0, 1]$$

  • Group summaries: Mean ± SEM, restricted to vertices with ncsnr > 0.3.
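The metrics above can be sketched in a few NumPy functions. This is an illustration under stated assumptions: the noise ceiling is passed in as a precomputed array rather than derived here, explained variance is taken as the clipped $r$ squared, and the SEM is computed across the retained vertices (the text does not say whether it is taken across vertices or subjects). All function names are hypothetical.

```python
import numpy as np

def pearson_r(y, y_hat):
    """Pearson correlation per column of (n_images, n_vertices) arrays."""
    yc = y - y.mean(axis=0)
    pc = y_hat - y_hat.mean(axis=0)
    return (yc * pc).sum(axis=0) / np.sqrt((yc ** 2).sum(axis=0) * (pc ** 2).sum(axis=0))

def normalized_r2(y, y_hat, r2_ceiling):
    """Clip negative correlations to 0, square, and divide by the ceiling."""
    r = np.clip(pearson_r(y, y_hat), 0.0, None)
    return r ** 2 / r2_ceiling

def group_summary(values, ncsnr, threshold=0.3):
    """Mean and SEM over vertices with ncsnr above the threshold.

    Assumption: the SEM is taken across the retained vertices.
    """
    kept = values[ncsnr > threshold]
    return kept.mean(), kept.std(ddof=1) / np.sqrt(kept.size)

# Tiny example: vertex 0 is predicted perfectly, vertex 1 is anti-correlated.
y = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]])
y_hat = np.column_stack([2.0 * y[:, 0] + 1.0, -y[:, 1]])
print(normalized_r2(y, y_hat, r2_ceiling=np.array([1.0, 0.5])))
```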

In-distribution vs. Out-of-distribution Performance

On early visual cortex (V1–V3) vertices:

  • ID test set (NSD-core, n=284): Mean $R^2_{\text{norm}} \approx 0.48 \pm 0.02$
  • OOD test set (NSD-synthetic, n=284): Mean $R^2_{\text{norm}} \approx 0.26 \pm 0.02$
  • ID–OOD difference: $\Delta \approx 0.22 \pm 0.01$ (paired across vertices)
  • Statistical test (AlexNet, V1 vertices): $t(7) = 15.3$, $p < 0.0001$, Cohen’s $d \approx 5.4$
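The reported effect size is consistent with a paired design over eight subjects, where Cohen's $d$ for paired differences equals $t/\sqrt{n}$. The sketch below assumes that reading (7 degrees of freedom implying $n = 8$, and $d$ defined as the mean difference divided by the standard deviation of the differences); the helper name and toy scores are illustrative, not data from the study.

```python
import math
import statistics

def paired_t_and_d(scores_a, scores_b):
    """Paired t statistic, Cohen's d (for paired differences), and df."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    d = statistics.fmean(diffs) / statistics.stdev(diffs)
    t = d * math.sqrt(n)
    return t, d, n - 1

# Consistency check on the reported values: d = t / sqrt(n) with n = 8.
print(round(15.3 / math.sqrt(8), 2))

# Toy per-subject scores (not from the dataset).
t_stat, cohen_d, dof = paired_t_and_d([0.5, 0.6, 0.7], [0.2, 0.4, 0.3])
print(round(t_stat, 3), round(cohen_d, 3), dof)
```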

Model Comparison Reveals OOD-Sensitive Differences

NSD-synthetic enables OOD model comparison not apparent in ID regimes.

  • vit_b_32 vs. AlexNet:
    • ID: vit_b_32 marginally outperforms in ventral areas (peak $\Delta R^2_{\text{norm}} \approx 0.10$), underperforms in early areas (peak $-0.10$).
    • OOD: vit_b_32 outperforms AlexNet across all visual areas (peak $\Delta R^2_{\text{norm}} \approx 0.25$).
    • OOD difference: $t(7) = 12.4$, $p < 0.0001$.
  • ResNet-50 vs. MoCo:
    • ID: ResNet-50 favored in higher-level ROIs (peak $\Delta R^2_{\text{norm}} \approx 0.05$), MoCo favored in early ROIs (peak $\Delta \approx 0.05$).
    • OOD: MoCo outperforms throughout the hierarchy (peak $\Delta R^2_{\text{norm}} \approx 0.10$).
    • OOD difference (V1, MoCo – ResNet-50): $t(7) = 8.7$, $p < 0.0005$, Cohen’s $d \approx 3.1$.

OOD Generalization Procedure

  1. Train encoding models on 9,000 subject-unique NSD-core images.
  2. Evaluate using identical regression weights on: (a) held-out NSD-core (ID), (b) NSD-synthetic (OOD).
  3. Use Rnorm2R^2_{\text{norm}} as the primary metric.
  4. Map contrasts onto cortical surfaces with FDR correction (q<0.05q < 0.05).
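The train-once, evaluate-twice logic of steps 1 and 2 can be sketched with closed-form ridge regression on PCA-reduced features. Everything below is a toy stand-in: the data are simulated, the sizes are far smaller than 9,000 images or 250 components, the regularization strength is arbitrary, and the simulation makes no attempt to reproduce the reported ID–OOD gap. Function names are hypothetical.

```python
import numpy as np

def fit_pca(features, n_components):
    """Return the feature mean and the top principal axes (as columns)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components].T

def fit_ridge(x, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc = x - x_mean
    gram = xc.T @ xc + alpha * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, xc.T @ (y - y_mean))
    return weights, y_mean - x_mean @ weights

def pearson_r(y, y_hat):
    """Pearson correlation per vertex (column)."""
    yc, pc = y - y.mean(axis=0), y_hat - y_hat.mean(axis=0)
    return (yc * pc).sum(axis=0) / np.sqrt((yc ** 2).sum(axis=0) * (pc ** 2).sum(axis=0))

# Simulated features and responses driven by shared latent factors.
rng = np.random.default_rng(0)
n_latent, n_feat, n_vert = 10, 60, 5
mixing = rng.normal(size=(n_latent, n_feat))
true_w = rng.normal(size=(n_latent, n_vert))

def simulate(n_images, latent_shift):
    latent = latent_shift + rng.normal(size=(n_images, n_latent))
    feats = latent @ mixing + 0.1 * rng.normal(size=(n_images, n_feat))
    resp = latent @ true_w + rng.normal(size=(n_images, n_vert))
    return feats, resp

x_train, y_train = simulate(400, 0.0)  # stand-in for training images
x_id, y_id = simulate(100, 0.0)        # held-out, same distribution
x_ood, y_ood = simulate(100, 2.0)      # shifted distribution

# Fit PCA and ridge on the training set only.
mean, axes = fit_pca(x_train, n_components=10)
weights, intercept = fit_ridge((x_train - mean) @ axes, y_train, alpha=1.0)

def predict(x):
    """Apply the frozen PCA projection and regression weights."""
    return ((x - mean) @ axes) @ weights + intercept

r_id = pearson_r(y_id, predict(x_id))
r_ood = pearson_r(y_ood, predict(x_ood))
print(r_id.round(2), r_ood.round(2))
```

The key design point is that `predict` reuses the projection and weights fitted on the training set for both test sets, so any difference between `r_id` and `r_ood` reflects the test distribution rather than refitting.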

A plausible implication of the MoCo vs. ResNet-50 contrast is that self-supervised deep neural networks can model biological vision better in OOD regimes than task-supervised networks, a distinction that traditional ID benchmarks do not reveal.

6. Context and Research Applications

NSD-synthetic enables systematic OOD generalization benchmarking for NeuroAI, complementing existing large-scale datasets restricted to natural images. It provides a standardized, rigorously annotated fMRI resource for investigating model-to-brain generalization under controlled stimulus perturbations. Public access to data, preprocessing pipelines, ROI definitions, and code ensures replicability and extensibility for evaluating new models and hypotheses regarding human visual processing (Gifford et al., 8 Mar 2025).

By foregrounding OOD differentiation, which ID-only paradigms cannot provide, NSD-synthetic advances both the empirical assessment of neural network models and the formulation of computational theories with improved explanatory power for human vision.

References (1)
