Natural Scenes Dataset (NSD)
- Natural Scenes Dataset (NSD) is a large-scale fMRI resource capturing high-resolution brain responses to thousands of diverse natural scene images.
- It employs rigorous data acquisition and preprocessing, including motion correction and GLM-based β-estimation, to ensure high signal reliability.
- NSD supports innovative research in neural encoding, out-of-distribution generalization, and mental imagery decoding for both neuroscience and NeuroAI.
The Natural Scenes Dataset (NSD) is a large-scale, high-resolution functional MRI (fMRI) resource enabling precise modeling of human visual cortical responses to thousands of naturalistic scene images. NSD comprises single-trial fMRI measurements for 8 human subjects as they viewed tens of thousands of diverse natural scene images, offering a significant advance in both data volume and stimulus diversity relative to prior visual neuroscience datasets. It forms a foundation for benchmarking neural and computational models of vision and supports extensions that probe out-of-distribution (OOD) generalization and mental imagery.
1. Dataset Scope, Stimuli, and Acquisition
NSD was constructed by presenting each of 8 healthy adult participants (ages 21–35, balanced gender distribution) with approximately 10,000 distinct color natural scenes, amounting to ∼73,000 unique image-subject pairs (Gifford et al., 2023). All images are drawn from the Microsoft COCO (“Common Objects in Context”) dataset, which provides diverse scene categories spanning indoor and outdoor environments, multiple object types, and complex spatial compositions. Images were presented at ∼8° of visual angle and were intensity-normalized and sRGB-calibrated for display consistency.
Each subject participated in 30–40 scanning sessions. Each session comprised 6 runs of approximately 10 minutes each, where the participant viewed around 40 images per run (3 s image presentation, followed by 1 s blank), with periodic “catch” trials for attentional monitoring. Stimulus sampling was stratified to maximize category and layout diversity while minimizing repeat viewings within and across subjects.
Functional data were acquired on a 7T Siemens Magnetom scanner with a 32-channel head coil. Imaging protocol parameters included 1.8 mm isotropic voxel size, repetition time (TR) of 1.6–1.8 s, and echo time (TE) 22 ms. Visual cortex coverage included occipital and ventral temporal regions. Surface reconstructions and cortical segmentations were performed using FreeSurfer. The raw and processed data are available for direct download, with responses stored as NumPy or MATLAB arrays and metadata encoded in JSON files.
2. Preprocessing, Quality Control, and Signal Metrics
The NSD preprocessing pipeline addresses both temporal and spatial confounds. Steps include:
- Slice timing and motion correction, using six-parameter rigid-body alignment
- Geometric distortion correction via dual-echo field maps
- Coregistration to individual high-resolution anatomical scans
- Projection of volumetric data to fsaverage cortical surfaces
- Single-trial β-weight estimation for each image using GLMsingle, which fits voxelwise HRFs and applies ridge-regularized GLM denoising (Gifford et al., 2023, Gifford et al., 8 Mar 2025)
- Run-wise z-scoring and averaging across repeated presentations to ensure reliability
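The last step in the list above (run-wise z-scoring followed by averaging over repeated presentations) can be illustrated with a minimal NumPy sketch. The array layout (trials by vertices), the function names, and the integer run and image labels are assumptions made for illustration; they are not taken from the NSD release code.

```python
import numpy as np

def zscore_within_runs(betas, run_ids):
    """z-score each vertex's single-trial betas separately within each run.

    betas:   (n_trials, n_vertices) array of single-trial responses (assumed layout)
    run_ids: (n_trials,) run label for every trial
    """
    betas = np.asarray(betas, dtype=float)
    run_ids = np.asarray(run_ids)
    out = np.empty_like(betas)
    for run in np.unique(run_ids):
        mask = run_ids == run
        mu = betas[mask].mean(axis=0)
        sd = betas[mask].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant vertices
        out[mask] = (betas[mask] - mu) / sd
    return out

def average_repeats(betas, image_ids):
    """Average the trials that share an image ID; returns (unique_ids, averaged betas)."""
    image_ids = np.asarray(image_ids)
    unique_ids = np.unique(image_ids)
    averaged = np.stack([betas[image_ids == i].mean(axis=0) for i in unique_ids])
    return unique_ids, averaged

# Toy example: 3 runs of 4 trials, each of 4 images shown once per run.
rng = np.random.default_rng(0)
toy_betas = rng.normal(size=(12, 5))
toy_runs = np.repeat([0, 1, 2], 4)
toy_images = np.tile([10, 11, 12, 13], 3)
ids, mean_betas = average_repeats(zscore_within_runs(toy_betas, toy_runs), toy_images)
```

Normalizing before averaging keeps run-to-run gain differences from dominating the per-image average.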
Stringent quality control excluded runs with >1.5 mm translation or >1.5° rotation. Across all data, ∼98% of runs met these standards.
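The exclusion rule can be expressed as a simple threshold check on the realignment parameters. The sketch below assumes a (volumes by 6) array holding three translations in mm followed by three rotations in degrees, and measures displacement relative to the first volume of the run. The source does not state how displacement was measured, so both choices are assumptions, and `run_passes_qc` is a hypothetical helper name.

```python
import numpy as np

def run_passes_qc(motion, max_trans_mm=1.5, max_rot_deg=1.5):
    """Return True if a run stays within the translation and rotation limits.

    motion: (n_volumes, 6) array, columns 0-2 translations (mm),
            columns 3-5 rotations (degrees). Assumed layout.
    """
    motion = np.asarray(motion, dtype=float)
    rel = motion - motion[0]  # displacement relative to the first volume
    trans_ok = np.abs(rel[:, :3]).max() <= max_trans_mm
    rot_ok = np.abs(rel[:, 3:]).max() <= max_rot_deg
    return bool(trans_ok and rot_ok)

# Fraction of runs retained for a list of per-run motion arrays.
def fraction_retained(runs):
    return sum(run_passes_qc(m) for m in runs) / len(runs)
```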
Noise and reliability metrics include median signal-to-noise ratio (SNR) in early visual cortex (V1–V3) of ∼50 (β mean/std) and split-half correlations averaging r ≈ 0.7 in early visual areas and r ≈ 0.5 in high-level regions. A per-vertex noise ceiling (NC), estimated via Spearman–Brown correction on split-half reliability, allows for noise-normalized model evaluation (Gifford et al., 2023).
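As a rough illustration of the reliability metrics described here, the following sketch computes a per-vertex split-half correlation and applies the Spearman–Brown correction, r_SB = 2r / (1 + r), to obtain a noise ceiling estimate. The two-halves input format and the function names are assumptions; the released NSD noise ceiling values may be computed differently.

```python
import numpy as np

def split_half_correlation(half_a, half_b):
    """Per-vertex Pearson correlation between two halves of the data.

    half_a, half_b: (n_images, n_vertices) responses to the same images,
    each averaged over a disjoint half of the repeats (assumed layout).
    """
    a = half_a - half_a.mean(axis=0)
    b = half_b - half_b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

def spearman_brown(r):
    """Spearman-Brown prediction of full-data reliability from split-half r."""
    r = np.clip(r, 0.0, 1.0)  # negative reliabilities are treated as zero
    return 2.0 * r / (1.0 + r)

def noise_ceiling(half_a, half_b):
    """Per-vertex noise ceiling in [0, 1]."""
    return spearman_brown(split_half_correlation(half_a, half_b))
```

Under this correction, a split-half value of r = 0.7 corresponds to a ceiling of about 0.82, and r = 0.5 to about 0.67.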
3. Experimental Design and Behavioral Monitoring
Participants maintained central fixation on a dot displayed throughout each run, and attention was controlled with either a one-back repetition detection task or a luminance increment task. Behavioral compliance was high: mean accuracy on the fixation task ranged from 76% to 98%, and mean d′ in the one-back task ranged from 1.6 to 3.3 (Gifford et al., 8 Mar 2025).
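For readers unfamiliar with the sensitivity index, d′ is the difference between the z-transformed hit rate and false-alarm rate. The sketch below uses only the Python standard library. The log-linear correction (adding 0.5 to each count) is an assumption introduced here to avoid infinite values at rates of 0 or 1; the source does not say which correction, if any, was applied.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Counts are corrected log-linearly (an assumption, see lead-in).
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one participant in a one-back task.
example = d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90)
```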
The extensive image sampling was structured such that each subject primarily viewed a non-overlapping image set, with a controlled subset of 284 images (“shared images”) viewed by all participants to enable population-level cross-subject analysis and benchmarking.
4. Variants and Extensions: NSD-synthetic and NSD-Imagery
Several derivatives of NSD expand its utility for experimental and modeling paradigms that require out-of-distribution or non-visual stimuli.
NSD-synthetic supplements the “NSD-core” dataset with one additional scan session per subject, measuring responses to 284 synthetic, systematically crafted images. Stimulus classes include various noise types (white, pink), contrast and phase manipulations, synthetic gratings, chromatic noise, manipulated natural scenes (e.g., Mooney, line drawings), and isolated words. These stimuli were not generated by learned models but parameterized using low- and mid-level feature manipulations, with log-polar spirals, modulated spatial frequencies, and phase-scrambling. Multidimensional scaling (MDS) of fMRI responses revealed that NSD-synthetic images sample a domain separable from the natural-scene manifold, confirming genuine OOD status. Single-trial variance in response to NSD-synthetic stimuli is predominantly stimulus-driven (36–80% in visual cortex), and representational similarity analyses demonstrate robust, stimulus-locked multivariate codes across subjects and ROIs (Gifford et al., 8 Mar 2025).
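The MDS and representational similarity analyses mentioned above can be sketched as follows: build a correlation-distance dissimilarity matrix from the response patterns, then embed it with classical (Torgerson) MDS. The source does not specify the distance measure or the MDS variant, so both are assumptions, as are the function names.

```python
import numpy as np

def correlation_rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson r between stimuli.

    responses: (n_stimuli, n_vertices) array of response patterns.
    """
    return 1.0 - np.corrcoef(responses)

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1][:n_components]     # largest eigenvalues first
    scale = np.sqrt(np.clip(evals[order], 0.0, None))  # drop negative eigenvalues
    return evecs[:, order] * scale

# Toy example with random patterns standing in for natural and synthetic stimuli.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(20, 50))
embedding = classical_mds(correlation_rdm(patterns))
```

In a plot of such an embedding, OOD status would appear as the synthetic stimuli occupying a region separate from the natural scenes.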
NSD-Imagery extends NSD into the domain of internally generated (“mental imagery”) visual experience. The same participants underwent additional sessions where they were cued to vividly imagine scene categories, complex scenes, simple shapes, or conceptual words, with matched visual and imagery trials. Preprocessing and GLM protocols matched those in NSD-core, and per-trial β-estimates were generated. As expected, mental imagery responses exhibit reduced SNR and signal dimensionality relative to direct visual presentation, requiring tailored analysis and modeling approaches (Kneeland et al., 7 Jun 2025).
5. Benchmarking, Encoding Models, Decoding, and Out-of-Distribution Evaluation
NSD is widely employed for voxel-wise encoding model benchmarking. Standard encoding models assume linearized mappings from DNN-derived features $X$ to measured fMRI responses $y$:

$$y = Xw + \epsilon$$

where $w$ is optimized via regularized regression (ridge), and $\epsilon$ is Gaussian noise. For neural encoding, features $X$ are extracted from layers of image-computable models such as AlexNet, ResNet, or Vision Transformers and concatenated, with dimensionality reduced by PCA. The established performance baseline is a linearized AlexNet encoding, achieving 40.48% mean noise-normalized explained variance (Gifford et al., 2023). The challenge metric is:

$$\text{score}_v = \frac{R_v^2}{NC_v} \times 100, \qquad R_v = \mathrm{corr}(G_v, P_v)$$

averaged across all cortical vertices $v$, where $G_v$ and $P_v$ are ground-truth and predicted β-weights and $NC_v$ is the noise ceiling of vertex $v$ (Gifford et al., 2023).
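A linearized encoding model of this kind, together with a noise-normalized score, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the challenge's reference implementation: it assumes features and responses are already centered (no intercept), uses a closed-form ridge solution with a single hypothetical penalty `alpha`, and takes the score to be the squared Pearson correlation divided by the noise ceiling, times 100.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights W = (X^T X + alpha I)^{-1} X^T Y.

    X: (n_images, n_features) model features (assumed centered)
    Y: (n_images, n_vertices) beta weights (assumed centered)
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def noise_normalized_score(G, P, noise_ceiling):
    """Per-vertex 100 * r^2 / NC for ground-truth G and predicted P.

    G, P: (n_images, n_vertices); noise_ceiling: (n_vertices,) values in (0, 1].
    """
    g = G - G.mean(axis=0)
    p = P - P.mean(axis=0)
    r = (g * p).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    return 100.0 * r ** 2 / noise_ceiling

# Toy example: random features stand in for PCA-reduced DNN activations.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(200, 10)), rng.normal(size=(50, 10))
W_true = rng.normal(size=(10, 4))
Y_train = X_train @ W_true + rng.normal(size=(200, 4))
Y_test = X_test @ W_true + rng.normal(size=(50, 4))
W_hat = fit_ridge(X_train, Y_train, alpha=10.0)
scores = noise_normalized_score(Y_test, X_test @ W_hat, np.full(4, 0.8))
summary = scores.mean()  # aggregate across vertices
```

In practice the penalty would be chosen by cross-validation, often separately per vertex or per ROI.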
NSD-synthetic facilitates rigorous OOD generalization testing: model performance declines 20–30 percentage points when tested on synthetic images versus held-out in-distribution images, exposing systematic failures in leading models. Self-supervised deep neural networks (e.g., MoCo, ViT models) show markedly smaller OOD drops compared to task-supervised networks, revealing robust alignment with neural responses on non-naturalistic stimuli that is not apparent in in-distribution benchmarking (Gifford et al., 8 Mar 2025).
NSD-Imagery enables benchmarking of fMRI-to-image decoding models under vision-to-imagery transfer, showing that high performance on seen-image decoding does not imply robust generalization to imagined images. Decoders with simpler, linear mappings into multimodal feature spaces (CLIP image+text embeddings with diffusion/GAN priors) outperform complex, highly-parameterized models on mental imagery, highlighting a trade-off between model flexibility and cross-domain robustness (Kneeland et al., 7 Jun 2025).
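The simpler class of decoder described here, a linear map from β-weights into an embedding space, can be sketched as a ridge regression followed by a retrieval-style evaluation. In the sketch, random vectors stand in for CLIP embeddings, the generative prior (diffusion or GAN) is omitted entirely, and top-1 identification by cosine similarity is an illustrative metric chosen here, not necessarily the one used in the cited work.

```python
import numpy as np

def fit_linear_decoder(betas, embeddings, alpha=1.0):
    """Ridge map from brain responses to embedding vectors.

    betas:      (n_trials, n_vertices)
    embeddings: (n_trials, n_dims) target embeddings (stand-ins for CLIP)
    """
    n_vertices = betas.shape[1]
    gram = betas.T @ betas + alpha * np.eye(n_vertices)
    return np.linalg.solve(gram, betas.T @ embeddings)

def top1_identification(pred, target):
    """Fraction of trials whose prediction is most cosine-similar to its own target."""
    pred_n = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    targ_n = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = pred_n @ targ_n.T
    return float((sim.argmax(axis=1) == np.arange(len(pred))).mean())

# Vision-to-imagery transfer would fit on seen-image trials and evaluate on
# imagery trials, e.g.:
#   W = fit_linear_decoder(vision_betas, vision_embeddings)
#   acc = top1_identification(imagery_betas @ W, imagery_embeddings)
```

Such a decoder has no parameters beyond one weight matrix, which is the kind of low-flexibility model the comparison above favors for cross-domain transfer.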
6. Applications and Research Impact
NSD enables a range of research directions in both neuroscience and NeuroAI. It supports:
- Fine-grained mapping of visual cortical representational hierarchies and retinotopy
- Model-based hypothesis testing for deep neural network alignment with human vision, using both in-distribution and OOD stimuli
- Data-driven development and evaluation of brain-to-image decoding and brain–computer interface (BCI) architectures
- Individual-difference and group-level analyses in human vision, including extensions to clinical, psychiatric, and neurodevelopmental investigation
The Algonauts Project challenge directly leverages NSD, providing a standardized evaluation platform with public leaderboards, facilitating rapid model development and transparent comparison (Gifford et al., 2023). Recommendations for extension include combining NSD with temporally resolved modalities (EEG, MEG), scaling up subject pools, and integrating behavioral or eye-tracking data.
7. Future Directions and Open Research Questions
Proposed future directions for the NSD ecosystem include:
- Expansion of synthetic and manipulated stimulus sets to probe semantic or high-level visual computation
- Development of modeling paradigms that explicitly minimize OOD generalization gaps, e.g., adversarial feature-space regularization
- Extension of NSD-derived approaches to real-time modalities and BCI applications, particularly in locked-in or clinical populations
- Refinement of mental imagery decoding including integration of human-in-the-loop evaluation protocols, hybrid transfer/fine-tuning strategies, and ethical frameworks for mind–computer interfacing
- Cross-dataset transfer, leveraging NSD as pretraining for more generalized visual brain–machine interfaces or neuroscience-informed neural architecture search (Gifford et al., 8 Mar 2025, Kneeland et al., 7 Jun 2025)
The full dataset, associated code, and extensive utility libraries are available at http://naturalscenesdataset.org, providing a foundation for ongoing methodological innovation and theory building in systems neuroscience, vision science, and NeuroAI.