TITD: THINGS Macaque IT Dataset
- TITD is a large-scale neural dataset of spike recordings from 512 channels in macaque IT cortex, collected during the presentation of thousands of natural images.
- Analyses of TITD use linear regression in paired encoding and decoding models to map spatiotemporal neural activity to interpretable latent visual features and reconstructed images.
- The dataset supports image reconstruction with diffusion-based generative models, as well as cross-species comparisons in visual computational neuroscience.
The THINGS Macaque IT Dataset (TITD) is a large-scale resource for probing how complex visual information is encoded and represented in the primate inferior temporal (IT) cortex. Built from multi-electrode array recordings in macaques viewing thousands of natural stimuli from the THINGS image set, TITD offers fine-grained spatial and temporal measurements of distributed neural population activity. By supporting paired encoding and decoding models, functional ensemble analyses, and state-of-the-art generative methods, the dataset helps bridge computational neuroscience, comparative neurobiology, and machine learning approaches to object recognition.
1. Dataset Structure and Data Acquisition
TITD comprises neural spike recordings from 512 channels across four chronically implanted Utah arrays in macaque IT cortex. The electrodes are arranged in an 11×12 configuration with certain corner positions omitted. Recordings are made during the presentation of thousands of natural images drawn from 720 object categories of the THINGS image set, ensuring a diverse sampling of perceptual space.
Spike activity is binned in 25-ms increments from stimulus onset up to approximately 600 ms. This temporal resolution supports analysis of both rapid, low-level perceptual coding and the later, slower emergence of higher-order semantic representations. The arrays span multiple IT subregions along the anterior–posterior axis, supporting both localized and distributed population analyses.
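The binning step can be sketched as below. This is a minimal illustration that assumes spike times are available per channel in milliseconds relative to stimulus onset; that input format and the helper name `bin_spikes` are hypothetical, not TITD's actual file layout. With 25-ms bins over 0–600 ms, each channel yields 24 bins.

```python
import numpy as np

BIN_MS = 25      # bin width stated for TITD
WINDOW_MS = 600  # approximate post-onset window stated for TITD

def bin_spikes(spike_times_ms, bin_ms=BIN_MS, window_ms=WINDOW_MS):
    """Count spikes per channel in fixed-width bins after stimulus onset.

    spike_times_ms: list with one 1-D array of spike times (ms, relative to
    stimulus onset) per channel. Hypothetical input format for illustration.
    Returns an integer array of shape (n_channels, n_bins).
    """
    edges = np.arange(0, window_ms + bin_ms, bin_ms)  # 0, 25, ..., 600
    counts = [np.histogram(t, bins=edges)[0] for t in spike_times_ms]
    return np.stack(counts)

# Toy trial: 3 channels with 5, 12, and 0 spikes
rng = np.random.default_rng(0)
trial = [np.sort(rng.uniform(0, 600, size=n)) for n in (5, 12, 0)]
counts = bin_spikes(trial)
print(counts.shape)  # (3, 24)
```

Flattening the channels × bins array gives one spike-count vector per trial (512 × 24 = 12,288 values if every bin is kept); whether TITD decoders use all bins or a subset is not specified here.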
2. Decoding Neural Representations: Linear Regression and Latent Spaces
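A core analysis strategy for TITD is the “decoding” of population spiking activity into interpretable latent visual spaces. Linear regression (typically ridge regression with L₂ regularization) maps an input vector of spike counts $\mathbf{x}$ to a latent embedding $\hat{\mathbf{z}}$:
$$\hat{\mathbf{z}} = W\mathbf{x} + \mathbf{b}$$
Here, $W$ is the learned weight matrix and $\mathbf{b}$ is the bias term. Ridge regression fits them by minimizing $\sum_i \lVert \mathbf{z}_i - W\mathbf{x}_i - \mathbf{b} \rVert_2^2 + \lambda \lVert W \rVert_F^2$ over training trials $i$, where $\lambda$ sets the strength of the L₂ penalty. Multiple latent spaces are used: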
- CLIP model: Preserves high-level semantic content (e.g., category, object presence).
- VDVAE, PCA, ICA: Capture low-level image features (e.g., color, texture, brightness).
Decoding performance is evaluated by how closely the predicted latent embeddings match the ground-truth embeddings of the presented images.
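A minimal sketch of the decoding step uses closed-form ridge regression in NumPy on synthetic data. The penalty `lam`, the centering trick that leaves the bias unpenalized, the per-dimension Pearson correlation used as a fidelity score, and all helper names are assumptions for illustration, not the pipeline used for TITD. `W` has shape (latent_dim, n_features), so `X @ W.T + b` applies $\hat{\mathbf{z}} = W\mathbf{x} + \mathbf{b}$ row by row.

```python
import numpy as np

def fit_ridge(X, Z, lam=1.0):
    """Fit Z ≈ X @ W.T + b with an L2 penalty on W (bias left unpenalized).

    X: (n_trials, n_features) spike counts; Z: (n_trials, latent_dim) embeddings.
    """
    x_mean, z_mean = X.mean(axis=0), Z.mean(axis=0)
    Xc, Zc = X - x_mean, Z - z_mean
    # Normal equations with ridge term: (Xc^T Xc + lam I) W^T = Xc^T Zc
    W_T = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Zc)
    W = W_T.T                    # (latent_dim, n_features)
    b = z_mean - W @ x_mean      # bias recovered from the training means
    return W, b

def predict(W, b, X):
    return X @ W.T + b

def pearson_per_dim(A, B):
    """Pearson correlation between matching columns of A and B."""
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    return (Ac * Bc).sum(axis=0) / np.sqrt((Ac**2).sum(axis=0) * (Bc**2).sum(axis=0))

# Synthetic data: latents are a noisy linear function of Poisson "spike counts"
rng = np.random.default_rng(0)
n_train, n_test, n_features, latent_dim = 800, 200, 50, 8
X = rng.poisson(3.0, size=(n_train + n_test, n_features)).astype(float)
W_true = rng.normal(size=(latent_dim, n_features))
Z = X @ W_true.T + rng.normal(scale=0.5, size=(n_train + n_test, latent_dim))

W, b = fit_ridge(X[:n_train], Z[:n_train], lam=10.0)
r = pearson_per_dim(predict(W, b, X[n_train:]), Z[n_train:])
print(np.round(r, 3))  # held-out correlation per latent dimension
```

In practice `lam` would be chosen by cross-validation, and the same fit can be repeated for each latent space (CLIP, VDVAE, PCA, ICA) by swapping the target matrix `Z`.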
3. Image Reconstruction with Diffusion Generative Models
Stimulus images are reconstructed from neural activity by projecting decoded latent embeddings into pixel space with generative models, notably the “unCLIP” diffusion framework. This two-stage process first generates initial VAE latents and then conditions the diffusion process on the decoded CLIP embedding as a guidance vector. The system can recover both coarse and fine details, including object boundaries, textures, and semantic category attributes.
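When using CLIP latents, reconstructions maintain semantic integrity, while VDVAE or lower-dimensional latents are especially effective at reproducing color, brightness, and texture. Evaluations use low-level metrics such as pixel-wise correlation and the Structural Similarity Index (SSIM):
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
with $\mu_x, \mu_y$ as image means, $\sigma_x^2, \sigma_y^2$ as variances, $\sigma_{xy}$ as covariance, and $C_1, C_2$ as small constants that stabilize the division.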
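The low-level metrics can be sketched directly from the SSIM formula. This version evaluates the formula once over the whole grayscale image (a "global" SSIM) with the conventional constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ for dynamic range $L$. Standard implementations such as scikit-image's `structural_similarity` instead average SSIM over local sliding windows, so their values will differ; which variant TITD evaluations use is not stated here, and the helper names are hypothetical.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM formula evaluated once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return num / den

def pixel_correlation(x, y):
    """Pearson correlation between flattened pixel values."""
    return np.corrcoef(np.ravel(x), np.ravel(y))[0, 1]

rng = np.random.default_rng(0)
original = rng.uniform(0, 1, size=(64, 64))
noisy = np.clip(original + rng.normal(scale=0.1, size=original.shape), 0, 1)
print(global_ssim(original, original))  # ≈ 1.0 for identical images
print(global_ssim(original, noisy), pixel_correlation(original, noisy))
```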
High-level evaluation compares the activations that reconstructed and original images evoke in pretrained networks such as AlexNet, InceptionNet, CLIP, EfficientNet, and SwAV, using correlations between the resulting feature vectors.
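One common way to turn feature correlations into a score is two-way identification: a reconstruction "wins" a comparison when its features correlate more with its own stimulus than with another stimulus. The sketch below assumes this convention (the source does not name the exact scoring rule), and random vectors stand in for real network activations; `two_way_identification` is a hypothetical helper.

```python
import numpy as np

def two_way_identification(feat_recon, feat_true):
    """Fraction of pairwise comparisons in which reconstruction i correlates
    more with its own stimulus features than with those of stimulus j != i.

    feat_recon, feat_true: (n_images, n_features) network activations.
    """
    n = feat_recon.shape[0]
    # Rows = reconstructions, columns = ground-truth images
    corr = np.corrcoef(feat_recon, feat_true)[:n, n:]
    correct = np.diag(corr)[:, None]      # r(recon_i, true_i)
    wins = (correct > corr).sum(axis=1)   # diagonal ties never count as wins
    return wins.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
feat_true = rng.normal(size=(20, 100))
feat_good = feat_true + rng.normal(scale=0.5, size=feat_true.shape)
feat_rand = rng.normal(size=feat_true.shape)
print(two_way_identification(feat_good, feat_true))  # close to 1
print(two_way_identification(feat_rand, feat_true))  # near chance (0.5)
```

Chance level is 0.5, which makes scores comparable across networks with very different feature dimensionalities.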
4. Encoding Models and Visualization of Neural Preferences
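In addition to decoding, TITD analyses use encoding models that map visual latent features back into brain space. This reverse mapping predicts spike counts $\hat{\mathbf{x}}$ from latent representations $\mathbf{z}$:
$$\hat{\mathbf{x}} = W_{\mathrm{enc}}\,\mathbf{z} + \mathbf{b}_{\mathrm{enc}}$$
where $W_{\mathrm{enc}}$ and $\mathbf{b}_{\mathrm{enc}}$ are the encoding weights and bias.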
The encoding weights are used to generate “preferred stimuli”: the weights for an electrode or neural ensemble define a direction in latent space, and inverting that direction through the generative models yields an image predicted to activate that electrode or ensemble most strongly. This approach elucidates the feature sensitivity and preferred semantic attributes of IT cortex populations.
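Under a linear encoding model, the idea behind preferred stimuli can be made concrete. Electrode $i$ has predicted response $\mathbf{w}_i^\top\mathbf{z} + b_i$, where $\mathbf{w}_i$ is row $i$ of $W_{\mathrm{enc}}$; among latents of fixed norm, this is largest when $\mathbf{z}$ points along $\mathbf{w}_i$ (Cauchy–Schwarz). The sketch below fits the encoder with ridge regression on synthetic data and extracts those unit directions. Treating "inversion" as taking this unit direction, the chosen norm, and the helper names are assumptions; the generative decoding of each direction into an image is not shown.

```python
import numpy as np

def fit_encoder(Z, X, lam=1.0):
    """Ridge fit of X (n_trials, n_channels) from latents Z (n_trials, latent_dim).
    Returns W_enc (n_channels, latent_dim) and b_enc (n_channels,)."""
    z_mean, x_mean = Z.mean(axis=0), X.mean(axis=0)
    Zc, Xc = Z - z_mean, X - x_mean
    W_T = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ Xc)
    W_enc = W_T.T
    return W_enc, x_mean - W_enc @ z_mean

def preferred_directions(W_enc, norm=1.0):
    """Latent vector of the given norm that maximizes each channel's linear
    response: w . z is largest when z is parallel to w."""
    lengths = np.linalg.norm(W_enc, axis=1, keepdims=True)
    return norm * W_enc / lengths

rng = np.random.default_rng(1)
n_trials, latent_dim, n_channels = 1000, 16, 32
Z = rng.normal(size=(n_trials, latent_dim))
W_true = rng.normal(size=(n_channels, latent_dim))
X = Z @ W_true.T + rng.normal(scale=0.3, size=(n_trials, n_channels))

W_enc, b_enc = fit_encoder(Z, X, lam=1.0)
Z_pref = preferred_directions(W_enc)  # one latent "preferred stimulus" per channel
```

For an ensemble, one hypothetical option is to average the member rows of `W_enc` before normalizing.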
5. Spatiotemporal Dynamics and Functional Clustering
The high-density electrode arrangement permits detailed mapping of spatial and temporal dynamics in IT cortex. Binned spike recordings reveal temporal evolution from early encoding of primitive features (<100 ms) to later emergence of complex, semantic representations (125–150 ms onwards).
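Fitting a separate decoder to each 25-ms bin is one common way to trace when information becomes available; the source does not say that TITD analyses use exactly this procedure, so treat it as an illustrative assumption. In the sketch, responses are synthetic Gaussian values rather than real spike counts, with latent information injected only from 125 ms onward, and all helper names are hypothetical.

```python
import numpy as np

BIN_MS = 25

def ridge_fit_predict(X_tr, Z_tr, X_te, lam=1.0):
    """Closed-form ridge (bias via centering); returns predictions for X_te."""
    xm, zm = X_tr.mean(axis=0), Z_tr.mean(axis=0)
    Xc = X_tr - xm
    W_T = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]), Xc.T @ (Z_tr - zm))
    return (X_te - xm) @ W_T + zm

def mean_corr(A, B):
    """Mean Pearson r across latent dimensions."""
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    r = (Ac * Bc).sum(axis=0) / np.sqrt((Ac**2).sum(axis=0) * (Bc**2).sum(axis=0))
    return r.mean()

def decode_per_bin(responses, Z, n_train, lam=1.0):
    """responses: (n_trials, n_channels, n_bins). One decoder per time bin."""
    scores = []
    for t in range(responses.shape[2]):
        X = responses[:, :, t]
        pred = ridge_fit_predict(X[:n_train], Z[:n_train], X[n_train:], lam)
        scores.append(mean_corr(pred, Z[n_train:]))
    return np.array(scores)

# Synthetic data: latent information enters the responses from 125 ms on
rng = np.random.default_rng(2)
n_trials, n_channels, n_bins, latent_dim = 600, 40, 24, 4
Z = rng.normal(size=(n_trials, latent_dim))
A = rng.normal(size=(n_channels, latent_dim))
responses = rng.normal(size=(n_trials, n_channels, n_bins))
onset_bin = 125 // BIN_MS  # bin 5 covers 125-150 ms
responses[:, :, onset_bin:] += (Z @ A.T)[:, :, None]

scores = decode_per_bin(responses, Z, n_train=500)
for t, s in enumerate(scores):
    print(f"{t * BIN_MS:3d}-{(t + 1) * BIN_MS:3d} ms: r = {s:.2f}")
```

Running the same loop separately for low-level (e.g. VDVAE) and semantic (e.g. CLIP) targets is how such curves could expose different onset times for the two kinds of representation.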
Electrodes are organized along the IT anterior–posterior axis, allowing for visualization of category-specific responses: animal stimuli elicit robust signals in anterior regions, while categories such as food evoke more distributed activation.
Cosine similarity–based hierarchical clustering is performed on encoding model weight vectors to detect functional ensembles, that is, groups of electrodes sharing similar stimulus preferences. Such ensembles are visualized as contiguous regions in spatiotemporal maps.
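The clustering step can be sketched with SciPy, using cosine distance (1 minus cosine similarity) between per-electrode encoding weight vectors. The source specifies only cosine similarity and hierarchical clustering; the average-linkage method, the fixed number of clusters, and the synthetic weights are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_ensembles(W_enc, n_clusters):
    """Average-linkage hierarchical clustering of encoding weight vectors
    (one row per electrode) under cosine distance = 1 - cosine similarity."""
    tree = linkage(W_enc, method="average", metric="cosine")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Synthetic weights: three groups of electrodes sharing a latent direction
rng = np.random.default_rng(3)
latent_dim = 32
prototypes = rng.normal(size=(3, latent_dim))
group = np.repeat([0, 1, 2], 20)  # 60 electrodes, 20 per group
# Response gain varies per electrode; cosine distance ignores it
W_enc = prototypes[group] * rng.uniform(0.5, 2.0, size=(60, 1))
W_enc += rng.normal(scale=0.2, size=W_enc.shape)

labels = cluster_ensembles(W_enc, n_clusters=3)
print(labels)
```

Cosine distance is a natural fit here because it groups electrodes by the direction of their feature tuning rather than by overall response magnitude.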
| Analytical Aspect | Methodology | Significance |
|---|---|---|
| Decoding to Latent Space | Linear regression (ridge) | Maps neural activity to interpretable features |
| Reconstruction | unCLIP diffusion model | Generates perceptual images from spike data |
| Functional Clustering | Cosine-similarity hierarchical clustering | Identifies ensembles with shared feature selectivity |
6. Comparative Relevance and Integration with Human Datasets
TITD shares its stimulus base and analytical logic with datasets like CNeuroMod-THINGS (St-Laurent et al., 11 Jul 2025), enabling direct cross-species comparisons of object representation in high-level visual areas. While TITD focuses exclusively on macaque IT cortex, human datasets implement analogous recognition paradigms and response mapping. This alignment permits stringent tests of generalization for neuro-AI models and supports nuanced analysis of representational similarities and differences in semantic coding across species.
A plausible implication is that hierarchical organization and semantic clustering observed in TITD can be systematically compared against human fMRI data, providing convergent evidence for principles of visual processing.
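Representational similarity analysis (RSA) is one standard way to run such a comparison without matching individual neurons to voxels: each dataset is summarized as a stimulus-by-stimulus dissimilarity matrix (RDM), and the RDMs are then correlated. The sketch below assumes correlation-distance RDMs and a Spearman comparison; neither the source nor CNeuroMod-THINGS is claimed to use exactly this procedure, and the "macaque" and "human" arrays are synthetic stand-ins.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between the
    response patterns of every pair of stimuli (rows = stimuli)."""
    return 1.0 - np.corrcoef(patterns)

def compare_rdms(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# Synthetic stand-ins: both "species" respond to the same stimulus structure
rng = np.random.default_rng(4)
n_stimuli, latent_dim = 30, 10
S = rng.normal(size=(n_stimuli, latent_dim))  # shared stimulus features
macaque = S @ rng.normal(size=(latent_dim, 64)) + rng.normal(scale=0.5, size=(n_stimuli, 64))
human = S @ rng.normal(size=(latent_dim, 200)) + rng.normal(scale=1.0, size=(n_stimuli, 200))
unrelated = rng.normal(size=(n_stimuli, 200))

print(compare_rdms(rdm(macaque), rdm(human)))      # high: shared structure
print(compare_rdms(rdm(macaque), rdm(unrelated)))  # near zero
```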
7. Impact and Applications in Computational and Systems Neuroscience
TITD’s design enables the joint application of encoding and decoding frameworks, robust functional ensemble mapping, and generative image reconstruction directly from neural signals. These methods facilitate understanding of distributed coding, enable feature-based visualization of neural preferences, and inform cross-level computational models of object recognition. TITD thus serves as a benchmark for future studies into hierarchical organization, perceptual inference, and semantic generalization in primate vision.
The dataset’s capacity to recover both low-level and high-level features, together with its utility in reconstructing object categories from distributed population activity, underscores its value for linking neural computation and perceptual experience.