Generalization of brain–model alignment metrics to other architectures and objectives

Establish whether comparable spatial, temporal, and overall encoding scores between model representations and human brain activity emerge with architectures and training objectives other than the self-supervised, hierarchical DINOv3 vision transformer family evaluated in this study.

Background

The paper measures representational alignment between DINOv3 vision transformers and human brain activity using three metrics: an overall encoding score (linear predictability), a spatial score (hierarchical layer-to-cortical distance correspondence in fMRI), and a temporal score (layer-to-MEG response latency correspondence).
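To make the three metrics concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the encoding score is the held-out Pearson correlation of a ridge regression from model activations to brain responses, and that the spatial and temporal scores can both be read as a correlation between each brain target's best-predicting layer and an ordering of those targets (cortical distance for fMRI, response latency for MEG). The function names, the `alpha` penalty, and the simple train/test split are all illustrative assumptions.

```python
import numpy as np


def ridge_fit_predict(X_train, Y_train, X_test, alpha=1.0):
    """Closed-form ridge regression; returns predictions for X_test."""
    n_features = X_train.shape[1]
    gram = X_train.T @ X_train + alpha * np.eye(n_features)
    weights = np.linalg.solve(gram, X_train.T @ Y_train)
    return X_test @ weights


def pearson_columns(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / denom


def encoding_score(X, Y, n_train, alpha=1.0):
    """Held-out linear predictability of each brain target (voxel or sensor).

    X: (n_stimuli, n_features) activations from one model layer.
    Y: (n_stimuli, n_targets) brain responses to the same stimuli.
    The first n_train rows are used for fitting, the rest for evaluation
    (an assumed split, chosen only to keep the sketch short).
    """
    pred = ridge_fit_predict(X[:n_train], Y[:n_train], X[n_train:], alpha)
    return pearson_columns(pred, Y[n_train:])


def hierarchy_score(layer_scores, ordering):
    """Correlation between best-predicting layer and a target ordering.

    layer_scores: (n_layers, n_targets) encoding scores per layer.
    ordering: (n_targets,) cortical distance (spatial score) or
              response latency (temporal score); assumed interpretation.
    """
    best_layer = layer_scores.argmax(axis=0)
    return np.corrcoef(best_layer, ordering)[0, 1]
```

Under these assumptions, testing another architecture or training objective amounts to swapping in its layer activations for `X` and recomputing the same three numbers, which is what makes the comparison across model families well defined.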

All results are based on a single family of self-supervised, hierarchical computer vision models (DINOv3). The authors therefore leave open whether the observed alignment patterns would hold for other neural network architectures and training paradigms.

References

It thus remains an open question whether similar spatial, temporal and encoding scores would emerge with other architectures and training objectives.

— Disentangling the Factors of Convergence between Brains and Computer Vision Models (2508.18226 - Raugel et al., 25 Aug 2025) in Discussion — Limitations
