
Estimating Dimensionality of Neural Representations from Finite Samples

Published 30 Sep 2025 in stat.ML, cs.LG, and q-bio.NC | (2509.26560v1)

Abstract: The global dimensionality of a neural representation manifold provides rich insight into the computational process underlying both artificial and biological neural networks. However, all existing measures of global dimensionality are sensitive to the number of samples, i.e., the number of rows and columns of the sample matrix. We show that, in particular, the participation ratio of eigenvalues, a popular measure of global dimensionality, is highly biased with small sample sizes, and propose a bias-corrected estimator that is more accurate with finite samples and with noise. On synthetic data examples, we demonstrate that our estimator can recover the true known dimensionality. We apply our estimator to neural brain recordings, including calcium imaging, electrophysiological recordings, and fMRI data, and to the neural activations in an LLM and show our estimator is invariant to the sample size. Finally, our estimators can additionally be used to measure the local dimensionalities of curved neural manifolds by weighting the finite samples appropriately.

Summary

  • The paper introduces a bias-corrected estimator that overcomes sampling and noise biases in measuring neural dimensionality.
  • The methodology employs disjoint index summations and advanced einsum operations to deliver unbiased estimates across various neural datasets.
  • Experiments on synthetic and real data, including mouse V1 recordings and large language models, demonstrate the estimator’s robustness and broad applicability.

Introduction

The geometry of neural representation manifolds carries rich information about the computations performed by both biological and artificial neural networks. This work addresses a persistent obstacle to measuring that geometry: estimates of representation dimensionality are distorted by finite sample sizes and measurement noise. In particular, the participation ratio (PR) of covariance eigenvalues, the most widely used global dimensionality measure, is strongly biased when samples are limited. The paper develops a bias-corrected estimator that remains accurate across sample sizes and noise levels, with direct relevance to neuroscience and machine learning.
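For reference, if $\lambda_i$ are the eigenvalues of the representation's covariance matrix $\Sigma$, the participation ratio is

$\mathrm{PR}(\Sigma) = \frac{(\sum_i \lambda_i)^2}{\sum_i \lambda_i^2} = \frac{(\operatorname{tr}\Sigma)^2}{\operatorname{tr}(\Sigma^2)},$

which ranges from 1 (variance concentrated in a single direction) to the ambient dimension (equal variance in all directions). The naive plug-in estimate, obtained by substituting the sample covariance, overestimates $\operatorname{tr}(\Sigma^2)$ at small sample sizes, which biases the PR downward.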

Methodology

The core contribution of the paper is an estimation-theoretic correction to the participation ratio. The method applies across settings, from synthetic data to real neural recordings (e.g., calcium imaging, electrophysiological recordings, and fMRI data). By removing the sampling bias of the naive plug-in estimate, the proposed estimator yields approximately unbiased results even in noisy, sample-limited conditions.

The key implementation idea is to replace sums over repeated sample indices with sums over disjoint indices, which eliminates the bias of the plug-in moment estimates; efficient einsum operations make these summations practical on real-world datasets. The approach estimates dimensionality with high fidelity without requiring assumptions about the underlying data distribution (Figure 1).

Figure 1: Different dimensionality estimates of the linear model with $d = 50$ and noise variance $\sigma^2_\epsilon = 0.2$.
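To make the disjoint-index idea concrete, here is a minimal NumPy sketch, assuming i.i.d., mean-zero rows. It illustrates the general U-statistic technique rather than the paper's exact estimator, which additionally corrects for mean estimation and observation noise via higher-order disjoint sums; all function names are illustrative.

```python
import numpy as np

def pr_naive(X):
    """Plug-in participation ratio from the sample second-moment matrix (biased)."""
    C = X.T @ X / X.shape[0]
    return np.trace(C) ** 2 / np.trace(C @ C)

def pr_disjoint(X):
    """Bias-corrected PR sketch using U-statistics over disjoint sample indices.

    Assumes the P rows of X are i.i.d. with zero mean; averaging products over
    distinct samples i != j removes the bias of the plug-in moments.
    """
    P = X.shape[0]
    G = np.einsum('ik,jk->ij', X, X)                # Gram matrix: G[i, j] = <x_i, x_j>
    diag = np.diag(G)
    # mean of <x_i, x_j>^2 over i != j: unbiased for tr(Sigma^2)
    tr_C2 = (np.einsum('ij,ij->', G, G) - np.sum(diag ** 2)) / (P * (P - 1))
    # mean of <x_i, x_i><x_j, x_j> over i != j: unbiased for (tr Sigma)^2
    trC_sq = (np.sum(diag) ** 2 - np.sum(diag ** 2)) / (P * (P - 1))
    return trC_sq / tr_C2

# Quick check: data confined to a random d-dimensional subspace, so true PR = d.
rng = np.random.default_rng(0)
d, Q, P = 50, 500, 100
U = np.linalg.qr(rng.standard_normal((Q, d)))[0].T  # d orthonormal directions in R^Q
X = rng.standard_normal((P, d)) @ U                 # P zero-mean samples
print(pr_naive(X), pr_disjoint(X))                  # naive is biased low; corrected ~ 50
```

The ratio of two unbiased moment estimates is not itself exactly unbiased, but its residual bias is far smaller than that of the plug-in PR.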

Evaluation

The bias-corrected estimator was validated on synthetic datasets and several brain datasets, demonstrating consistency under changes in sample size. Compared with naive methods, the correction substantially reduces bias, which is most visible in the small-sample regime where naive estimates still grow as more samples are added. Experiments on mouse V1 recordings and other neural data confirm the estimator's robustness across recording modalities (Figure 2).

Figure 2: Dimensionality estimates on four different neural recording datasets for varying numbers of stimuli $P$ and neural activation units $Q$, obtained by subsampling the full dataset.
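A robustness check of this kind is easy to reproduce: subsample rows (stimuli) at several sizes and compare the resulting estimates. A brief sketch reusing pr_naive, pr_disjoint, and X from above (subsample sizes and seed are arbitrary):

```python
def subsample_curve(X, estimator, sizes, seed=0):
    """Apply a dimensionality estimator to random row subsets of increasing size."""
    rng = np.random.default_rng(seed)
    return [estimator(X[rng.choice(X.shape[0], size=p, replace=False)])
            for p in sizes]

sizes = [20, 40, 80]                                      # illustrative subsample sizes
naive_curve = subsample_curve(X, pr_naive, sizes)         # drifts upward with sample size
corrected_curve = subsample_curve(X, pr_disjoint, sizes)  # should stay roughly flat
```

A sample-size-invariant estimator should trace a flat curve, while a biased one drifts as the subsample grows.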

Moreover, when applied to artificial neural networks, specifically LLMs, the estimator produces sample-size-invariant estimates of task dimensionality across hidden layers, despite the limited number of inputs available per task. This capability indicates potential applications in AI safety and interpretability work (Figure 3).

Figure 3: Estimating the task dimensionality of LLM features for different languages.
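A hedged sketch of this kind of analysis, using the Hugging Face transformers library: extract one pooled hidden-state vector per input and feed the resulting matrix to the pr_disjoint sketch above. The model choice, pooling scheme, and two-sentence corpus are placeholders; a real analysis would use many inputs per task.

```python
# Illustrative only: estimating feature dimensionality from LLM hidden states.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")      # model choice is a placeholder
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

sentences = ["The cat sat on the mat.", "A long day ends quietly."]  # toy corpus
feats = []
with torch.no_grad():
    for s in sentences:
        out = model(**tok(s, return_tensors="pt"))
        # mean-pool the final hidden layer into one feature vector per input
        feats.append(out.hidden_states[-1].mean(dim=1).squeeze(0).numpy())

X = np.stack(feats)
X = X - X.mean(axis=0)        # center; the paper's estimator handles the
                              # bias from mean estimation more carefully
print(pr_disjoint(X))         # with a realistic corpus, compare across layers
```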

Extensions and Implications

Beyond global dimensionality, the paper extends the methodology to local dimensionality estimation while retaining robustness to noise. By weighting samples according to their proximity in feature space, practitioners can measure the local complexity of curved neural manifolds, offering finer-grained insight into neural computations and encoding schemes.
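As a hedged illustration of the weighting idea (not the paper's exact construction), here is a Gaussian-kernel-weighted plug-in PR around a reference point x0, with bandwidth h as a free parameter:

```python
def local_pr(X, x0, h=1.0):
    """Kernel-weighted plug-in participation ratio around a reference point x0.

    Illustrative only: the paper combines such locality weights with the
    disjoint-index bias correction; this sketch uses a naive weighted covariance.
    """
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * h ** 2))  # Gaussian weights
    w = w / w.sum()                                # normalize to sum to one
    mu = w @ X                                     # weighted mean
    Xc = X - mu
    C = np.einsum('i,ij,ik->jk', w, Xc, Xc)        # weighted local covariance
    lam = np.linalg.eigvalsh(C)
    return lam.sum() ** 2 / np.sum(lam ** 2)
```

Sweeping x0 over the data and h over scales traces how dimensionality varies along the manifold.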

In practical terms, this research lays groundwork for improving brain-computer interface (BCI) decoders, understanding neural computation in visual and other sensory cortices, and tuning neural network architectures and training regimes. The dimensionality estimation framework can directly inform the design and interpretability of complex machine learning models.

Conclusion

This paper presents a pivotal advancement in dimensionality estimation by addressing the notorious biases associated with finite samples and noise. Through theoretical rigor and experimental validation, the proposed estimator stands as a powerful tool for understanding the underlying structures of neural manifolds, both biological and artificial. This work not only provides immediate methodological benefits but also sets a precedent for future explorations in high-dimensional statistics within neural frameworks.
