Neuronpedia: Catalog & Analysis of Neurons
- Neuronpedia is a platform for systematic cataloging and interpretability of both biological and artificial neurons using scalable, data-driven methods.
- It employs high-throughput imaging, geometric and topological encoders, and sparse autoencoders to extract detailed morphological, connectomic, and feature representations with classification accuracies above 90%.
- The platform provides interactive querying, mechanistic interventions, and multi-modal data fusion, enabling actionable insights for neuroscience and deep learning research.
Neuronpedia is a concept and platform for the systematic, data-driven cataloging, analysis, and interpretability of neurons—defined here both as biological neural units and as artificial components within deep neural networks. Its function is to provide scalable representations, searchable indices, and mechanistically grounded insight into the structure and function of neurons in natural and artificial systems. Neuronpedia leverages high-throughput imaging, geometric and topological encoders, sparse autoencoders, and structured APIs to enable researchers to probe, visualize, and manipulate neuronal data and representations across modalities and contexts.
1. Biological Neurons: Quantitative and Structural Foundations
The fundamental unit of the brain is the neuron, an electrically excitable cell specialized for the processing and transmission of information by electrochemical signals. In the adult human brain, there are approximately $8.6 \times 10^{10}$ neurons, each potentially forming up to $10^{4}$ synaptic connections, with total synapse numbers estimated between $10^{14}$ and $10^{15}$ (Zhang, 2019). The canonical vertebrate neuron has five principal components:
- Dendrites: Receivers of synaptic inputs, characterized by extensive branching and dendritic spines.
- Soma (Cell Body): Contains the nucleus and critical organelles; the axon hillock at the soma–axon junction is the spike initiation zone.
- Axon: Conducts action potentials to synaptic terminals and may branch into multiple terminals.
- Myelin Sheath: Increases conduction velocity via saltatory conduction, with myelination provided by oligodendrocytes in the CNS and Schwann cells in the PNS.
- Nodes of Ranvier: Gaps in the myelin sheath concentrating voltage-gated Na⁺ and K⁺ channels for action potential regeneration.
The diversity in morphology and synaptic connectivity underlies the brain’s computational and adaptive power. Synapses are either chemical, involving presynaptic vesicle fusion, neurotransmitter diffusion, and postsynaptic receptor binding, or electrical, mediated by gap junctions that enable bidirectional ionic current flow (Zhang, 2019).
2. Neuronal Skeletons, Connectomics, and Topological Embedding
Morphological and connectomic data at whole-brain scale are central to contemporary neuronal classification and circuit analysis. The NeuNet framework, implemented for Neuronpedia, fuses high-resolution neuronal skeletons (3D point clouds) and brain-wide synaptic graphs (Liao et al., 2023). The Skeleton Encoder extracts global morphology embeddings from unordered point sets using permutation-invariant blocks: farthest-point sampling, radius-based grouping, shared Conv1D layers, and symmetric pooling.
Each neuron is thus represented by a morphology embedding $z_{\mathrm{skel}}$ encoding its geometry. In parallel, the Connectome Encoder constructs a weighted graph in which nodes represent neurons and edge weights denote synapse counts. A stack of graph convolutional layers, with normalization and identity mapping, yields a topology embedding $z_{\mathrm{conn}}$ for each neuron.
Final representations consist of concatenated morphological and connectomic vectors, forming a searchable, retrievable space in Neuronpedia. Empirical classification on Drosophila and human datasets yields accuracies of 91.69% and 93.63%, respectively, with ablations confirming substantial contributions of both skeleton and connectome encoders (Liao et al., 2023).
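The two-pathway encoding and concatenation can be illustrated with a minimal NumPy sketch; the layer sizes, function names, and the omission of farthest-point sampling and radius grouping are simplifications for exposition, not the NeuNet implementation:

```python
import numpy as np

def skeleton_encoder(points, w1, w2):
    """Permutation-invariant morphology embedding (simplified PointNet-style block)."""
    h = np.maximum(points @ w1, 0.0)      # shared per-point layer + ReLU ("shared Conv1D")
    h = np.maximum(h @ w2, 0.0)           # second shared per-point layer + ReLU
    return h.max(axis=0)                  # symmetric max pooling -> z_skel

def gcn_layer(adj, feats, w):
    """One graph-convolution step with symmetric normalization and an identity (residual) term."""
    a_hat = adj + np.eye(adj.shape[0])                        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2
    return np.maximum(norm @ feats @ w, 0.0) + feats          # identity mapping keeps node features

# Toy usage: one neuron's skeleton point cloud plus a 4-neuron connectome.
rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))                            # unordered 3D skeleton nodes
z_skel = skeleton_encoder(points,
                          rng.normal(size=(3, 64)), rng.normal(size=(64, 64)))

adj = rng.integers(0, 5, size=(4, 4)).astype(float)           # synapse counts between neurons
feats = rng.normal(size=(4, 64))                              # initial node features
z_conn = gcn_layer(adj, feats, rng.normal(size=(64, 64)))[0]  # topology embedding of neuron 0

z = np.concatenate([z_skel, z_conn])                          # searchable joint representation
print(z.shape)                                                # -> (128,)
```

Max pooling over points makes the morphology embedding invariant to point ordering, mirroring the permutation-invariant blocks described above.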
3. Artificial Neurons and Sparse Semantic Features
Neuronpedia extends to artificial neural networks by cataloging and interpreting internal features using sparse autoencoders (SAEs). SAEs are trained on hidden-state activations from LLMs to yield overcomplete, sparse, and interpretable representations. Formally, given activation vectors $x \in \mathbb{R}^{d}$, an SAE learns an encoder $f$ and decoder $g$ by minimizing

$$\mathcal{L}(x) = \lVert x - g(f(x)) \rVert_2^2 + \lambda \lVert f(x) \rVert_1,$$

where $\lambda$ controls the sparsity of the latent code (Simbeck et al., 22 Sep 2025). Each learned basis vector is then a “feature neuron” with context-dependent activation.
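A minimal PyTorch sketch of this objective, assuming illustrative dimensions, a ReLU encoder, and an L1 penalty (the exact architectures and hyperparameters vary across released SAEs):

```python
import torch

d_model, d_sae, lam = 512, 4096, 1e-3            # hidden size, overcomplete latent size, sparsity weight

enc = torch.nn.Linear(d_model, d_sae)            # encoder f
dec = torch.nn.Linear(d_sae, d_model)            # decoder g
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

def sae_loss(x):
    """Reconstruction error plus L1 sparsity penalty on the latent code f(x)."""
    h = torch.relu(enc(x))                       # sparse, overcomplete code
    x_hat = dec(h)                               # reconstruction g(f(x))
    return ((x - x_hat) ** 2).sum(-1).mean() + lam * h.abs().sum(-1).mean()

# One optimization step on a batch of activations (random stand-ins for real
# residual-stream activations collected from an LLM).
x = torch.randn(64, d_model)
opt.zero_grad()
sae_loss(x).backward()
opt.step()

# Each column of dec.weight is one "feature neuron" direction in activation space.
```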
Neuronpedia’s API enables probabilistic and mechanistic querying of these features: given a prompt and selected SAE, the top-k activating latent features, their activation magnitudes, and representative example contexts are retrieved, facilitating direct semantic and statistical analysis of model internals (Simbeck et al., 22 Sep 2025).
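The request shape below is a hedged illustration only; the endpoint path, payload fields, and response schema are assumptions and should be checked against the official Neuronpedia API documentation:

```python
import requests

BASE_URL = "https://www.neuronpedia.org/api"       # assumed base URL

payload = {
    "modelId": "gemma-2-2b",                        # example model identifier (assumed field name)
    "sourceSet": "res-16k",                         # example SAE identifier (assumed field name)
    "text": "The capital of France is",             # prompt whose top-activating features we want
}

# Hypothetical endpoint for prompt-conditioned top-k feature retrieval.
resp = requests.post(f"{BASE_URL}/search-with-topk", json=payload, timeout=30)
resp.raise_for_status()

for feature in resp.json().get("results", [])[:10]:      # assumed response schema
    print(feature.get("index"), feature.get("maxValue"))  # feature id and peak activation
```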
4. Mechanistic Interpretability and Compositional Analysis
The compositional architecture of knowledge in LLMs is systematically probed by analyzing SAE feature coactivation and performing controlled interventions. Semantic modules are defined as weakly connected components in coactivation graphs constructed from correlated activation of sparse features across adjacent layers (with Pearson correlation exceeding a chosen threshold). Modules take on identities such as "country" or "relation" features, localize to particular layers, and their causal impact is measured by the Kullback–Leibler divergence between next-token distributions under feature ablation or amplification (Deng et al., 22 Jun 2025).
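A minimal sketch of the coactivation-graph construction, assuming SAE feature activations have already been collected for two adjacent layers (array shapes and the correlation threshold are illustrative):

```python
import numpy as np
import networkx as nx

def semantic_modules(acts_a, acts_b, threshold=0.5):
    """Group SAE features from two adjacent layers into coactivation modules.

    acts_a : (T, Fa) activations of layer-l features over T tokens/prompts.
    acts_b : (T, Fb) activations of layer-(l+1) features.
    threshold : Pearson-correlation cutoff for drawing an edge (illustrative value).
    Returns the weakly connected components of the cross-layer coactivation graph.
    """
    g = nx.DiGraph()
    for i in range(acts_a.shape[1]):
        for j in range(acts_b.shape[1]):
            r = np.corrcoef(acts_a[:, i], acts_b[:, j])[0, 1]
            if r > threshold:
                g.add_edge(("L", i), ("L+1", j), weight=r)
    return list(nx.weakly_connected_components(g))

# Toy usage with random activations standing in for real SAE feature activations.
rng = np.random.default_rng(0)
acts_a, acts_b = rng.random((200, 30)), rng.random((200, 40))
print(semantic_modules(acts_a, acts_b, threshold=0.15)[:3])
```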
Key findings include:
- Country modules often involve features in early layers, relation modules in deeper layers.
- Amplifying or ablating the activation of a semantic module steers model output as predicted, with steering success rates exceeding 90% across test prompts.
- Counterfactual compositions (e.g., "Mexico-capital" + "France-currency") yield output reflecting the intervention.
- Documentation for each module, including feature indices, activation densities, and contextual activation patterns, forms a modular Neuronpedia entry (Deng et al., 22 Jun 2025).
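The ablation/amplification scoring step can be sketched as follows; random distributions stand in for actual next-token outputs, and in practice the modified latents are decoded back into the model's forward pass before re-running it:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two next-token probability distributions."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def intervene(latents, module_ids, scale):
    """Scale the SAE activations belonging to one semantic module.

    scale = 0 ablates the module; scale > 1 amplifies it.
    """
    out = latents.copy()
    out[list(module_ids)] *= scale
    return out

rng = np.random.default_rng(0)
latents = rng.random(4096)                       # stand-in SAE latent activations for one token
module = [12, 345, 678]                          # feature indices of one hypothetical module
ablated = intervene(latents, module, scale=0.0)  # zero out the module's features

p_base = rng.dirichlet(np.ones(50))              # stand-in next-token distribution (no intervention)
p_ablated = rng.dirichlet(np.ones(50))           # stand-in distribution after ablation
print(kl_divergence(p_base, p_ablated))          # causal-impact score for the module
```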
5. Data Infrastructure and Interactive Tools
Neuronpedia integrates data infrastructure for both biological and artificial neuron representations. The platform provides:
- Indexed Storage: Morphology and topology vectors for each neuron are stored in vector indices (e.g., Faiss), linked to metadata such as skeleton files, brain regions, and neuron classes (Liao et al., 2023); a minimal indexing-and-retrieval sketch follows this list.
- Interactive Query and Visualization: Enables k-NN similarity search, embedding visualization (e.g., t-SNE plots), 3D rendering (via Vaa3D or Neuroglancer), and real-time inspection of feature activations and attention heatmaps (Liao et al., 2023).
- API Access for LLMs: Allows programmatic probing of SAE feature activations, semantic overlap metrics, and retrieval of example contexts (Simbeck et al., 22 Sep 2025).
- Live Experimentation: Via the Gemma Scope demo, users can select model, layer, SAE width, and sparsity, submit prompts, visualize per-token sparsity and reconstruction error, and interactively splice features into or out of model computation (Lieberum et al., 2024).
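A minimal sketch of the indexed-storage and k-NN query items above (the embedding dimension, metadata fields, and flat index type are illustrative choices):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 384                                                     # illustrative embedding dimension
embeddings = np.random.rand(10_000, d).astype("float32")    # concatenated morphology/topology vectors
metadata = [{"neuron_id": i, "region": "unknown"} for i in range(len(embeddings))]

index = faiss.IndexFlatL2(d)                                # exact L2 index; IVF/HNSW variants scale further
index.add(embeddings)

# k-NN similarity search: the 5 most similar neurons to a query embedding.
query = embeddings[:1]
distances, ids = index.search(query, 5)
for dist, i in zip(distances[0], ids[0]):
    print(metadata[i]["neuron_id"], round(float(dist), 3))
```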
The infrastructure supports both manual exploration and automated routines for mechanistic interpretability, debugging, and alignment research.
6. Signal Processing Perspective and Theoretical Modeling
At the computational level, a neuron may be modeled as performing online sparse rank-1 matrix factorization of its input stream, unifying leaky integration, nonlinear thresholding, and Hebbian plasticity within a common optimization framework (Hu et al., 2014). For presynaptic input $\mathbf{x}_t$ at time $t$, the neuron computes:
- Leaky integration: $u_t = (1-\eta)\,u_{t-1} + \mathbf{w}_t^{\top}\mathbf{x}_t$, with synaptic filtering parameter $\eta$.
- Activity update: $y_t = \mathrm{ST}_{\lambda}(u_t)$, using the soft-thresholding operator $\mathrm{ST}_{\lambda}(u) = \operatorname{sign}(u)\max(|u|-\lambda, 0)$.
- Correlation accumulator: $\mathbf{c}_t = \mathbf{c}_{t-1} + y_t\,\mathbf{x}_t$, alongside cumulative activity $a_t = a_{t-1} + y_t^{2}$.
- Synaptic weight update: $w_{t+1,i} = c_{t,i} / a_t$ componentwise.
This framework predicts heavy-tailed, sparse distributions of activity and weights, the existence of silent synapses, adaptation of plasticity rate with cumulative activity, and Gabor-like receptive fields for natural stimuli, providing both a computational account and design principles for neuromorphic engineering (Hu et al., 2014).
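A compact simulation of this online loop, using the generic update forms given above (parameter values, initialization, and the synthetic input stream are illustrative):

```python
import numpy as np

def soft_threshold(u, lam):
    """ST_lambda(u) = sign(u) * max(|u| - lambda, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def online_rank1_neuron(x_stream, n_inputs, eta=0.1, lam=0.5, seed=0):
    """Single neuron performing online sparse rank-1 factorization of its input stream."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=n_inputs)   # synaptic weights
    c = np.zeros(n_inputs)                     # input-output correlation accumulator
    a = 1.0                                    # cumulative activity (prior term for stability)
    u = 0.0                                    # leaky membrane variable
    outputs = []
    for x in x_stream:
        u = (1.0 - eta) * u + w @ x            # leaky integration of synaptic current
        y = soft_threshold(u, lam)             # sparse, thresholded output activity
        if y != 0.0:                           # Hebbian plasticity gated by postsynaptic activity
            c += y * x                         # correlation accumulation
            a += y ** 2                        # cumulative activity sets the plasticity rate
            w = c / a                          # componentwise weight update
        outputs.append(y)
    return w, np.array(outputs)

# Toy stream of noisy inputs that share a single latent direction.
rng = np.random.default_rng(1)
direction = rng.normal(size=100)
stream = (rng.normal() * direction + 0.1 * rng.normal(size=100) for _ in range(2000))
w, y = online_rank1_neuron(stream, n_inputs=100)
print(np.count_nonzero(y) / y.size)            # fraction of time steps with nonzero (sparse) output
```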
7. Extensions, Limitations, and Future Directions
Neuronpedia’s integrated pipelines support extensions across multiple axes:
- Multi-modal fusion: Augment node records with molecular (e.g., single-cell RNA-seq), electrophysiological, or spatial transcriptomics embeddings (Liao et al., 2023).
- Cross-modal attention: Refine simple vector concatenation with adaptive attention mechanisms for fusion of morphology and topology.
- Broader feature axes: Expand SAE analysis to additional identity variables (gender, race), more languages, and functional circuit motifs (Simbeck et al., 22 Sep 2025).
- Dynamic and compositional interventions: Systematically probe polysemous or hierarchical modes via multi-layer SAE architectures and dynamic prompting strategies (Simbeck et al., 22 Sep 2025, Lieberum et al., 2024).
Known limitations include sensitivity to the quality and completeness of training data, the choice of sparsity regularizer, reliance on single-hidden-layer encodings (potentially missing more complex feature modes), and contexts not yet addressed, such as multilingual settings or spike-timing-specific dynamics. The rank-1 assumption in the signal processing model is a simplification; extension to broader matrix factorizations or multilayered circuits remains a subject of active research (Hu et al., 2014).
Neuronpedia, by synthesizing morphological, topological, and mechanistic perspectives, constitutes an evolving resource for the systematic representation and interrogation of neuronal structure and function across scales, modalities, and architectures.