Neuron-Level Representation Learning
- Neuron-level representation learning decodes the functions of individual neurons to extract invariant, interpretable features from both biological neural systems and artificial neural networks.
- It leverages contrastive, self-supervised, and graph-based methodologies to map neural activity onto semantically meaningful representations, supporting applications like cell classification and model explainability.
- Practical applications include robust spike sorting, targeted domain adaptation, and morphology-based clustering, with empirical results demonstrating improved interpretability and generalization relative to prior approaches.
Neuron-level representation learning encompasses a set of methodologies and theoretical principles dedicated to extracting, understanding, and leveraging the information encoded at the level of individual neurons—biological or artificial—in neural systems and neural network models. This research aims to assign, interpret, or refine vectorial or functional representations that capture invariant, generalizable, or semantically meaningful properties of single neurons, supporting tasks from cell classification and system identification to downstream mechanistic interpretability and domain adaptation.
1. Theoretical Foundations and Motivations
Neuron-level representation learning is grounded in both neuroscientific and machine learning traditions. From a neuroscientific perspective, individual neurons are not merely interchangeable components of a distributed code but may act as specialized substrates for semantic or cognitive entities, capturing the “neuron doctrine” at both micro- and macro-levels (Zheng, 2021). Biophysical models posit competitive and homeostatic mechanisms (activity-dependent survival, synaptic plasticity, STDP, and inhibitory feedback) that drive the formation of “specialist” neurons dedicated to salient patterns or features. Mathematical formulations describe these processes in terms of constrained optimization over firing rates, winner-take-all competition, and synaptic efficacy governed by local activity, capturing both Hebbian and anti-Hebbian updates.
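The following minimal sketch illustrates the kind of winner-take-all Hebbian update described above; the learning rate, the weight renormalization used as a stand-in for homeostatic control, and the random input statistics are illustrative assumptions rather than parameters of any cited biophysical model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_neurons, lr = 16, 8, 0.05
W = rng.normal(scale=0.1, size=(n_neurons, n_inputs))  # synaptic efficacies

def wta_hebbian_step(W, x, lr):
    """One winner-take-all Hebbian update on input pattern x."""
    activations = W @ x                      # feedforward drive
    winner = np.argmax(activations)          # competition: a single winner fires
    # Hebbian potentiation for the winner: move its weights toward the input.
    W[winner] += lr * (x - W[winner])
    # Homeostatic stand-in: renormalize so no neuron's efficacy grows unboundedly.
    W[winner] /= np.linalg.norm(W[winner]) + 1e-8
    return W, winner

# Present input patterns (random here purely for illustration); with structured
# inputs, "specialist" neurons emerge as weight vectors align with recurring
# input directions.
for _ in range(1000):
    x = rng.normal(size=n_inputs)
    x /= np.linalg.norm(x)
    W, _ = wta_hebbian_step(W, x, lr)
```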
In artificial neural networks, neuron-level learning and probing emerged from evidence that single hidden units within deep architectures often encode interpretable or task-relevant features (such as part-of-speech, syntactic roles, or semantic categories in NLP models (Sajjad et al., 2021, Foote et al., 2023, Bai et al., 21 Oct 2024)), with linear or nonlinear compositional operations further increasing the expressive capacity of the model (Li et al., 2019). From an information-theoretic standpoint, canonical alignment relations (“Canonical Representation Hypothesis”) formalize how weights, activations, and gradients co-evolve under the balance of SGD noise and regularization (Ziyin et al., 3 Oct 2024).
2. Methodological Approaches
Neuron-level representation learning spans empirical, model-driven, and biologically inspired methodologies:
- Contrastive and Self-Supervised Learning Frameworks: Approaches such as VICReg (Variance-Invariance-Covariance Regularization) and InfoNCE pull together multiple observations belonging to the same neuron across varying conditions, enforcing time- and context-invariance in the learned representations (Wu et al., 6 Feb 2025, Arora et al., 1 Dec 2025); a minimal contrastive sketch appears after this list. For dynamic or population-based data, permutation- and population-size-invariant summaries (e.g., center-surround statistics) enable embedding of neuron “identity” even amidst session-to-session variability (Mi et al., 2023).
- Structured Neuron Probing and Dissection: Probing methods train classifiers (linear, MLP, or k-NN) on either individual neuron activations or sets thereof to map units to interpretable concepts. Selectivity, mutual information, and ablation studies are used to score or validate the specificity of representations (Sajjad et al., 2021).
- Matrix and Graph Construction: For LLMs, pipelines such as Neuron to Graph (N2G) automate the extraction of minimal input contexts and saliency graphs, visualizing and quantitatively evaluating neuron triggers and functions at scale (Foote et al., 2023). For morphological and anatomical data, graph-based encodings represent multi-scale features of neuron shape, connectivity, and compartments (Jiang et al., 2022, Ha et al., 15 Oct 2024).
- Local Plasticity and Homeostatic Control: Hybrid learning rules (e.g., Neuron Activity Aware Hebbian learning) dynamically switch between potentiation and depression based on measured neuron usage, redistributing representational burden in deep unsupervised encoders for tasks such as 3D object recognition (Kang et al., 2023).
- Sparse Coding and White-Box Models: Architectures such as CRATE enforce explicit sparse, low-dimensional codes within transformer layers, replacing post-hoc dictionary learning with built-in information-theoretic compression and mono-semanticity of neuron activations (Bai et al., 21 Oct 2024).
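As a concrete illustration of the contrastive frameworks referenced above, the following sketch implements an InfoNCE-style objective over two views of the same neurons (e.g., activity summaries from different sessions); the encoder architecture, feature dimensions, and temperature are illustrative assumptions, not the configurations of the cited methods.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss treating (z_a[i], z_b[i]) as a positive pair (the same neuron
    observed under different sessions/conditions) and all other pairings as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(z_a.size(0))           # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: embed two views of the same 64 neurons with a shared encoder.
encoder = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32)
)
view_a = torch.randn(64, 128)   # e.g., activity summaries from session A
view_b = torch.randn(64, 128)   # the same neurons observed in session B
loss = info_nce(encoder(view_a), encoder(view_b))
loss.backward()
```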
3. Biological and Artificial Contexts
Biological Neurons
Neuron-level representation learning in biological data aims to extract cell-type, anatomical, or physiological identity from population recordings, spike waveforms, or morphological data. Techniques such as NeuPRINT (Mi et al., 2023) and NeurPIR (Wu et al., 6 Feb 2025) achieve robust, time-invariant embeddings by modeling each neuron's activity in the context of its population and dynamics, supporting accurate cell-type and region classification and strong generalization to out-of-domain subjects or recordings. Morphological analysis leverages unsupervised graph embeddings of reconstructed 3D neuron shapes for neuron-type clustering and feature extraction (Jiang et al., 2022). Self-supervised representation learning via denoising autoencoders and contrastive learning on spike waveforms further enables robust, session-invariant identification of single units amidst noise and drift (Cao et al., 23 Jul 2025).
Artificial Neural Networks
In deep models for vision, NLP, and multimodal tasks, neuron-level representation learning spans explicit design of interpretable units, neuron-wise intervention for domain adaptation, and probing for concept alignment:
- Interpretability: Analysis pipelines are used to catalog the triggers, selectivity, and compositional functions of individual neurons, often in transformer-based LLMs (Sajjad et al., 2021, Foote et al., 2023).
- Domain Adaptation: Neuron-level interventions, such as feature shifts applied to domain-activating neurons, enable efficient, training-free domain adaptation at inference (Antverg et al., 2022); a probing-and-shift sketch follows this list.
- Representation Formation Theory: Layerwise alignment and collapse phenomena (as captured by the Canonical Representation Hypothesis and Neural Collapse) provide unifying frameworks for understanding how compact, task-invariant representations emerge (Ziyin et al., 3 Oct 2024).
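The probing and intervention ideas above can be combined in a small sketch: fit a linear probe to find the neurons most predictive of a domain label, then shift those neurons' activations toward the target-domain mean at inference. The probe type (logistic regression), the number of intervened neurons, and the mean-shift rule are illustrative assumptions rather than the exact procedures of the cited works.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hidden activations (n_examples, n_neurons) with a binary domain label
# (synthetic data stand in for activations extracted from a real model).
acts_src = rng.normal(size=(500, 256))
acts_tgt = rng.normal(loc=0.5, size=(500, 256))
X = np.vstack([acts_src, acts_tgt])
y = np.array([0] * 500 + [1] * 500)

# Probe: a linear classifier over neuron activations; large |weight| marks
# neurons that are highly selective for the domain distinction.
probe = LogisticRegression(max_iter=1000).fit(X, y)
top_neurons = np.argsort(-np.abs(probe.coef_[0]))[:16]

# Intervention: at inference, shift the selected neurons of a source-domain
# activation toward the target-domain mean (no retraining required).
shift = acts_tgt[:, top_neurons].mean(0) - acts_src[:, top_neurons].mean(0)

def adapt(activation: np.ndarray) -> np.ndarray:
    adapted = activation.copy()
    adapted[top_neurons] += shift
    return adapted

adapted_example = adapt(acts_src[0])
```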
4. Applications and Empirical Results
Applications of neuron-level representation learning include:
- Semantic and Cell-Type Classification: Embeddings extracted from population recordings or spike data support accurate, label-efficient classification of neuron type and anatomical region, exceeding prior state-of-the-art models and exhibiting strong zero-shot generalization to unseen animals (Arora et al., 1 Dec 2025, Wu et al., 6 Feb 2025, Mi et al., 2023).
- Robust Spike Sorting: Self-supervised, contrastively learned spike representations enable unsupervised spike sorting that matches or outperforms the best available neuroscience pipelines (KiloSort4, MountainSort5) and remains robust to low SNR, drift, and electrode variability (Cao et al., 23 Jul 2025); a clustering sketch on learned embeddings appears after this list.
- Model Control and Explainability: Interpretable neuron extraction allows for targeted manipulation, pruning, or correction of neural models to audit, control, or improve behavior, including gender bias removal and style transfer in NLP (Sajjad et al., 2021).
- Morphology-Based Clustering: Graph-based embeddings of neuron shapes facilitate unsupervised clustering and type discovery in morphological datasets, enabling downstream statistical analysis and multiscale organization (Jiang et al., 2022).
- Architectural Advancements: White-box architectures embedding sparse coding (e.g., CRATE) achieve intrinsic transparency and consistent mono-semantic neuron interpretability across depth and model scale (Bai et al., 21 Oct 2024).
- Cross-Modal and Multiscale Integration: Some recent works aim to unify dynamic, anatomical, and transcriptomic data in the same embedding space for comprehensive neuron profiling (Wu et al., 6 Feb 2025, Ha et al., 15 Oct 2024).
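As a rough illustration of the spike-sorting application, the sketch below clusters learned spike-waveform embeddings with k-means and scores the result with a silhouette coefficient; the embedding dimensionality, the fixed number of putative units, and the use of k-means are illustrative assumptions, since the cited pipeline learns its representations with denoising and contrastive objectives rather than relying on a fixed clustering recipe.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Embeddings of detected spike waveforms, e.g., produced by a pretrained
# self-supervised encoder (replaced by random vectors here for illustration).
spike_embeddings = rng.normal(size=(2000, 32))

# Cluster embeddings into putative single units; in practice the number of
# clusters would be chosen by a model-selection criterion rather than fixed.
n_units = 12
labels = KMeans(n_clusters=n_units, n_init=10, random_state=0).fit_predict(spike_embeddings)

# A simple internal quality measure for the resulting sort.
print("silhouette:", silhouette_score(spike_embeddings, labels))
```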
5. Cross-Volume, Cross-Session, and OOD Generalization
A central goal is the extraction of representations that are invariant—or at least robust—to experimental or environmental perturbations:
- Cross-Volume Regularizations: In connectomics, cross-volume voxel-level objectives regularize segmentation features to ensure global consistency, improving downstream skeleton tracing (Wang et al., 2021).
- Time and Population Invariance: Representation learning strategies explicitly factor out variability due to behavioral covariates, session identity, and population size, yielding embeddings that scale with pretraining data and support zero-shot transfer to new animals or brain areas (Mi et al., 2023, Wu et al., 6 Feb 2025, Arora et al., 1 Dec 2025); one such permutation-invariant summary is sketched after this list.
- Drift and Noise Robustness: Joint denoising and contrastive objectives in spike analysis maintain class discrimination across electrode drift and session changes, aligning with the need for robust brain–machine interfaces (Cao et al., 23 Jul 2025).
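One simple way to realize the permutation- and population-size-invariance discussed above is to describe each neuron by its own activity (the "center") together with order-free statistics of the rest of the recorded population (the "surround"); the specific statistics used here (mean and standard deviation over neurons) are an illustrative assumption rather than the exact summaries of the cited methods.

```python
import numpy as np

def center_surround_features(activity: np.ndarray, neuron_idx: int) -> np.ndarray:
    """Summarize one neuron's activity plus permutation-invariant population context.

    activity: (n_neurons, n_timebins) array of firing rates for one session.
    Returns a fixed-length feature vector that does not depend on neuron
    ordering or on how many other neurons were recorded.
    """
    center = activity[neuron_idx]                          # target neuron
    surround = np.delete(activity, neuron_idx, axis=0)     # rest of the population
    context_mean = surround.mean(axis=0)                   # order-free summaries
    context_std = surround.std(axis=0)
    return np.concatenate([center, context_mean, context_std])

# Two sessions with different population sizes still yield same-length features.
sess_a = np.random.default_rng(0).poisson(2.0, size=(40, 100)).astype(float)
sess_b = np.random.default_rng(1).poisson(2.0, size=(75, 100)).astype(float)
feat_a = center_surround_features(sess_a, neuron_idx=3)
feat_b = center_surround_features(sess_b, neuron_idx=3)
assert feat_a.shape == feat_b.shape
```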
6. Challenges, Limitations, and Future Directions
Despite significant advances, several open challenges persist:
- Granularity: Current methods discriminate broad cell types or classes but struggle to resolve fine-grained subtypes, requiring richer data or more targeted augmentations (Wu et al., 6 Feb 2025).
- Representation Collapse and Dead Neurons: Without careful regularization or homeostasis, neuron-level learning risks “collapse,” where few units dominate the representational space, losing local detail (Kang et al., 2023).
- Standardization: The lack of unified benchmarks and gold standards for neuron interpretability and identity hinders systematic progress and comparability across studies (Sajjad et al., 2021).
- Unsupervised Discovery and Causality: Moving beyond predefined concepts and correlational probes to fully unsupervised, causally validated neuron-concept associations remains a major goal.
- Biologically Plausible Mechanisms in Deep Models: Translating local plasticity and competition principles from the brain to deep, large-scale artificial networks (and vice versa) is ongoing, with hybrid mechanisms showing promise (Kang et al., 2023, Ziyin et al., 3 Oct 2024).
A plausible implication is that unified frameworks maximizing cross-neuron mutual information, enforcing sparse coding, and leveraging scaling laws for data diversity will jointly yield interpretable, generalizable, and functionally meaningful neuron representations in both biological and artificial networks. This convergence of techniques from neuroscience, machine learning, and information theory defines the contemporary landscape and future trajectory of neuron-level representation learning.