
Token Frequency Neurons in Transformers

Updated 25 November 2025
  • Token frequency neurons are specialized hidden units in the final MLP layer that disproportionately influence rare token predictions.
  • They are identified using intervention analysis and directional probes, revealing a sparse subnetwork with heavy-tailed weight statistics.
  • These neurons calibrate model outputs and enhance robustness by adjusting rare-token distributions via coordinated low-dimensional activations.

Token frequency neurons, also known as rare-token neurons, are specialized hidden units in the final MLP layer of transformer-based LLMs that exert a disproportionate causal effect on the prediction of low-frequency ("rare") tokens. These neurons have been systematically identified and analyzed in LLMs, revealing a distinctive, emergent organization and a mechanistic role in regulating rare-token outputs and fallback distributions. The study of token frequency neurons elucidates core aspects of sparse specialization, activation geometry, and heavy-tailed synaptic statistics within neural architectures (Liu et al., 19 May 2025, Liu et al., 25 Sep 2025, Stolfo et al., 24 Jun 2024).

1. Definition and Mathematical Characterization

A token frequency neuron is defined as an individual hidden unit in a transformer's final MLP layer that, when intervened upon, induces a large expected change in the model's output probability or loss for rare target tokens. A token's frequency $f(r)$ is quantified as its unigram count divided by the total token count in the training corpus; rare tokens are those in the lower percentiles (e.g., below the median) of $f(r)$ (Liu et al., 19 May 2025, Stolfo et al., 24 Jun 2024).
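
As a concrete toy illustration (the corpus and token strings below are invented stand-ins for a tokenized training set), the frequency definition amounts to a unigram count and a percentile cut:

```python
from collections import Counter

# Invented toy corpus; in practice these would be token IDs from the training set.
corpus = ["the"] * 5 + ["cat"] * 3 + ["sat"] * 2 + ["mat", "quokka"]

counts = Counter(corpus)
total = sum(counts.values())
freq = {tok: c / total for tok, c in counts.items()}   # f(r) = count / total

# Rare tokens: below the median frequency (one possible percentile cut).
median = sorted(freq.values())[len(freq) // 2]
rare_tokens = {tok for tok, f in freq.items() if f < median}
print(rare_tokens)   # the lowest-frequency tokens: mat and quokka
```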

Mathematically, let $x$ denote the residual stream before the last MLP, with neuron $i$'s activation $n_i$. The mean-ablation intervention sets $n_i$ to its dataset mean $\bar n_i$, updating the downstream state:

$$\tilde x^{(i)} = x + (\bar n_i - n_i)\, w^{(i)}_{\text{out}}$$

where $w^{(i)}_{\text{out}}$ is the $i$-th output weight. The causal effect on the token-level loss is measured by

$$\Delta \text{loss}(i) = \mathbb{E}_{x \sim D} \left| \mathcal{L}(\mathrm{LM}(x), y) - \mathcal{L}(\mathrm{LM}(\tilde x^{(i)}), y) \right|$$

while the per-token influence

$$I_{i, r} = \mathbb{E}_{x: y=r} \left[ -\log p(r \mid x) + \log p(r \mid \tilde x^{(i)}) \right]$$

quantifies neuron $i$'s direct impact on rare token $r$ (Liu et al., 19 May 2025, Liu et al., 25 Sep 2025).
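
A minimal numpy sketch of the mean-ablation measurement, using randomly generated stand-ins for the residual states, final-MLP activations, output weights, and unembedding (none of this reflects a specific model's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a model's final MLP and unembedding.
d_model, n_neurons, vocab = 16, 32, 100
W_out = rng.normal(size=(n_neurons, d_model)) / np.sqrt(d_model)  # rows are w_out^(i)
W_U = rng.normal(size=(d_model, vocab)) / np.sqrt(d_model)        # unembedding matrix

def token_loss(x, y):
    """Cross-entropy of the LM head at residual state x for target token y."""
    logits = x @ W_U
    logits = logits - logits.max()                   # numerical stability
    return -(logits[y] - np.log(np.exp(logits).sum()))

# A batch of residual states, neuron activations, and target tokens.
X = rng.normal(size=(64, d_model))
N = rng.normal(size=(64, n_neurons))
y = rng.integers(0, vocab, size=64)
n_bar = N.mean(axis=0)                               # dataset-mean activation per neuron

def delta_loss(i):
    """E_x |L(LM(x), y) - L(LM(x_tilde), y)| under mean ablation of neuron i."""
    diffs = []
    for x, n, t in zip(X, N, y):
        x_tilde = x + (n_bar[i] - n[i]) * W_out[i]   # the residual update above
        diffs.append(abs(token_loss(x, t) - token_loss(x_tilde, t)))
    return float(np.mean(diffs))

effects = np.array([delta_loss(i) for i in range(n_neurons)])
print(effects.argsort()[::-1][:5])                   # highest-influence neurons
```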

In an alternative formulation, token frequency neurons are defined as those whose MLP output weights, passed through the unembedding matrix $W_U$, are approximately parallel to the "log-frequency direction" $v_{\text{freq}}$ given by

$$v_{\text{freq},j} = \log p_{\text{freq},j} - \text{mean}_k\left[\log p_{\text{freq},k}\right]$$

Thus, the neuron's contribution to logit $l_j$ is

$$\Delta l_j \approx \alpha\, n_i\, v_{\text{freq},j}$$

with $\alpha$ a proportionality constant. Such neurons adjust the model's pre-softmax logits toward or away from the empirical unigram distribution, effectively serving as calibration mechanisms (Stolfo et al., 24 Jun 2024).
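
Constructing $v_{\text{freq}}$ from corpus statistics is direct; a small sketch with invented unigram counts:

```python
import numpy as np

# Invented unigram counts over a five-token toy vocabulary.
counts = np.array([5000.0, 1200.0, 300.0, 40.0, 3.0])
p_freq = counts / counts.sum()                  # empirical unigram distribution

# Mean-centered log-frequency direction in logit space.
v_freq = np.log(p_freq) - np.log(p_freq).mean()

# A neuron writing along v_freq shifts the logits toward the unigram prior
# when its activation n_i is positive, and away from it when n_i is negative.
alpha, n_i = 0.5, 1.3                           # illustrative constants
delta_logits = alpha * n_i * v_freq
print(delta_logits.round(3))
```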

2. Influence Measurement and Discovery Methodology

The influence of a candidate neuron on rare tokens is quantified by comparing the model's output distribution before and after mean ablation, typically via per-token logit differences, changes in output probabilities, or token-level cross-entropy change:

$$I_n(t) = \mathbb{E}_{x \sim \mathcal{D}} \left[ z_t(x) - z_t(x;\, n~\text{ablated}) \right]$$

for token $t$, or

$$I_n(t) = \mathbb{E}_{x \sim \mathcal{D}} \left[ p(y = t \mid x) - p(y = t \mid x;\, n~\text{ablated}) \right]$$

(Liu et al., 25 Sep 2025).

Directional probes identify neurons whose output weights are highly aligned, in the unembedding space, with the corpus log-frequency vector $v_{\text{freq}}$, using cosine similarity:

$$\cos \theta_i = \frac{W_U w^{(i)}_{\text{out}} \cdot v_{\text{freq}}}{\left\| W_U w^{(i)}_{\text{out}} \right\| \, \left\| v_{\text{freq}} \right\|}$$

(Stolfo et al., 24 Jun 2024). Combined with ablation-based mediation analysis, which partitions a neuron's effect into components along $v_{\text{freq}}$ ("direct effect") and orthogonal subspaces ("total effect"), this yields both statistical and causal signatures for token frequency neurons.
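
A sketch of this probe under the assumption that the relevant matrices are available as plain arrays (all names and shapes below are illustrative):

```python
import numpy as np

def frequency_neuron_scores(W_U, W_out, v_freq):
    """Cosine similarity between each neuron's unembedded output weight
    and the log-frequency direction v_freq.

    W_U:    (d_model, vocab) unembedding matrix
    W_out:  (n_neurons, d_model) final-MLP output weights, one row per neuron
    v_freq: (vocab,) mean-centered log-frequency direction
    """
    projected = W_out @ W_U                      # (n_neurons, vocab) logit-space directions
    num = projected @ v_freq
    denom = np.linalg.norm(projected, axis=1) * np.linalg.norm(v_freq)
    return num / denom                           # cos(theta_i) per neuron

# Toy usage: rank neurons by alignment with v_freq.
rng = np.random.default_rng(0)
W_U = rng.normal(size=(16, 100))
W_out = rng.normal(size=(32, 16))
v_freq = rng.normal(size=100)
v_freq -= v_freq.mean()
scores = frequency_neuron_scores(W_U, W_out, v_freq)
print(np.argsort(-np.abs(scores))[:5])           # candidate token frequency neurons
```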

Experimentally, LLMs such as Pythia-410M and GPT-2 exhibit small subsets of final-layer MLP units (typically 1–2% of total) whose ablation disproportionately shifts rare-token output probabilities and KL-divergence with respect to the unigram distribution (Liu et al., 19 May 2025, Stolfo et al., 24 Jun 2024).

3. Three-Regime Specialization in Neuron Influence

When neurons are ranked by decreasing rare-token influence and plotted in log-log coordinates, three persistent regimes emerge during training (Liu et al., 19 May 2025, Liu et al., 25 Sep 2025):

  • Plateau regime: The highest-influence (top ~1–2%) neurons exhibit nearly flat influence, forming a plateau above the fitted power law. These are the rare-token neurons.
  • Power-law decay: The next ~10% of neurons descend according to $\log |\Delta \text{loss}(r)| \approx -\kappa \log r + \beta$, where $r$ is the rank and $\kappa$ the decay exponent.
  • Rapid-decay tail: The remaining vast majority (~87%) of neurons exhibit influence that drops off faster than the power law.

These regimes are detected using segmented linear fits on log-log plots and change-point methods for local slope estimation. The plateau's upward deviation $\delta(r)$ quantifies its separation from the power-law bulk and grows throughout training. Notably, common-token prediction lacks this structured hierarchy, following a simple power-law decay (Liu et al., 25 Sep 2025).
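
One simplified way to implement this detection (a sketch, not the papers' exact change-point procedure): fit the power law on a middle band of ranks and flag top-ranked neurons that sit well above the extrapolated fit.

```python
import numpy as np

def plateau_deviation(influences, bulk=(0.02, 0.12)):
    """Fit log|Δloss| ≈ -κ·log(rank) + β on an assumed power-law band of ranks
    and return the decay exponent κ and each rank's upward deviation δ."""
    vals = np.sort(np.abs(influences))[::-1]          # rank by decreasing influence
    ranks = np.arange(1, len(vals) + 1)
    lo, hi = (int(f * len(vals)) for f in bulk)
    slope, intercept = np.polyfit(np.log(ranks[lo:hi]), np.log(vals[lo:hi]), 1)
    fitted = np.exp(intercept) * ranks.astype(float) ** slope
    delta = np.log(vals) - np.log(fitted)             # > 0: above the fitted power law
    return -slope, delta

# Toy influence curve: a power-law bulk with a flat plateau grafted on top.
rng = np.random.default_rng(0)
infl = 5.0 * np.arange(1, 1001) ** -1.2
infl[:20] = 9.0                                       # plateau of "rare-token neurons"
kappa, delta = plateau_deviation(infl * rng.lognormal(0.0, 0.05, size=1000))
print(round(kappa, 2), int((delta[:20] > 0.5).sum()))  # ≈1.2 and ≈20 plateau neurons
```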

4. Subnetwork Structure and Activation Geometry

Despite being distributed across the final MLP layer, rare-token neurons form a coordinated functional subnetwork:

  • Low Effective Dimensionality: Principal component analysis (PCA) of their activation matrices across diverse contexts reveals that rare-token neurons inhabit a low-dimensional subspace, as quantified by the participation ratio $D_{\text{eff}}$ (a sketch of both measures follows this list):

$$D_{\text{eff}} = \frac{\left( \sum_{i=1}^{K} \lambda_i \right)^2 }{ \sum_{i=1}^{K} \lambda_i^2 }$$

or by the rank at which cumulative eigenvalue mass passes a threshold (e.g., 95%).

  • Co-activation and Avoidance: Mean pairwise cosine similarity between rare-token neurons' activations is high (≈0.41), while it is near zero (≈0.03) between rare-token and random neurons. Graph connectivity (edges for $\cos \theta_{ij} > 0.5$) exposes a densely connected rare-token subnetwork module, suggesting tightly coordinated specialization (Liu et al., 19 May 2025).
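
Both diagnostics are compact; a sketch on synthetic activations (rows are neurons, columns are contexts, and the co-activation structure is planted by construction):

```python
import numpy as np

def participation_ratio(acts):
    """D_eff = (Σλ_i)² / Σλ_i² over eigenvalues of the activation covariance."""
    lam = np.linalg.eigvalsh(np.cov(acts))     # (n_neurons, n_neurons) covariance
    lam = np.clip(lam, 0.0, None)              # guard tiny negative eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()

def mean_pairwise_cosine(acts):
    """Average cosine similarity over all distinct pairs of neuron activations."""
    unit = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(acts)
    return (sims.sum() - n) / (n * (n - 1))    # drop the diagonal of ones

# Toy data: "rare-token neurons" share one latent driver, so they co-activate
# and occupy a low-dimensional subspace; random neurons do not.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
rare = np.outer(rng.normal(1.0, 0.1, size=10), latent) + 0.3 * rng.normal(size=(10, 200))
rand = rng.normal(size=(10, 200))
print(participation_ratio(rare), participation_ratio(rand))    # low vs. high D_eff
print(mean_pairwise_cosine(rare), mean_pairwise_cosine(rand))  # high vs. ≈0 cosine
```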

Importantly, community detection and spatial analysis show that rare-token neurons are not physically clustered; instead, they are spatially dispersed and do not form discrete layer-wise modules (Liu et al., 25 Sep 2025).

5. Heavy-Tailed Weight Statistics and Self-Organized Criticality

Under Heavy-Tailed Self-Regularization (HT-SR) theory, rare-token neuron groups exhibit weight-correlation matrices with heavy-tailed eigenvalue spectra. The tail index $\alpha_{\text{Hill}}$ is computed by the Hill estimator:

$$\alpha_{\text{Hill}} = \left[ \frac{1}{k} \sum_{i=1}^k \ln \left( \frac{\lambda_i}{\lambda_k} \right) \right]^{-1}$$

with $k$ chosen to capture the power-law regime. Plateaus of rare-token neurons systematically have lower $\alpha_{\text{Hill}}$ (< 2) compared to random neurons, compatible with self-organized criticality and the formation of sparse, specialist subnetworks in the absence of explicit architectural modularity (Liu et al., 19 May 2025, Liu et al., 25 Sep 2025).
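
The estimator itself is a few lines; a short sketch, checked on synthetic Pareto-distributed eigenvalues (the tail index of 1.5 is an arbitrary choice for the check):

```python
import numpy as np

def hill_estimator(eigvals, k):
    """α_Hill = [ (1/k) Σ_{i=1..k} ln(λ_i / λ_k) ]^(-1)
    over the k largest eigenvalues (λ_1 ≥ ... ≥ λ_k)."""
    lam = np.sort(eigvals)[::-1][:k]
    return 1.0 / np.mean(np.log(lam / lam[-1]))

# Sanity check: Pareto-distributed "eigenvalues" with tail index 1.5
# should yield α_Hill ≈ 1.5 (heavy-tailed, i.e. below 2).
rng = np.random.default_rng(0)
eigvals = 1.0 + rng.pareto(1.5, size=5000)
print(round(hill_estimator(eigvals, k=500), 2))
```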

6. Training Dynamics and Functional Emergence

Longitudinal analyses during pretraining reveal that the specialization of token frequency neurons is not present at initialization. Instead:

  • Early in training, influence curves lack a plateau, descending smoothly.
  • Around the midpoint, a nascent plateau emerges among the top few percent of neurons, which grows in both height and separation from the bulk as training advances.
  • Simultaneously, the Hill exponent for plateau neurons declines (from ∼3 to <2), and their effective activation dimension shrinks, indicating the crystallization of functional coordination and heavy-tailed specialization (Liu et al., 19 May 2025, Liu et al., 25 Sep 2025).

This emergence aligns with distributed specialization rather than mixture-of-experts or modular routing: rare tokens access the coordinated subnetwork via standard attention circuits, as evidenced by similar attention patterns and low modularity $Q$ compared to random baselines.

7. Functional Significance, Calibration, and Practical Implications

Token frequency (rare-token) neurons implement a mechanistically interpretable fallback system for low-frequency tokens:

  • Calibration and Confidence Regulation: These neurons shift the model’s output distribution toward the unigram (“default”) distribution, especially in high-uncertainty or low-signal contexts, mediating confidence hedging and error avoidance (Stolfo et al., 24 Jun 2024).
  • Editing and Robustness: Selectively intervening on the plateau subnetwork supports model editing for rare-token behaviors and enhances reliability—pruning strategies can remove the rapid-decay neurons with little loss of rare-token fidelity, while plateau neurons should be preserved (Liu et al., 25 Sep 2025).
  • Avoidance of Dedicated Modules: Token frequency neurons are universal, distributed, and do not rely on mixture-of-experts-style architectures or specialized routing pathways.

The functional behavior of token frequency neurons unifies theories of complementary learning systems and sparse coding, with rare-token handling localized to a compact, co-activated, heavy-tailed subnetwork emergent from standard training (Liu et al., 19 May 2025, Liu et al., 25 Sep 2025, Stolfo et al., 24 Jun 2024). These findings have direct implications for interpretability, model calibration, and the optimization of computational resources in large-scale neural LLMs.
