Semantic Capacity Asymmetry Hypothesis
- The Semantic Capacity Asymmetry Hypothesis is a framework describing how bounded memory and fixed model width create sharp asymmetries in the expressive power of semantic representations across different systems.
- It demonstrates that projective parsing is strictly less powerful than nonprojective methods under identical memory constraints, while neural embeddings allocate capacity in an all-or-nothing fashion.
- Additionally, it reveals that evaluative tasks require significantly less semantic capacity than generative ones, enabling smaller models to efficiently match larger models in quality judgment.
The Semantic Capacity Asymmetry Hypothesis formalizes the observation that, under realistic constraints—such as bounded memory, fixed model width, or compression—distinct mechanisms for semantic representation exhibit sharply asymmetric expressive power. Certain systems or tasks demand more “semantic capacity” than others, and this asymmetry manifests both in explicit parsing algorithms and in neural network representations. Specifically, nonprojective semantic composition, compressed or block-structured neural embeddings, and evaluation-vs-generation regimes all display provable or observed gaps in capacity, which have concrete implications for parsing, model design, interpretability, and model evaluation methodologies (Venant et al., 2019, Scherlis et al., 2022, Li et al., 30 Jan 2026).
1. Foundational Definitions and Formal Statement
The Semantic Capacity Asymmetry Hypothesis admits multiple precise formulations, as articulated in recent work across both formal and neural paradigms:
- For composition mechanisms in semantic parsing, “capacity” refers to the class of meaning graphs (semantic representations) generable under a fixed memory or workspace constraint. Projective mechanisms—those enforcing noncrossing attachment of semantic edges—are strictly weaker than nonprojective ones under identical memory budgets. Formally, for every memory bound $k$, the set of meaning graphs constructible by projective mechanisms, $\mathcal{G}_{\mathrm{proj}}(k)$, is strictly contained in that of nonprojective mechanisms, $\mathcal{G}_{\mathrm{nonproj}}(k)$, even for constant $k$ (Venant et al., 2019).
- In neural architectures, feature capacity quantifies the fractional embedding dimension consumed per feature. The hypothesis posits an asymmetric allocation: the most important features are monosemantically allocated entire dimensions, unimportant features are entirely ignored, and intermediate features are embedded in superposition, i.e., polysemantically entangled within shared dimensions (Scherlis et al., 2022).
- In evaluation tasks for LLMs, generative semantic capacity (required to synthesize high-quality open-ended output) is empirically far greater than evaluative capacity (required to judge or score output quality), enabling small models with weak generative ability to match large models on evaluative classification when probed appropriately (Li et al., 30 Jan 2026).
2. Capacity Asymmetry in Bounded-Memory Semantic Parsing
Formally, Venant & Koller demonstrated the first rigorous separation of projective and nonprojective semantic composition mechanisms subject to bounded memory (Venant et al., 2019). Consider a parser mapping sentences to meaning graphs via local compositional operations:
- Projective mechanisms—which restrict combination of spans such that semantic edges never cross when drawn as arcs above the sentence—can only construct meaning graphs whose crossing complexity is bounded by the working memory size $k$.
- Nonprojective mechanisms—which lift the noncrossing constraint—retain the ability to construct arbitrary crossing meaning graphs for all sentence lengths, using only $O(1)$ memory, via “zip-and-unzip” algorithms that keep at most two spans active while producing cross-serial attachments.
- The principal technical result: for every fixed memory bound $k$, there exists an infinite family of “cross-serial” graphs (with node and edge counts growing without bound) that cannot be built projectively with memory $k$, yet can be built nonprojectively with constant memory.
- This establishes that $\mathcal{G}_{\mathrm{proj}}(k) \subsetneq \mathcal{G}_{\mathrm{nonproj}}(k)$, and formalizes the Semantic Capacity Asymmetry Hypothesis for compositional models.
- The implication is a strict expressivity gap in practical parsing systems: projective algorithms with fixed workspace cannot cover classes of meaning representations corresponding to non-local or cross-serial natural language phenomena, unlike their nonprojective counterparts.
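The noncrossing constraint at the heart of this separation can be checked directly. Below is a minimal illustrative sketch (not the paper's parsing algorithm) that tests whether a meaning graph, drawn as arcs over the sentence positions, is projective:

```python
def edges_cross(e1, e2):
    """Two edges cross iff their endpoints strictly interleave:
    i1 < i2 < j1 < j2, so neither arc nests inside the other."""
    (i1, j1), (i2, j2) = sorted(e1), sorted(e2)
    if (i1, j1) > (i2, j2):  # order the edges by left endpoint
        (i1, j1), (i2, j2) = (i2, j2), (i1, j1)
    return i1 < i2 < j1 < j2

def is_projective(edges):
    """A meaning graph drawn as arcs above the sentence is projective
    iff no pair of edges crosses."""
    return not any(edges_cross(a, b)
                   for k, a in enumerate(edges) for b in edges[k + 1:])

# Cross-serial attachments (as in cross-serial dependency constructions):
# edges 0-2 and 1-3 interleave, so the graph is nonprojective.
print(is_projective([(0, 2), (1, 3)]))   # False
print(is_projective([(0, 3), (1, 2)]))   # True (nested arcs)
```

Families of graphs built from such interleaved edges are exactly the ones a projective mechanism with fixed workspace cannot cover.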
3. Semantic Capacity Allocation and Polysemanticity in Neural Embeddings
The hypothesis generalizes to neural representational geometry. An explicit metric, the feature capacity $C_i$, is defined for each embedded feature $i$ in an embedding matrix $W$ by

$$C_i = \frac{(w_i \cdot w_i)^2}{\sum_j (w_i \cdot w_j)^2},$$

where $w_i$ is the embedding vector (row of $W$) for feature $i$ (Scherlis et al., 2022). This expresses the fraction of an embedding dimension “devoted” to feature $i$, adjusted for interference/superposition with other features; $C_i$ ranges from $0$ (ignored) to $1$ (a dedicated dimension).
- Loss minimization under a total capacity budget $\sum_i C_i \le d$ (for embedding dimension $d$) yields a striking phase diagram:
- Features with large importance $I_i$ are monosemantically embedded (allocated a full dimension, $C_i = 1$).
- Features with small $I_i$ are ignored ($C_i = 0$).
- Only features at intermediate importance values are polysemantically embedded, sharing dimensions in superposition ($0 < C_i < 1$).
- The allocation is asymmetric: most capacity is concentrated at the extremes, with a “polysemantic band” whose width depends on the kurtosis and sparsity of the inputs. For higher input kurtosis, polysemanticity is more prevalent.
- The embedding geometry is block–semi-orthogonal: an efficient $W$ can be decomposed into orthogonal blocks, within which dimensions are shared among polysemantic features, while monosemantic features form isolated (orthogonal) blocks.
- This formalizes the asymmetry: feature representation, interference, and capacity allocation are not uniform but concentrate semantically crucial information in an all-or-nothing regime, while lesser features are forced into a lossy superposition or pruned entirely.
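The capacity metric is straightforward to compute from an embedding matrix. The sketch below follows the definition $C_i = (w_i \cdot w_i)^2 / \sum_j (w_i \cdot w_j)^2$ and illustrates the monosemantic and superposition regimes on toy matrices:

```python
import numpy as np

def feature_capacity(W):
    """Capacity C_i = (w_i . w_i)^2 / sum_j (w_i . w_j)^2 for each
    feature (row) of the embedding matrix W (n_features x d)."""
    G = W @ W.T                          # Gram matrix of feature dot products
    return np.diag(G) ** 2 / (G ** 2).sum(axis=1)

# Monosemantic regime: orthogonal features each own a dimension (C_i = 1).
print(feature_capacity(np.eye(3)))                       # [1. 1. 1.]

# Superposition: two features sharing one dimension split it (C_i = 0.5).
print(feature_capacity(np.array([[1.0, 0.0], [1.0, 0.0]])))  # [0.5 0.5]

# Total capacity never exceeds the embedding dimension d.
W = np.random.default_rng(0).normal(size=(6, 2))
assert feature_capacity(W).sum() <= 2.0 + 1e-9
```

The final assertion reflects the budget constraint $\sum_i C_i \le d$ under which the asymmetric allocation arises.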
4. Capacity Asymmetry in LLM Evaluation and the Representation-as-a-Judge Paradigm
Empirical investigations into LLM evaluation unveil a further instance of semantic capacity asymmetry (Li et al., 30 Jan 2026):
- For a fixed downstream evaluation task $T$, define the generative capacity $C_{\mathrm{gen}}(M)$ of a model $M$ as its end-to-end generation performance, and its evaluative capacity $C_{\mathrm{eval}}(M)$ as performance on predicting expert (LLM) judgments from internal representations.
- It is observed that $C_{\mathrm{gen}}(M_{\mathrm{small}}) \ll C_{\mathrm{gen}}(M_{\mathrm{large}})$—small models perform poorly at generation.
- However, there exists a probe $f$ such that $C_{\mathrm{eval}}(M_{\mathrm{small}}) \approx C_{\mathrm{eval}}(M_{\mathrm{large}})$—evaluative information is well encoded even in small models’ hidden states.
- This motivates a new paradigm: Representation-as-a-Judge, typified by the INSPECTOR framework, where small models, via linear probing of hidden layers, act as high-fidelity, efficient evaluators of output quality without explicit generation or prompt-based evaluation.
- The core hypothesis: evaluation (judgment) tasks require significantly less semantic capacity than open-ended generation and can be supported by internal representations accessible to lightweight probing, even in models dramatically less powerful than those used to generate high-quality outputs.
- Practical significance includes orders-of-magnitude efficiency gains, increased interpretability by virtue of linear probes, and robustness to prompt design, in contrast to traditional “LLM-as-a-Judge” autoregressive evaluation.
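A minimal sketch of the probing setup, on synthetic data: hidden states are simulated as activations in which a “quality” direction is linearly encoded (an illustrative assumption; a real Representation-as-a-Judge probe such as INSPECTOR reads actual frozen LLM layer activations), and a linear logistic probe recovers the judge labels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frozen hidden states: a quality direction u is
# linearly encoded in d-dimensional activations, plus label noise.
d, n = 32, 2000
u = rng.normal(size=d)
H = rng.normal(size=(n, d))                              # "hidden states"
y = (H @ u + 0.1 * rng.normal(size=n) > 0).astype(float)  # judge labels

# Linear probe: logistic regression trained by plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(H @ w)))       # predicted P(high quality)
    w -= 0.1 * H.T @ (p - y) / n             # cross-entropy gradient step

acc = ((H @ w > 0) == (y > 0.5)).mean()
print(f"probe accuracy: {acc:.2f}")
```

Because the probe is a single linear map over fixed activations, evaluation costs one forward pass plus a dot product, with no autoregressive generation at all.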
5. Theoretical Foundations and Quantitative Formulation
The semantic capacity asymmetry across domains emerges from universal properties of constrained optimization:
- Under fixed resource constraints—bounded parser memory, total embedding dimension $d$, or parameter count—loss minimization allocates capacity asymmetrically based on feature/task importance and statistical properties of the input.
- In neural models, the optimal allocation is governed by input kurtosis and the global Lagrange multiplier $\lambda$ enforcing the budget:

$$-\frac{\partial L}{\partial C_i} = \lambda \quad \text{for features with } 0 < C_i < 1,$$

with monosemantic ($C_i = 1$), polysemantic ($0 < C_i < 1$), and null ($C_i = 0$) regions sharply separated by importance thresholds determined by $\lambda$ (Scherlis et al., 2022).
- In bounded-memory parsing, the combination of projectivity constraints and memory limitations yields combinatorial bottlenecks in the construction of crossing or nonlocal semantic graphs—removable by relaxing projectivity (Venant et al., 2019).
- For LLM evaluation, empirical distributions of semantic information across layers, and the ability of shallow probes to extract gold judgments, reveal that evaluative signals are compressed and localized, available even in models far below the generative threshold needed to produce high-quality outputs themselves (Li et al., 30 Jan 2026).
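The Lagrangian allocation picture can be illustrated numerically. The sketch below assumes a hypothetical concave per-feature benefit with marginal loss reduction $I_i(2 - C_i)$ (chosen for illustration, not the papers' exact loss); bisection on the multiplier $\lambda$ then reproduces the three-region phase structure:

```python
import numpy as np

def allocate(importances, d):
    """Toy capacity allocation under a budget sum(C_i) = d.
    Hypothetical marginal loss reduction I_i * (2 - C_i) per feature;
    equating it to a shared multiplier lam gives
    C_i = clip(2 - lam / I_i, 0, 1), and bisection finds lam
    so that the budget is met exactly."""
    I = np.asarray(importances, dtype=float)
    lo, hi = 1e-9, 2.0 * I.max()          # C.sum() is decreasing in lam
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        C = np.clip(2.0 - lam / I, 0.0, 1.0)
        if C.sum() > d:
            lo = lam                      # too much capacity: raise lam
        else:
            hi = lam
    return C

I = np.array([4.0, 3.0, 1.2, 1.0, 0.3, 0.1])
C = allocate(I, d=3)
print(C.round(3))  # important -> C = 1, unimportant -> C = 0, middle shares
```

The output exhibits the asymmetry in miniature: the most important features saturate at a full dimension, the least important are pruned to zero, and only the middle band ends up in fractional (polysemantic) allocation.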
6. Implications and Extensions Across Domains
The consequences of the Semantic Capacity Asymmetry Hypothesis are multi-faceted:
- Grammar-based Parsers: Mildly context-sensitive or CFG-based parsers constrained to bounded projective memory cannot capture entire families of semantically critical phenomena (e.g., cross-serial dependencies) unless memory grows with input length.
- Neural Representations: Polysemanticity—features sharing representational subspaces—is an artifact of hard capacity limits and skewed feature importance distributions. Architectural choices (width, block structure, activation function) and data statistics (kurtosis, sparsity) modulate the position and width of the “polysemantic band.”
- LLM Evaluation: Evaluation can be reframed as a pattern-recognition task leveraging latent representations, enabling compact, interpretable, and highly efficient evaluation strategies via the Representation-as-a-Judge paradigm.
- Limitations & Open Questions: For LLM evaluation, transferability of multiclass quality prediction is limited, and all current ground truth derives from a single large-model judge. Extending the asymmetry framework to summarization, dialogue, code evaluation, and other domains remains an active area for further study (Li et al., 30 Jan 2026).
7. Summary Table: Domains and Manifestations of Semantic Capacity Asymmetry
| Domain | Resource Constraint | Manifestation of Asymmetry |
|---|---|---|
| Bounded-memory semantic parsing | Working memory bound $k$ | Projective $\subsetneq$ nonprojective construction |
| Neural network embeddings | Embedding dimension | Monosemantic/polysemantic feature allocation |
| LLM judging/evaluation vs. generation | Model size, parameters | Evaluative capacity $\ll$ generative capacity |
The Semantic Capacity Asymmetry Hypothesis thus unifies a spectrum of phenomena—spanning formal parsing, neural representational geometry, and LLM evaluation—under the principle that resource constraints induce asymmetric, often sharp, separations in the attainable semantic expressivity, allocation, and empirical performance across mechanisms, features, and tasks (Venant et al., 2019, Scherlis et al., 2022, Li et al., 30 Jan 2026).