State-Expressiveness Threshold Hypothesis
- The State-Expressiveness Threshold Hypothesis is a framework that defines a critical parameter beyond which a system’s expressive capacity sharply saturates.
- It is validated in diverse models such as polynomial neural networks, state space models, and SCOP cognitive frameworks, highlighting clear transitions in representational power.
- Understanding these thresholds aids in model selection and the design of hybrid architectures by pinpointing when additional complexity yields no further expressiveness.
The State-Expressiveness Threshold Hypothesis delineates discrete transitions in the expressive power of mathematical and computational systems—such as neural models, temporal logic recognizers, and concept representations—governed by tunable architectural, algorithmic, or formal thresholds. The hypothesis posits that a single control parameter (e.g., network activation degree, arithmetic precision, typicality threshold) induces a sharp increase—or “cliff”—in the range of functions, languages, or conceptual states that the system can represent as this parameter crosses a critical value. At or above this threshold, the system’s expressive set saturates, achieving its theoretical maximum, while below it, significant classes of behaviors remain inexpressible. Recent research across neural network architectures, state space models, and cognitive formalisms has both formalized and empirically validated such thresholds, clarifying the role of architectural constraints in governing model capabilities.
1. Definition and Formalization of Thresholds
The State-Expressiveness Threshold typically refers to a minimal setting of an architectural or algorithmic parameter beyond which increasing the parameter yields no further increase in the system’s expressive dimension or capability.
- Polynomial Neural Networks: For a feed-forward architecture with layer widths $\mathbf{d}$ and pure-power activation of degree $r$, the activation (state-expressiveness) threshold $\operatorname{ActThr}(\mathbf{d})$ is the smallest value such that, for every activation degree $r' > \operatorname{ActThr}(\mathbf{d})$, the dimension of the network’s neurovariety equals its expected (maximal) dimension. This threshold marks the transition where the representable polynomial family is maximally expressive (Finkel et al., 2024).
- State Space Models (SSMs): For linear, finite-precision SSMs, there is a critical hidden-state width: at or above it, the model can recognize all star-free regular languages, while below it, only languages with lower combinatorial complexity are within reach (Sarrof et al., 2024; Alsmann et al., 2026).
- SCOP Models in Concept Theory: A typicality threshold is introduced to split the states (exemplars) of a concept under a context into “expressible” and “inexpressible”, based on whether their transition probability exceeds the threshold. This threshold modulates which conceptual states are considered activated in given contexts, directly affecting the perceived relevance and robustness of those contexts (Veloz et al., 2013).
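The SCOP split described in the last item can be sketched in a few lines. This is only an illustration: the function name `split_states`, the exemplar names, and the probabilities are hypothetical and do not come from Veloz et al. (2013); the only rule taken from the text is that a state is expressible when its transition probability exceeds the threshold.

```python
def split_states(transition_probs, threshold):
    """Split states into (expressible, inexpressible) by transition probability.

    A state counts as expressible when its probability strictly exceeds
    the typicality threshold, following the description in the text.
    """
    expressible = {s for s, p in transition_probs.items() if p > threshold}
    inexpressible = set(transition_probs) - expressible
    return expressible, inexpressible


# Hypothetical transition probabilities for exemplars of one concept
# under one context (invented numbers, for illustration only).
probs = {"guppy": 0.45, "goldfish": 0.30, "dog": 0.20, "snake": 0.05}

low_expr, low_inexpr = split_states(probs, threshold=0.10)
high_expr, high_inexpr = split_states(probs, threshold=0.25)
```

With these invented numbers, raising the threshold from 0.10 to 0.25 moves "dog" from the expressible set to the inexpressible one, which is the kind of reclassification the threshold is meant to control.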
2. Instances Across Modeling Frameworks
Threshold phenomena arise in diverse domains:
- Polynomial Networks: Expressiveness increases with the activation degree up to a critical value, after which higher-degree activations introduce no new function dimensions. In equi-width architectures (all layers have the same width), the threshold is low enough that quadratic or higher activations already saturate expressiveness (Finkel et al., 2024).
- State Space Models: With element-wise nonnegative gating, SSMs are limited to aperiodic (star-free) behaviors; only by increasing width (state dimension) sufficiently beyond the threshold or by introducing negative or complex-phase gates can the SSM represent non–star-free regular languages, such as those requiring modular counting (e.g., parity) (Sarrof et al., 2024).
- Arithmetic Precision in SSMs: Allowing only fixed-width (constant-bit) arithmetic confines SSMs to star-free, AC$^0$-type languages. Increasing to $O(\log n)$-bit precision for inputs of length $n$ enables full counting, permitting recognition of non-regular languages (Alsmann et al., 2026).
- SCOP and Cognitive Models: Varying the expressibility threshold in conceptual spaces governs the transition between analytic (low threshold) and associative (high threshold) thought, i.e., from context-stable to context-divergent patterns of conceptual activation (Veloz et al., 2013).
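The role of signed gates in the State Space Models item can be illustrated with a scalar linear recurrence. This is a toy sketch, not the construction used by Sarrof et al. (2024): the names `run_scalar_ssm`, `signed_gate`, `nonneg_gate`, and `parity_from_state` are hypothetical, and the recurrence is reduced to a one-dimensional state $h_t = g(x_t)\,h_{t-1}$ with no input term.

```python
def run_scalar_ssm(bits, gate, h0=1):
    """Run the recurrence h_t = gate[x_t] * h_{t-1} and return the final state."""
    h = h0
    for b in bits:
        h = gate[b] * h
    return h


def parity_from_state(h):
    """Read parity off the sign of the state: positive means even, negative means odd."""
    return 0 if h > 0 else 1


# A signed gate flips the state's sign on every 1, so the sign tracks parity.
signed_gate = {0: 1, 1: -1}
# A nonnegative gate can only shrink or keep the state; its sign never changes.
nonneg_gate = {0: 1, 1: 0.5}

example = [1, 0, 1, 1]
parity = parity_from_state(run_scalar_ssm(example, signed_gate))
```

With the signed gate the final state is $(-1)^{k}$ for an input containing $k$ ones, so its sign encodes parity. With the nonnegative gate the state stays positive for every input, so this particular readout carries no parity information; the sketch shows only that mechanism and is not a proof of the general limitation.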
3. Key Theorems, Results, and Proofs
- Existence and Bounds (Polynomial Networks): For architectures with no width-one bottlenecks, Finkel–Rodriguez–Wu–Yahl prove that $\operatorname{ActThr}(\mathbf{d})$ exists and is bounded above, and that for every activation degree above the threshold the neurovariety dimension equals the maximal expected dimension (Finkel et al., 2024).
- Star-Free Languages in SSMs: For each star-free language there exists a threshold width, dependent logarithmically on the size of the alphabet and the complexity of the language's syntactic monoid, such that finite-width SSMs of at least that width can recognize the language (Sarrof et al., 2024).
- Temporal Logic Expressiveness: Let SSM_fixed denote SSMs with constant-bit arithmetic and SSM_log those with $O(\log n)$ bits on inputs of length $n$. The fixed-precision class coincides with pure-past LTL or its modular extensions, while logarithmic precision enables extensions with global counting, and thereby non-context-free language recognition (Alsmann et al., 2026).
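For the polynomial-network result, the saturation of neurovariety dimension can be checked exactly on a toy case. The sketch below is an assumption-laden illustration, not the method of Finkel et al. (2024): it takes a hypothetical two-layer architecture with widths (2, 2, 1), fixed integer weights, and sample points $(1, t)$, and computes the rank of the Jacobian of the map from parameters to network outputs. The `expected_dim` formula (parameter count minus one rescaling symmetry per hidden unit, capped by the ambient dimension) is the usual parameter-counting bound and is stated here as an assumption.

```python
from fractions import Fraction
from math import comb


def jacobian(W1, W2, r, points):
    """Jacobian of theta -> (f(x) for x in points), where f(x) = W2 @ (W1 @ x)**r."""
    d1, d0, d2 = len(W1), len(W1[0]), len(W2)
    rows = []
    for x in points:
        pre = [sum(W1[i][j] * x[j] for j in range(d0)) for i in range(d1)]
        for k in range(d2):
            # Derivatives of output k with respect to the entries of W1.
            row = [W2[k][i] * r * pre[i] ** (r - 1) * x[j]
                   for i in range(d1) for j in range(d0)]
            # Derivatives of output k with respect to the entries of W2.
            row += [pre[i] ** r if k2 == k else 0
                    for k2 in range(d2) for i in range(d1)]
            rows.append(row)
    return rows


def rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][c] / m[rk][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def expected_dim(d0, d1, d2, r):
    """Assumed bound: parameters minus hidden-unit rescalings, capped by ambient dimension."""
    return min(d0 * d1 + d1 * d2 - d1, d2 * comb(d0 + r - 1, r))


# Hypothetical weights and sample points, chosen for illustration.
W1 = [[1, 2], [3, 1]]
W2 = [[1, 1]]
points = [(1, t) for t in range(7)]

dims = {r: rank(jacobian(W1, W2, r, points)) for r in range(1, 6)}
# dims == {1: 2, 2: 3, 3: 4, 4: 4, 5: 4}
```

In this toy case the computed dimension grows with the activation degree (2, then 3) and then stays at 4, matching `expected_dim` at every degree tested. A Jacobian rank at one parameter point is only a lower bound on the generic dimension, so larger architectures would need generic or randomized weights.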
4. Mechanisms Creating Thresholds
The emergence of expressiveness thresholds is rooted in architectural or formal bottlenecks:
| Parameter | Below Threshold | At/Above Threshold |
|---|---|---|
| Activation Degree (Polynomial Networks) | Neurovariety dimension increases with activation degree | Dimension saturates; no new expressivity |
| SSM Hidden Width | Only a subset of star-free languages (lower combinatorial complexity) | All star-free languages realized |
| Arithmetic Precision (SSMs) | AC$^0$, no counting, regular languages | Global counting, non-regular languages |
| Typicality Threshold (SCOP) | Standard, analytic activation | Remapping to novel, associative patterns |
- Network Width and Depth: In SSMs, the minimal width required to simulate the Krohn–Rhodes decomposition of a star-free language determines the threshold.
- Activation Nonlinearities: High-degree activations in polynomial nets uncouple polynomials, making the function family maximally independent beyond a critical degree.
- Arithmetic Saturation: Fixed-width arithmetic saturates early, preventing counters from exceeding a fixed maximum value; logarithmic-width arithmetic lets counts accumulate correctly over inputs of any length.
- Cognitive Parameterization: Raising the typicality threshold removes lower-typicality states from consideration, shifting the balance of which contexts and states are dominant.
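The arithmetic-saturation mechanism can be made concrete with a bounded counter. The sketch below is a plain-Python recognizer for strings of the form a^n b^n, not an SSM and not the construction of Alsmann et al. (2026); the names `accepts_anbn` and `log_bits` and the choice of language are assumptions made for illustration. The counter is clamped to what fits in a given number of bits, so a constant bit budget miscounts long inputs while a budget that grows logarithmically with input length does not.

```python
def accepts_anbn(s, bits):
    """Check membership in {a^n b^n} using a counter clamped to `bits` bits."""
    cap = 2 ** bits - 1          # largest value the counter can hold
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:           # an 'a' after a 'b' is never allowed
                return False
            count = min(count + 1, cap)   # saturate instead of overflowing
        elif ch == "b":
            seen_b = True
            if count == 0:       # more b's than the counter remembers
                return False
            count -= 1
        else:
            return False
    return count == 0


def log_bits(s):
    """Bits needed so the counter can hold any value up to len(s)."""
    return max(1, len(s).bit_length())


balanced = "a" * 5 + "b" * 5
unbalanced = "a" * 5 + "b" * 3

fixed_results = (accepts_anbn(balanced, bits=2), accepts_anbn(unbalanced, bits=2))
log_results = (accepts_anbn(balanced, log_bits(balanced)),
               accepts_anbn(unbalanced, log_bits(unbalanced)))
```

With a 2-bit counter (maximum value 3), the balanced string is wrongly rejected and the unbalanced one wrongly accepted, because the count of five a's saturates at 3. With `log_bits`, both answers are correct.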
5. Comparative Architectural Analysis
Research establishes parallels between state-expressiveness thresholds in SSMs, RNNs, transformers, and concept representation models:
- Transformers and SSMs: Unique hard-attention transformers without positional encodings and diagonal SSMs with fixed precision both match the same expressiveness class (star-free only). Allowing counting logic or positional encodings in transformers parallels the addition of logarithmic arithmetic precision or modular states in SSMs (Sarrof et al., 2024; Alsmann et al., 2026).
- RNNs: Traditional RNNs can, in principle, recognize all regular languages even at width 1, due to their unconstrained (signed) recurrences. There is no analogous finite expressiveness threshold (Sarrof et al., 2024).
- SCOP models and cognitive continuum: The typicality threshold traces a continuum between analytic (convergent) and associative (divergent) thought, marking a qualitative shift in cognitive and contextual flexibility (Veloz et al., 2013).
6. Empirical and Theoretical Implications
The identification of state-expressiveness thresholds has multiple practical and theoretical ramifications:
- Model Selection: Knowledge of expressiveness plateaus enables informed architectural decisions; unnecessary increases in model complexity past the threshold yield no functional gain (Finkel et al., 2024).
- Design of SSMs: For applications needing modular counting or cyclic group behavior, SSM architectures must depart from the conventional nonnegative gating or limited arithmetic, allowing for signed or complex-phase parameters (Sarrof et al., 2024).
- Hybrid Architectures: Combining SSMs (star-free) with attention heads or complex-gated blocks (modular/counting) implements richer classes of sequence models, as suggested by hybrid designs (e.g., Jamba) (Sarrof et al., 2024).
- Cognitive Modeling: Varying the typicality threshold in SCOP models realizes shifts in creative cognition, modeling how divergent thinking can emerge as the threshold is swept across its range (Veloz et al., 2013).
7. Limitations and Open Problems
Significant limitations and open areas arise from threshold phenomena:
- Learnability vs. Expressiveness: Even when architectures surpass the expressiveness threshold in theory, empirical training may still fail to realize complex functions, especially those requiring precise modular or group-theoretic computations.
- Robust SSM Upgrades: Realizing non-star-free tasks in SSMs requires architectural changes (negative/complex gates, increased or dynamic precision) that may introduce training or stability challenges.
- Threshold Sharpness: While the existence of thresholds is established theoretically, the threshold's practical value (precisely how much width, degree, or precision is needed) can vary with the task and the input distribution.
- Extensions Beyond Regularity: The implications of thresholding in higher-order models—capable of context-free or context-sensitive behaviors—remain incompletely mapped.
References
- "Activation degree thresholds and expressiveness of polynomial neural networks" (Finkel et al., 2024)
- "The Expressive Capacity of State Space Models: A Formal Language Perspective" (Sarrof et al., 2024)
- "On the Expressiveness of State Space Models via Temporal Logics" (Alsmann et al., 2026)
- "Toward a Formal Model of the Shifting Relationship between Concepts and Contexts during Associative Thought" (Veloz et al., 2013)