Express Language Modeling
- Express Language Modeling is a framework that unifies formal expressivity, implicit signal evaluation, and efficient causal attention methods in language models.
- It rigorously characterizes Transformer capabilities using PTL equivalence and employs automated methods to gauge implicit communication and uncertainty calibration.
- Recent innovations, such as the Express meta-algorithm, achieve significant speedups in processing long-context sequences while preserving model quality.
Express language modeling encompasses a technically multifaceted set of advances that unify three distinct research themes: (1) the formal expressivity of neural LLMs, notably Transformers; (2) the design and quantification of a model’s capacity to express non-explicit, contextual, or probabilistic information in generated text; and (3) algorithmic and systems-level innovations to make these expressive capacities computationally efficient at scale. Recent developments under the rubric "Express Language Modeling" have provided principled, scalable solutions to the bottlenecks of causal attention, empirical methodologies for evaluating implicit and explicit communication in LLMs, and theoretically precise characterizations of what kinds of languages and reasoning models can and cannot generate.
1. Foundations: Expressivity in LLMs
The core of express language modeling originates with the formal study of the expressive power of LLMs, especially Transformer architectures. The expressivity of an LLM, in this context, is defined as the class of formal languages (sets of symbol sequences) recognizable or generable by a given model family under architectural and computational constraints.
"Characterizing the Expressivity of Transformer LLMs" (Li et al., 29 May 2025) provides an exact characterization for fixed-precision Transformers with strict causal masking and soft attention. This model class is provably equivalent to the logic PTL (past-only linear temporal logic), which defines exactly the left-deterministic polynomials: the languages recognizable by partially ordered DFAs, whose syntactic monoids are R-trivial. In this setting, a fixed-precision, strictly masked Transformer recognizer accepts exactly the PTL-definable languages, no more and no fewer.
This boundary is strict: experiments confirm that such models generalize perfectly to arbitrarily long sequences for languages in PTL, but fail on regular languages that are not left-deterministic polynomials. The result is one instance within a broader hierarchy of expressivity characterizations.
This work, corroborated by comprehensive surveys (Strobl et al., 2023), anchors the phrase "express language modeling" in the formal model theory of language generation.
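The automata-theoretic side of this characterization can be illustrated with a small sketch. A DFA is partially ordered when the only cycles in its transition graph are self-loops, so a state that has been left is never revisited. The helper names (`is_partially_ordered`, `accepts`) and the two example automata are assumptions chosen for illustration and do not come from Li et al.: "contains an `a`" is recognized by a partially ordered DFA, while "even number of `a`s" needs a two-state cycle and therefore falls outside the class.

```python
from typing import Dict, Set, Tuple

Transitions = Dict[Tuple[str, str], str]  # (state, symbol) -> next state


def is_partially_ordered(states: Set[str], delta: Transitions) -> bool:
    """Return True if the only cycles in the transition graph are self-loops."""
    # Build the state graph, ignoring self-loops (they are allowed).
    graph = {s: set() for s in states}
    for (src, _symbol), dst in delta.items():
        if src != dst:
            graph[src].add(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / finished
    colour = {s: WHITE for s in states}

    def has_cycle(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # back edge: a cycle through two or more states
            if colour[nxt] == WHITE and has_cycle(nxt):
                return True
        colour[node] = BLACK
        return False

    return not any(colour[s] == WHITE and has_cycle(s) for s in states)


def accepts(delta: Transitions, start: str, accepting: Set[str], word: str) -> bool:
    """Run the DFA on `word` and report whether it ends in an accepting state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


# Illustrative automata over the alphabet {a, b} (hypothetical examples).
CONTAINS_A: Transitions = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q1",
}
EVEN_AS: Transitions = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
```

Under the characterization above, the first automaton is the kind of recognizer such Transformers can match at any length, while the parity automaton is not.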
2. Expressivity Beyond Syntax: Implicit and Explicit Information
Expressivity in practical LLMs entails not just which languages they can technically recognize or generate, but also their ability to produce and interpret implicit (contextual/pragmatic) information in output text. "ExpressivityArena: Can LLMs Express Information Implicitly?" (Tint et al., 2024) formalizes this dimension, introducing a Python framework for evaluating how well LLMs can communicate domain-specific signals implicitly, that is, by "showing, not telling." Given a defined set of signals within a domain (e.g., style, emotion), an LLM generates texts intended to convey a signal without explicit markers. An automated or human grader attempts to recover the intended signal from the generated text, and the expressivity rate is the fraction of generated texts for which the grader recovers the intended signal.
Experiments conducted over domains such as poetry, code (skill level, paradigm), and conversation reveal a sharp dependence on the domain and signal type—for example, emotional expressivity is substantially higher in creative domains than in code. Importantly, sustained multi-turn dialogue can cause degradation in emotional expressivity, but can enhance the recognizability of signals such as profession due to iterative contextual hints.
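As a minimal sketch of how such a rate might be tallied, the snippet below takes the expressivity rate to be the fraction of generations whose intended signal the grader recovers, computed per domain. The function name `expressivity_rates` and the trial records are hypothetical and are not part of the ExpressivityArena API; the numbers are made up purely to exercise the code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Trial = Tuple[str, str, str]  # (domain, intended signal, signal recovered by grader)


def expressivity_rates(trials: Iterable[Trial]) -> Dict[str, float]:
    """Fraction of trials per domain where the grader recovered the intended signal."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for domain, intended, recovered in trials:
        totals[domain] += 1
        hits[domain] += int(intended == recovered)
    return {domain: hits[domain] / totals[domain] for domain in totals}


# Hypothetical grader outcomes, for illustration only.
EXAMPLE_TRIALS = [
    ("poetry", "joy", "joy"),
    ("poetry", "anger", "anger"),
    ("poetry", "sadness", "sadness"),
    ("poetry", "fear", "sadness"),
    ("code", "joy", "neutral"),
    ("code", "anger", "anger"),
]

rates = expressivity_rates(EXAMPLE_TRIALS)
```

Keeping the tally per domain makes the domain dependence described above directly visible in the output dictionary.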
3. Uncertainty, Faithfulness, and Natural Language Calibration
A critical axis of LLM expressivity is the model’s ability to faithfully communicate its own epistemic uncertainty—effectively, to express in words its internal confidence about assertions. This requirement is technically subtle, as natural language confidence markers often become decoupled from underlying probabilistic beliefs.
A series of works—"Teaching Models to Express Their Uncertainty in Words" (Lin et al., 2022), "Relying on the Unreliable" (Zhou et al., 2024), "Uncertainty Distillation" (Hager et al., 18 Mar 2025), and "Can LLMs Faithfully Express Their Intrinsic Uncertainty in Words?" (Yona et al., 2024)—provide a rigorous empirical and algorithmic foundation.
Uncertainty can be characterized as either:
- Lexical uncertainty: The probability assigned to the surface string generated (e.g., in an autoregressive model).
- Semantic uncertainty: The probability that the model knows the correct answer, regardless of verbalization. It is approximated by Monte Carlo sampling of responses, followed by clustering of the normalized responses.
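The Monte Carlo estimate of semantic uncertainty can be sketched as follows. This is a simplified illustration, not the procedure of any cited paper: responses are clustered by exact match after lowercasing and punctuation stripping (an assumption; real pipelines may cluster by semantic equivalence), and the hard-coded `SAMPLES` list stands in for repeated model samples.

```python
import string
from collections import Counter
from typing import Dict, List


def normalize(answer: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(answer.lower().translate(table).split())


def semantic_confidence(samples: List[str]) -> Dict[str, float]:
    """Share of sampled responses falling into each normalized-answer cluster."""
    counts = Counter(normalize(s) for s in samples)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


# Stand-in for repeated samples from a model on one question (hypothetical).
SAMPLES = ["Paris", "paris.", "Paris!", "Lyon"]
confidence = semantic_confidence(SAMPLES)
```

Three surface strings collapse into one cluster here, which is the point of the distinction: lexical probability would split that mass across the variants, while the semantic estimate pools it.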
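A principal metric emerging from this line of research is faithful response uncertainty: for each assertion $A$ in a response $R$ to question $Q$, define the model confidence $\mathrm{conf}_M(A)$ (estimated from output variation) and the linguistic decisiveness $\mathrm{dec}(A; R, Q)$ (how strongly $R$ asserts $A$), with the example-wise faithfulness score

$$\mathrm{faithfulness}_M(R; Q) = 1 - \frac{1}{|\mathcal{A}(R)|} \sum_{A \in \mathcal{A}(R)} \left| \mathrm{dec}(A; R, Q) - \mathrm{conf}_M(A) \right|,$$

where $\mathcal{A}(R)$ denotes the set of assertions in $R$.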
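A small numeric sketch may help make the score concrete. It assumes the form used by Yona et al. (2024), namely one minus the mean absolute gap between decisiveness and confidence over the assertions in a response, with both quantities in [0, 1]. The function name `faithfulness` and the input values are illustrative assumptions.

```python
from typing import List, Tuple


def faithfulness(assertions: List[Tuple[float, float]]) -> float:
    """One minus the mean |decisiveness - confidence| over a response's assertions.

    Each item is (decisiveness, confidence), both assumed to lie in [0, 1].
    """
    if not assertions:
        raise ValueError("a response must contain at least one assertion")
    gap = sum(abs(dec - conf) for dec, conf in assertions) / len(assertions)
    return 1.0 - gap


# A fully decisive claim backed by 50% confidence is only half faithful.
overclaimed = faithfulness([(1.0, 0.5)])
# Hedging that matches the underlying confidence is perfectly faithful.
matched = faithfulness([(0.5, 0.5)])
# Mixed response: one overclaimed assertion, one matched assertion.
mixed = faithfulness([(1.0, 0.25), (0.5, 0.5)])
```

Note that the score does not depend on correctness: a confidently wrong answer stated decisively still scores as faithful, which is why faithfulness is reported alongside accuracy-based calibration metrics.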
Empirical evaluations reveal that current LLMs are weakly faithful by this metric: hedging language and epistemic signals in output are only loosely coupled with the model’s true uncertainty distribution, even under explicit prompting and few-shot demonstration (Yona et al., 2024). Overconfidence is observed to be systematically induced by RLHF alignment procedures that penalize hedging language in human-preference data (Zhou et al., 2024). Expected Calibration Error (ECE), Brier Score, and conditional mean faithfulness summarize the miscalibration and overconfidence in linguistic expressions observed in practical systems.
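For reference, the two standard calibration summaries named above can be computed as in the sketch below. It uses the usual definitions (equal-width confidence bins for ECE, mean squared error for the Brier score); the bin count and the toy inputs are assumptions for illustration and are not results from the cited papers.

```python
from typing import List, Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between stated confidence and 0/1 correctness."""
    return sum((p - y) ** 2 for p, y in zip(confidences, outcomes)) / len(confidences)


def expected_calibration_error(
    confidences: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(p for p, _ in bucket) / len(bucket)
        accuracy = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_conf - accuracy)
    return ece


# Toy data: 75% confidence with 75% accuracy is calibrated;
# 100% confidence with 50% accuracy is overconfident.
calibrated_ece = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
overconfident_ece = expected_calibration_error([1.0] * 4, [1, 1, 0, 0])
overconfident_brier = brier_score([1.0] * 4, [1, 1, 0, 0])
```

To apply these to verbalized uncertainty, the linguistic markers first have to be mapped to numeric confidences, and that mapping is itself a modeling choice.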
4. Algorithmic and Systems Advancements: Express for Causal Attention
A distinctive practical barrier to the expressivity of LLMs is imposed by the computational bottlenecks associated with causal attention, especially for long contexts and high-throughput inference. "Express Language Modeling" (Gong et al., 9 Jun 2026) introduces Express, a meta-algorithm that transforms any non-causal attention approximation (such as Thinformer’s subquadratic thinning/halving) into a streaming, causal attention mechanism with matching approximation guarantees.
The main result of Express gives guarantees on four quantities, each stated as a function of the sequence length and the compression budget:
- Approximation error
- Memory overhead
- Computation (compression) cost
- Per-token query cost
These rates match the best known for non-causal (encoder-like) approximations but apply in the strict causal (decoder) regime required for autoregressive LMs. Using a custom Triton implementation with I/O-aware tiling and batch/head parallelism, Express delivers significant speedups over prior engineered attention baselines (FlashAttention 2, HyperAttention) across long-context prefill, KV cache compression, and memory-constrained and compute-constrained decoding. Architectural fidelity is preserved: at suitable cache sizes there is no quality loss versus full attention (Gong et al., 9 Jun 2026).
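To show the general shape of streaming causal attention under a cache budget, the sketch below uses a deliberately naive compressor. It is not the Express algorithm and not Thinformer: when the cache exceeds the budget, neighbouring entries are paired and the first of each pair is kept with the pair's combined weight, with no approximation guarantee. All names (`attend`, `halve`, `streaming_causal_attention`) are hypothetical, and pure Python lists stand in for tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]
Entry = Tuple[Vector, Vector, float]  # (key, value, weight = tokens represented)


def attend(query: Vector, cache: List[Entry]) -> Vector:
    """Weighted softmax attention: each cache entry counts `weight` times."""
    scores = [sum(q * k for q, k in zip(query, key)) for key, _, _ in cache]
    top = max(scores)  # subtract the max for numerical stability
    mass = [w * math.exp(s - top) for s, (_, _, w) in zip(scores, cache)]
    total = sum(mass)
    dim = len(cache[0][1])
    return [sum(m * value[j] for m, (_, value, _) in zip(mass, cache)) / total
            for j in range(dim)]


def halve(cache: List[Entry]) -> List[Entry]:
    """Naive thinning: keep the first entry of each adjacent pair, merging weights."""
    thinned = []
    for i in range(0, len(cache) - 1, 2):
        key, value, weight = cache[i]
        thinned.append((key, value, weight + cache[i + 1][2]))
    if len(cache) % 2 == 1:
        thinned.append(cache[-1])  # unpaired newest entry is kept unchanged
    return thinned


def streaming_causal_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector], budget: int
) -> List[Vector]:
    """Process tokens left to right; token t attends only to tokens <= t."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    cache: List[Entry] = []
    outputs = []
    for query, key, value in zip(queries, keys, values):
        cache.append((key, value, 1.0))
        if len(cache) > budget:
            cache = halve(cache)  # compress before answering the query
        outputs.append(attend(query, cache))
    return outputs
```

With a budget at least as large as the sequence, no thinning occurs and the loop reduces to exact causal attention; with a smaller budget the merged weights keep the total softmax mass equal to the number of tokens seen, while the choice of which entries survive is what a principled compressor would improve on.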
5. Expressivity in Model Architectures: All-MLP and MoE Models
The relation between architectural class and expressivity is accentuated by comparing standard attention-based architectures with sparse all-MLP designs. "Efficient Language Modeling with Sparse all-MLP" (Yu et al., 2022) demonstrates that dense all-MLP LMs suffer from deficits in expressivity and downstream generalization. By introducing two types of sparsely activated mixture-of-experts (MoE), token-wise (tMoE) and feature-wise (sMoE), parameter capacity and expressivity are substantially increased without higher compute cost. The sMLP architecture surpasses Transformer-based MoEs in perplexity and generalizes as well as GPT-3 despite a lower pretraining volume, relying on conditional computation to realize high expressive power in practical LMs.
6. Methodological Implications and Open Directions
The research trajectory in express language modeling highlights the following methodological imperatives and directions:
- Faithful uncertainty communication needs explicit supervision: Alignment objectives must penalize mismatch between linguistic uncertainty markers and internal confidence distributions, rather than maximizing next-token likelihood alone (Yona et al., 2024, Hager et al., 18 Mar 2025).
- Calibration and expressivity metrics must aggregate both correctness and faithfulness: Metrics such as ECE, Brier, and faithfulness-correctness separation are essential.
- Practical implementation of highly expressive attention: Techniques such as Express enable models to realize theoretical expressive potential at scales compatible with long-context, low-latency deployment (Gong et al., 9 Jun 2026).
- Expressivity is multi-dimensional: It is a function of model architecture (attention vs. all-MLP), training protocols (uncertainty distillation, implicit communication), and user-facing design (prompting schemes, epistemic markers, conversational strategies) (Tint et al., 2024, Zhou et al., 2024).
- Open questions: The exact boundaries of expressivity under various architectural and precision regimes remain under theoretical investigation, especially for softmax-attention encoders and practical chain-of-thought augmented decoders (Strobl et al., 2023, Li et al., 29 May 2025).
7. Summary Table: Representative Advances in Express Language Modeling
| Subtopic | Key Contribution | Reference |
|---|---|---|
| Formal expressivity of Transformers | PTL equivalence for fixed-precision, masked, soft-attention models | (Li et al., 29 May 2025) |
| Efficient causal attention via coresets | Express meta-algorithm, matching non-causal provable bounds | (Gong et al., 9 Jun 2026) |
| Uncertainty verbalization and calibration | Uncertainty distillation, faithfulness metrics, linguistic markers | (Hager et al., 18 Mar 2025, Yona et al., 2024) |
| Implicit signal expressivity | ExpressivityArena, automated implicit-signal accuracy | (Tint et al., 2024) |
| All-MLP expressivity with sparse MoEs | sMLP achieves high expressivity and efficiency via dual MoE routing | (Yu et al., 2022) |
| RLHF calibration and overconfidence | Systematic reward model bias against uncertainty markers | (Zhou et al., 2024) |
In sum, express language modeling integrates formal theory, empirical methodologies, systems algorithms, and practical calibration to characterize and expand the interpretive capacity of modern LLMs. The synergy of algorithmic expressivity, uncertainty quantification, and domain adaptation forms the foundation for research on next-generation, reliable, and faithful LLMs.