
Phase transition on a context-sensitive random language model with short range interactions

Published 1 Apr 2026 in cs.CL, cond-mat.stat-mech, and stat.ML | (2604.00947v1)

Abstract: Since the random language model was proposed by E. DeGiuli [Phys. Rev. Lett. 122, 128301], language models have been investigated intensively from the viewpoint of statistical mechanics. Recently, the existence of a Berezinskii–Kosterlitz–Thouless transition was numerically demonstrated in models with long-range interactions between symbols. In statistical mechanics, it has long been known that long-range interactions can induce phase transitions. Therefore, it has remained unclear whether phase transitions observed in language models originate from genuinely linguistic properties that are absent in conventional spin models. In this study, we construct a random language model with short-range interactions and numerically investigate its statistical properties. Our model belongs to the class of context-sensitive grammars in the Chomsky hierarchy and allows explicit reference to contexts. We find that a phase transition occurs even when the model refers only to contexts whose length remains constant with respect to the sentence length. This result indicates that finite-temperature phase transitions in language models are genuinely induced by the intrinsic nature of language, rather than by long-range interactions.

Summary

  • The paper demonstrates that a context-sensitive random language model exhibits a finite-temperature phase transition with BKT-like characteristics, evidenced through Monte Carlo simulations.
  • It employs Potts model dynamics and finite-size scaling to quantify observables such as magnetization and susceptibility, reporting critical exponents ν ≈ 2.5 and γ ≈ 2.0.
  • Results challenge traditional views by showing that short-range context-sensitive rules can induce emergent long-range order, linking formal linguistics with statistical physics.

Phase Transition Phenomena in Context-Sensitive Random Language Models with Short-Range Interactions

Model Construction and Formal Foundations

This work constructs a class of probabilistic generative language models, anchored in the framework of context-sensitive grammars (CSGs), and analyzes them with statistical-mechanics methods. The generative process consists of three rules: direct terminal production, binary expansion of non-terminals, and context-sensitive substitution, with the context length restricted to $O(1)$ with respect to sentence length. Context sensitivity is parameterized via interactions inspired by the 1D $K$-state Potts model and implemented through Metropolis-Hastings dynamics with only nearest-neighbor couplings. This structure places the grammars strictly above conventional context-free grammars in the Chomsky hierarchy, with production probabilities controlled by $q$, $t$, and a symbol-mixing parameter $\epsilon$.

The context-sensitive rules introduce local dependencies: each symbol replacement may depend on its immediate neighbors. The system's energy function is analogous to the Potts Hamiltonian, mapping symbol types to states, and the probabilistic rule application is temperature-gated, reflecting thermal fluctuations in analogy to physical systems (Figure 1).
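The temperature-gated, nearest-neighbor update described above can be illustrated on a plain 1D $K$-state Potts chain. This is only a minimal sketch under stated assumptions: it omits the grammar rules entirely, and the coupling `J`, the open boundary conditions, and the function names `potts_energy` and `metropolis_sweep` are illustrative choices that do not come from the paper.

```python
import math
import random


def potts_energy(states, J=1.0):
    """Energy of an open chain: -J for every equal nearest-neighbor pair."""
    return -J * sum(1 for a, b in zip(states, states[1:]) if a == b)


def metropolis_sweep(states, K, T, rng, J=1.0):
    """Attempt len(states) single-site updates in place and return the chain."""
    n = len(states)
    for _ in range(n):
        i = rng.randrange(n)
        old, new = states[i], rng.randrange(K)
        if new == old:
            continue
        # The energy change involves only the two nearest neighbors of site i.
        dE = 0.0
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                dE += J * ((states[j] == old) - (states[j] == new))
        # Metropolis rule: accept downhill moves always, uphill moves
        # with the Boltzmann probability exp(-dE / T).
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            states[i] = new
    return states


# Hypothetical usage: relax a random 3-state chain at temperature T = 0.5.
rng = random.Random(0)
chain = [rng.randrange(3) for _ in range(50)]
for _ in range(200):
    metropolis_sweep(chain, K=3, T=0.5, rng=rng)
print(potts_energy(chain))
```

Because the proposal (a uniformly random new state) is symmetric, the Metropolis-Hastings acceptance ratio reduces to the plain Boltzmann factor used here.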

Figure 1: Schematic illustration of Potts vectors $e_k$ for $K=2,3,4$, satisfying the constraint $\sum_k e_k = 0$ and an equiprobable structure.
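One common way to build vectors with these properties is to center and rescale the standard basis of $\mathbb{R}^K$. The sketch below assumes unit-length vectors embedded in $K$ dimensions; the paper's own normalization and embedding are not stated here, so treat both as assumptions, along with the function name `potts_vectors`.

```python
import math


def potts_vectors(K):
    """Return K unit vectors in R^K that sum to zero.

    Each vector is a centered, rescaled basis vector, so any two distinct
    vectors have the same inner product, -1 / (K - 1).
    """
    scale = math.sqrt(K / (K - 1))
    return [
        [scale * ((1.0 if i == k else 0.0) - 1.0 / K) for i in range(K)]
        for k in range(K)
    ]


def dot(u, v):
    """Plain Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


# Check the zero-sum constraint and the symmetric geometry for K = 2, 3, 4.
for K in (2, 3, 4):
    vecs = potts_vectors(K)
    total = [sum(v[i] for v in vecs) for i in range(K)]
    print(K, max(abs(c) for c in total), dot(vecs[0], vecs[0]), dot(vecs[0], vecs[1]))
```

For $K=2$ this gives two antiparallel vectors (the Ising case), and for $K=3$ three vectors at mutual angles of 120 degrees.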

Statistical Physics Observables

The model's order-disorder dynamics are probed through established physical observables:

  • Magnetization $M$: Defined as the norm of the mean Potts vector over the sentence, serving as an order parameter analogous to ferromagnetic order.
  • Susceptibility $\chi$: Capturing fluctuations of $M$, indicative of critical behavior.
  • Binder Cumulant: Sensitive to the kurtosis of the $M$ distribution; in the BKT scenario, this is a diagnostic for distinguishing between critical and ordered/disordered phases.
  • Correlation Functions: Measuring connected two-point functions at fixed relative separation, which reveal spatial structure, particularly the emergence of power-law decay.
  • Finite-Size Scaling: Utilized to extract critical exponents and characterize the universality class of the transition.
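The first three observables can be estimated from sampled sentences roughly as follows. This is a sketch under assumptions, not the paper's definitions: it assumes unit-norm Potts vectors (so that $M = \sqrt{(K \sum_k f_k^2 - 1)/(K-1)}$ with symbol fractions $f_k$), the convention $\chi = L(\langle M^2 \rangle - \langle M \rangle^2)$, and the textbook Binder form $1 - \langle M^4 \rangle / (3 \langle M^2 \rangle^2)$. The paper may use different prefactors.

```python
import math
from collections import Counter


def magnetization(sentence, K):
    """Norm of the mean Potts vector, assuming unit-norm Potts vectors.

    With symbol fractions f_k this norm equals
    sqrt((K * sum_k f_k**2 - 1) / (K - 1)): 1 for a uniform sentence,
    0 when all K symbols are equally frequent.
    """
    n = len(sentence)
    sum_f2 = sum((c / n) ** 2 for c in Counter(sentence).values())
    return math.sqrt(max(0.0, (K * sum_f2 - 1.0) / (K - 1)))


def susceptibility(m_samples, length):
    """Fluctuation estimator L * (<M^2> - <M>^2) (assumed convention)."""
    n = len(m_samples)
    mean = sum(m_samples) / n
    mean_sq = sum(m * m for m in m_samples) / n
    return length * (mean_sq - mean * mean)


def binder_cumulant(m_samples):
    """Textbook form 1 - <M^4> / (3 <M^2>^2) (assumed convention)."""
    n = len(m_samples)
    m2 = sum(m ** 2 for m in m_samples) / n
    m4 = sum(m ** 4 for m in m_samples) / n
    return 1.0 - m4 / (3.0 * m2 * m2)


# Hypothetical usage on three made-up sentences over K = 3 symbols.
sentences = [[0, 0, 0, 1], [0, 1, 2, 2], [1, 1, 1, 1]]
ms = [magnetization(s, K=3) for s in sentences]
print(ms, susceptibility(ms, length=4), binder_cumulant(ms))
```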

Numerical Investigation of Phase Transitions

Monte Carlo simulations are run while varying the temperature, the sentence length, and the grammar parameters ($q$, $t$, and $\epsilon$), focusing primarily on the maximally context-sensitive regime.

The primary result is the identification of a finite-temperature phase transition even with strictly short-range interactions (Figure 2). At the critical temperature $T_c$, the susceptibility diverges and the Binder cumulant crosses zero. Notably, the transition is not a conventional second-order one but displays characteristics of a Berezinskii–Kosterlitz–Thouless (BKT) transition: power-law correlations persist not only at $T_c$ but throughout an extended low-temperature phase.

Figure 2: Temperature dependence of magnetization, showing singular behavior at $T_c$ for increasing sentence length, indicative of a phase transition.

Figure 3: Correlation function between distant sites, revealing power-law decay below $T_c$, contrasting with rapid exponential decay above.
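A connected two-point function of this kind can be estimated from sampled sentences as sketched below. The estimator assumes unit-norm Potts vectors, for which $e_a \cdot e_b = (K\delta_{ab} - 1)/(K-1)$, so that $\langle e_{s_i} \cdot e_{s_j} \rangle - \langle e_{s_i} \rangle \cdot \langle e_{s_j} \rangle$ reduces to $K(P_{\text{same}} - \sum_k p_i(k)\,p_j(k))/(K-1)$. The function name and the toy samples are hypothetical, and the paper's exact definition may differ.

```python
def connected_correlation(samples, i, j, K):
    """Connected Potts correlation between positions i and j.

    `samples` is a list of equally long symbol sequences with entries in
    range(K). Returns <e_i . e_j> - <e_i> . <e_j> for unit-norm Potts vectors.
    """
    n = len(samples)
    # Probability that the two positions carry the same symbol.
    p_same = sum(1 for s in samples if s[i] == s[j]) / n
    # Single-site symbol distributions at positions i and j.
    count_i = [0] * K
    count_j = [0] * K
    for s in samples:
        count_i[s[i]] += 1
        count_j[s[j]] += 1
    overlap = sum(a * b for a, b in zip(count_i, count_j)) / (n * n)
    return K * (p_same - overlap) / (K - 1)


# Toy check: perfectly correlated sites versus independent sites (K = 2).
locked = [[0, 0], [1, 1]]
independent = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(connected_correlation(locked, 0, 1, K=2))
print(connected_correlation(independent, 0, 1, K=2))
```

Evaluating this estimator for many separations and plotting it on log-log axes is one way to tell power-law decay (a straight line) from exponential decay.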

Finite-size scaling analysis corroborates the BKT scenario, with extracted critical exponents (e.g., $\nu \approx 2.5$ and $\gamma \approx 2.0$) differing from those associated with the Ising/Potts universality classes (Figure 4).

Figure 4: Data collapse of susceptibility illustrating reliable finite-size scaling and exponent extraction.
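A data collapse of this kind is commonly tested with the power-law ansatz $\chi(T, L) = L^{\gamma/\nu} f\big((T - T_c) L^{1/\nu}\big)$. The sketch below assumes that ansatz (a BKT analysis may instead use an exponentially diverging correlation length) and checks it on synthetic data only. The value `Tc = 1.0`, the Gaussian scaling function, and the function names are placeholders, not results from the paper.

```python
import math


def rescale_susceptibility(T, L, chi, Tc, nu, gamma):
    """Map one (T, L, chi) point to collapse coordinates (x, y)."""
    x = (T - Tc) * L ** (1.0 / nu)
    y = chi * L ** (-gamma / nu)
    return x, y


def synthetic_chi(T, L, Tc, nu, gamma):
    """Synthetic susceptibility that obeys the ansatz exactly (toy data)."""
    x = (T - Tc) * L ** (1.0 / nu)
    return L ** (gamma / nu) * math.exp(-x * x)


# With the correct (Tc, nu, gamma), curves for different L fall on one curve.
Tc, nu, gamma = 1.0, 2.5, 2.0  # Tc is a made-up placeholder value
for L in (64, 256, 1024):
    T = Tc + 0.3 * L ** (-1.0 / nu)  # chosen so every size has x = 0.3
    chi = synthetic_chi(T, L, Tc, nu, gamma)
    print(L, rescale_susceptibility(T, L, chi, Tc, nu, gamma))
```

In practice the exponents and $T_c$ are tuned until the rescaled curves for all sizes overlap as closely as possible.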

Phase Diagrams and Parameter Dependence

Phase diagrams are mapped in two parameter planes. The location of the BKT transition depends nontrivially on the production probabilities. In one parameter regime, sentences tend to elongate, which sustains extensive correlations and enables the critical phase; in the other, the grammar rules promote rapid termination, which makes global order impossible and eliminates the phase transition (Figure 5).

Figure 5: Phase diagram showing the locus of transition temperatures as a function of production rule likelihood.

Figure 6: Phase diagram demonstrating a sharp loss of criticality beyond a threshold parameter value.

For large alphabet sizes $K$, the model reproduces a Zipf-like law for symbol frequencies, supporting its linguistic plausibility even in a maximally symmetric parameterization (Figure 7).

Figure 7: Rank-frequency plot demonstrating power-law (Zipfian) scaling.
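A rank-frequency exponent of this kind can be estimated with an ordinary least-squares fit in log-log space. The sketch below is illustrative only: the function name `zipf_exponent` is hypothetical, the input is synthetic, and a plain log-log fit is a simple estimator rather than the method used in the paper.

```python
import math


def zipf_exponent(frequencies):
    """Estimate s in frequency ~ rank**(-s) by least squares on log-log data."""
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the best-fit line through the (log rank, log frequency) points.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic frequencies that follow an exact 1/rank law.
toy_frequencies = [1000.0 / rank for rank in range(1, 51)]
print(zipf_exponent(toy_frequencies))
```

An exponent close to 1 corresponds to the classic Zipf law; real symbol counts would replace `toy_frequencies`.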

Theoretical Implications

These observations challenge the prevailing view, supported by classical arguments against phase transitions in 1D short-range systems, that long-range interactions are essential for criticality in generative models of language. The emergence of BKT-like transitions here is attributed to the compositional, history-sensitive structure induced by grammatical expansion, where effective long-range order is mediated through recursive substitution histories rather than explicit interaction terms.

The discovery that local context sensitivity, minimally extended from context-freeness, is sufficient for supporting nontrivial critical phenomena substantively informs both computational linguistics (by mapping formal grammars to physical universality classes) and the statistical mechanics of nonequilibrium or history-dependent processes. The effective memory induced by grammar expansion plays a central role, yielding phenomena usually associated with physically higher-dimensional or long-range coupled systems.

Practical and Future Directions

The approach opens a platform for formal analysis of the universality and nature of emergent transitions in real language data and artificial LLMs. It suggests that linguistic compositionality and context sensitivity—rather than superficial long-range dependencies or model sizes—are the driving primitives for emergent abilities and abrupt transitions, paralleling observations in LLMs and neural systems.

Future research may extend this formalism to more realistic, hierarchically compositional grammars, and leverage analytical tools from non-equilibrium statistical physics to model real-world language statistics and scaling behaviors. The correspondence between CSG dynamics and Potts-like models may foster new connections with learning representations in deep neural architectures, where effective memory and hierarchical recursion are central.

Conclusion

A context-sensitive random language model with strictly short-range interactions exhibits a finite-temperature BKT phase transition, as evidenced by singular behavior in magnetization, susceptibility, Binder parameter variation, and power-law correlations. This demonstrates that phase transitions in formal language models originate from intrinsic linguistic structure rather than long-range interactions. These results connect the theory of formal languages with nontrivial universality classes in statistical mechanics, underscoring the role of compositional history and context as the origin of emergent criticality in models of language.
