- The paper demonstrates that a context-sensitive random language model exhibits a finite-temperature phase transition with BKT-like characteristics, evidenced through Monte Carlo simulations.
- It employs Potts model dynamics and finite-size scaling to quantify order parameters like magnetization and susceptibility, highlighting critical exponents such as ν ≈ 2.5 and γ ≈ 2.0.
- Results challenge traditional views by showing that short-range context-sensitive rules can induce emergent long-range order, linking formal linguistics with statistical physics.
Phase Transition Phenomena in Context-Sensitive Random Language Models with Short-Range Interactions
This work systematically constructs a class of probabilistic generative language models, grounded in the framework of context-sensitive grammars (CSGs), and analyzes them with the methods of statistical mechanics. The generative process consists of three rules: direct terminal production, binary expansion of non-terminals, and context-sensitive substitution. Crucially, the context length is restricted to O(1) with respect to sentence length. Context sensitivity is parameterized through interactions inspired by the 1D K-state Potts model and implemented through Metropolis-Hastings dynamics with only nearest-neighbor couplings. Formally, these grammars sit strictly above context-free grammars in the Chomsky hierarchy; their production probabilities are controlled by q, t, and a symbol-mixing parameter ϵ.
The context-sensitive rules introduce local dependencies: each symbol replacement may depend on its immediate neighbors. The system's energy function is analogous to the Potts Hamiltonian, with symbol types mapped to Potts states, and rule application is temperature-gated, so that temperature controls fluctuations in the same way thermal noise does in a physical system.
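The nearest-neighbor, temperature-gated update described above can be illustrated with a generic single-site Metropolis step on a 1D K-state Potts chain. This is a minimal sketch, not the paper's substitution rule: the energy H = −J Σ_i δ(σ_i, σ_{i+1}), the coupling J, the uniform proposal over states, and the function names `potts_energy`, `local_energy`, and `metropolis_sweep` are assumptions for illustration, with k_B = 1 and T > 0. Because the proposal is symmetric, the Metropolis-Hastings acceptance ratio reduces to the plain Metropolis rule.

```python
import math
import random


def potts_energy(seq, J=1.0):
    """Assumed nearest-neighbour Potts energy H = -J * sum_i delta(s_i, s_{i+1}) on an open chain."""
    return -J * sum(1 for a, b in zip(seq, seq[1:]) if a == b)


def local_energy(seq, i, state, J=1.0):
    """Energy of the bonds touching site i if it held `state` (only its two neighbours matter)."""
    e = 0.0
    if i > 0 and seq[i - 1] == state:
        e -= J
    if i < len(seq) - 1 and seq[i + 1] == state:
        e -= J
    return e


def metropolis_sweep(seq, K, T, J=1.0, rng=random):
    """One sweep of N single-site Metropolis updates at temperature T > 0 (k_B = 1).

    Hypothetical illustration: propose a uniformly random new state for a random site,
    accept with probability min(1, exp(-dE / T)).
    """
    n = len(seq)
    for _ in range(n):
        i = rng.randrange(n)
        proposal = rng.randrange(K)  # symmetric proposal over the K states
        dE = local_energy(seq, i, proposal, J) - local_energy(seq, i, seq[i], J)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            seq[i] = proposal
    return seq
```

Because only the two neighbors of site i enter ΔE, each update costs O(1) regardless of sentence length, which mirrors the O(1) context restriction of the grammar.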
Figure 1: Schematic illustration of the Potts vectors e_k for K = 2, 3, 4, which satisfy the constraint ∑_k e_k = 0 and treat all K states symmetrically.
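One standard way to obtain vectors with the properties shown in Figure 1 is to place them at the vertices of a regular simplex. The sketch below is an assumed construction (the paper may use a different but equivalent basis, and `potts_vectors` is a hypothetical name): it embeds the K unit vectors in R^K instead of R^(K−1), which leaves every dot product unchanged, with e_k · e_l = −1/(K − 1) for k ≠ l.

```python
import math


def potts_vectors(K):
    """Return K unit vectors e_k in R^K with sum_k e_k = 0 (vertices of a regular simplex).

    e_k = sqrt(K / (K - 1)) * (u_k - ones / K), where u_k is the k-th standard basis vector.
    The vectors span a (K - 1)-dimensional subspace; e_k . e_l = -1 / (K - 1) for k != l.
    """
    if K < 2:
        raise ValueError("K must be at least 2")
    scale = math.sqrt(K / (K - 1))
    return [[scale * ((1.0 if j == k else 0.0) - 1.0 / K) for j in range(K)] for k in range(K)]
```

For K = 2 this gives two antiparallel unit vectors, recovering Ising spins ±1.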
Statistical Physics Observables
The model's order-disorder dynamics are probed through established physical observables:
- Magnetization M: Defined as the norm of the mean Potts vector over the sentence, serving as an order parameter analogous to ferromagnetic order.
- Susceptibility χ: Captures the fluctuations of M, which grow large near criticality.
- Binder Cumulant: Sensitive to the kurtosis of the distribution of M; in the BKT scenario, it serves as a diagnostic for distinguishing the critical phase from the ordered and disordered phases.
- Correlation Functions: Measuring connected two-point functions at fixed relative separation, which reveal spatial structure, particularly the emergence of power-law decay.
- Finite-Size Scaling: Utilized to extract critical exponents and characterize the universality class of the transition.
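The observables listed above can be estimated from Monte Carlo samples roughly as follows. The normalizations are assumptions, not the paper's: χ = N(⟨M²⟩ − ⟨M⟩²)/T and the scalar-style Binder cumulant U = 1 − ⟨M⁴⟩/(3⟨M²⟩²); other conventions (including one that crosses zero at the transition, as reported in the text) shift or rescale these quantities. Sentences are assumed to have equal length N and to be given as lists of state indices 0, …, K − 1; all function names are hypothetical.

```python
import math


def potts_vectors(K):
    """Regular-simplex Potts vectors in R^K: unit length, sum_k e_k = 0."""
    s = math.sqrt(K / (K - 1))
    return [[s * ((1.0 if j == k else 0.0) - 1.0 / K) for j in range(K)] for k in range(K)]


def magnetization(seq, vecs):
    """M = |(1/N) * sum_i e_{s_i}| for one sentence given as state indices."""
    dim = len(vecs[0])
    mean = [sum(vecs[s][d] for s in seq) / len(seq) for d in range(dim)]
    return math.sqrt(sum(m * m for m in mean))


def observables(samples, K, T):
    """Estimate <M>, chi and a Binder cumulant from equal-length sample sentences.

    Assumed conventions: chi = N * (<M^2> - <M>^2) / T,  U = 1 - <M^4> / (3 <M^2>^2).
    """
    vecs = potts_vectors(K)
    N = len(samples[0])
    ms = [magnetization(seq, vecs) for seq in samples]
    m1 = sum(ms) / len(ms)
    m2 = sum(m ** 2 for m in ms) / len(ms)
    m4 = sum(m ** 4 for m in ms) / len(ms)
    chi = N * (m2 - m1 ** 2) / T
    binder = 1.0 - m4 / (3.0 * m2 ** 2) if m2 > 0 else float("nan")
    return m1, chi, binder


def connected_correlation(samples, K, r):
    """C(r) = <e_i . e_{i+r}> - <e_i> . <e_{i+r}>, averaged over samples and positions i."""
    vecs = potts_vectors(K)
    dot_sum, count = 0.0, 0
    left, right = [0.0] * K, [0.0] * K  # running sums of e_i and e_{i+r}
    for seq in samples:
        for i in range(len(seq) - r):
            a, b = vecs[seq[i]], vecs[seq[i + r]]
            dot_sum += sum(x * y for x, y in zip(a, b))
            for d in range(K):
                left[d] += a[d]
                right[d] += b[d]
            count += 1
    disconnected = sum(lv * rv for lv, rv in zip(left, right)) / count ** 2
    return dot_sum / count - disconnected
```

In a fully ordered sample M = 1, χ = 0, and U = 2/3 under this convention, while the connected correlation vanishes because every site points along the same vector.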
Numerical Investigation of Phase Transitions
Comprehensive Monte Carlo simulations are conducted across a range of sentence lengths and of the grammatical parameters (q, t, ϵ), focusing primarily on the maximally context-sensitive regime.
The primary result is the unambiguous identification of a finite-temperature phase transition, even with strictly short-range interactions. At the critical temperature, the susceptibility diverges and the Binder cumulant crosses zero. Notably, the transition is not a conventional second-order transition but displays the characteristics of a Berezinskii-Kosterlitz-Thouless (BKT) transition: power-law correlations persist not only at the critical temperature but throughout an extended low-temperature phase.


Figure 2: Temperature dependence of the magnetization, showing a singularity at the critical temperature that sharpens as the system size increases, indicative of a phase transition.
Figure 3: Correlation function between distant sites, revealing power-law decay below the critical temperature, in contrast to rapid exponential decay above it.
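A crude way to tell the two regimes in Figure 3 apart is to compare straight-line fits of ln C(r) against ln r (power law, C ∝ r^(−η)) and against r (exponential, C ∝ e^(−r/ξ)). This sketch assumes strictly positive correlation values and uses the residual sum of squares of the two fits as the deciding heuristic; it is illustrative rather than the paper's analysis, and `linear_fit` and `classify_decay` are hypothetical names.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares y = a + b * x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def classify_decay(rs, cs):
    """Return ("power-law", eta) or ("exponential", xi), whichever straight-line fit is better.

    Power law: ln C linear in ln r.  Exponential: ln C linear in r.
    Points with r <= 0 or C <= 0 are dropped because their logarithms are undefined.
    """
    pts = [(r, c) for r, c in zip(rs, cs) if r > 0 and c > 0]
    xs = [r for r, _ in pts]
    logc = [math.log(c) for _, c in pts]
    _, slope_pl, rss_pl = linear_fit([math.log(r) for r in xs], logc)
    _, slope_exp, rss_exp = linear_fit(xs, logc)
    if rss_pl < rss_exp:
        return "power-law", -slope_pl
    return "exponential", -1.0 / slope_exp
```

Both fits use the same response ln C and the same number of parameters, so their residuals are directly comparable; with noisy Monte Carlo data a proper model-selection criterion would be more robust.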
Finite-size scaling analysis corroborates the BKT scenario: the extracted critical exponents (e.g., ν ≈ 2.5 and γ ≈ 2.0) differ from those of the conventional Ising and Potts universality classes.
Figure 4: Data collapse of susceptibility illustrating reliable finite-size scaling and exponent extraction.
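A data collapse like the one in Figure 4 is usually based on the finite-size scaling ansatz χ(T, N) = N^(γ/ν) f((T − T_c) N^(1/ν)), with N the system size. The sketch below scores how well a candidate (T_c, ν, γ) collapses the curves by interpolating each rescaled curve onto the others. The ansatz, the quality measure, and the function names are assumptions; BKT analyses often use a modified scaling variable, so this shows the conventional form rather than necessarily the paper's procedure.

```python
import bisect
import math


def rescale(T, chi, N, Tc, nu, gamma):
    """Map raw (T, chi) at size N to x = (T - Tc) N^(1/nu), y = chi N^(-gamma/nu)."""
    return (T - Tc) * N ** (1.0 / nu), chi * N ** (-gamma / nu)


def interp(xs, ys, x):
    """Linear interpolation on a curve sampled at increasing xs; None outside its range."""
    if x < xs[0] or x > xs[-1]:
        return None
    j = bisect.bisect_left(xs, x)
    if xs[j] == x:
        return ys[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def collapse_quality(data, Tc, nu, gamma):
    """Mean squared mismatch between rescaled curves; smaller means a better collapse.

    `data` maps each size N to a list of (T, chi) pairs sorted by increasing T.
    """
    curves = {}
    for N, pts in data.items():
        scaled = [rescale(T, c, N, Tc, nu, gamma) for T, c in pts]
        curves[N] = ([p[0] for p in scaled], [p[1] for p in scaled])
    total, count = 0.0, 0
    for Ni, (xs_i, ys_i) in curves.items():
        for Nj, (xs_j, ys_j) in curves.items():
            if Ni == Nj:
                continue
            for x, y in zip(xs_i, ys_i):
                other = interp(xs_j, ys_j, x)
                if other is not None:
                    total += (y - other) ** 2
                    count += 1
    return total / count if count else math.inf
```

In practice one would minimize `collapse_quality` over (T_c, ν, γ), for example with a grid search, and report the minimizer as the extracted exponents.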
Phase Diagrams and Parameter Dependence
Extensive phase diagrams are mapped in two parameter planes. The location of the BKT transition depends nontrivially on the production probabilities. When the production probabilities favor expansion, sentences tend to elongate, sustaining extensive correlations and enabling the critical phase; when they favor termination, sentences end quickly, global order becomes impossible, and the phase transition disappears.
Figure 5: Phase diagram showing the locus of transition temperatures as a function of the production-rule probability.
Figure 6: Phase diagram demonstrating a sharp loss of criticality beyond a threshold parameter value.
For large alphabet sizes, the model reproduces a Zipf-like law for symbol frequencies even under a maximally symmetric parameterization, which bolsters its linguistic plausibility.

Figure 7: Rank-frequency plot demonstrating power-law (Zipfian) scaling of symbol frequencies.
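The rank-frequency analysis behind Figure 7 can be sketched as counting symbols, sorting the counts in decreasing order, and fitting the slope of log f against log r; Zipf's law f(r) ∝ r^(−α) corresponds to α close to 1. This is a generic least-squares sketch, not the paper's fitting procedure, and the function names and the optional `max_rank` cutoff are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return symbol counts sorted in decreasing order (rank 1 = most frequent)."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs, max_rank=None):
    """Least-squares estimate of alpha in f(r) ~ r^(-alpha) from a log-log fit."""
    freqs = freqs[:max_rank] if max_rank else freqs
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope
```

Restricting the fit with `max_rank` is a common way to exclude the noisy low-frequency tail, where counts of one or two distort the slope.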
Theoretical Implications
These observations undermine the prevailing view that long-range interactions are essential for criticality in generative models of language, a view supported by classical arguments against phase transitions in 1D systems with short-range interactions (in such chains, the entropy gained by inserting a domain wall outweighs its finite energy cost at any nonzero temperature). The emergence of BKT-like transitions here is attributed to the compositional, history-sensitive structure induced by grammatical expansion: effective long-range order is mediated through recursive substitution histories rather than explicit interaction terms.
The finding that local context sensitivity, a minimal extension of context-freeness, suffices to support nontrivial critical phenomena informs both computational linguistics (by mapping formal grammars to physical universality classes) and the statistical mechanics of nonequilibrium or history-dependent processes. The effective memory induced by grammar expansion plays a central role, producing phenomena usually associated with higher-dimensional or long-range-coupled physical systems.
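The classical argument referred to above can be written out for an open K-state Potts chain of N sites with H = −J Σ_i δ(σ_i, σ_{i+1}) and J > 0. This is the standard textbook domain-wall sketch, not a derivation taken from the paper: starting from a fully ordered chain, insert one domain wall by choosing one of the N − 1 bonds and one of the K − 1 other states for the segment beyond it.

```latex
% One domain wall in an ordered open chain: one bond becomes unsatisfied,
% with (N-1)(K-1) ways to place it and relabel the segment beyond it.
\Delta E = J, \qquad
\Delta S = k_B \ln\!\big[(N-1)(K-1)\big],
\qquad
\Delta F = \Delta E - T\,\Delta S
         = J - k_B T \ln\!\big[(N-1)(K-1)\big]
         \;\xrightarrow{\;N \to \infty\;}\; -\infty
\quad \text{for any } T > 0 .
```

Since ΔF becomes negative for large N at every nonzero temperature, domain walls proliferate and destroy long-range order, which is why a finite-temperature transition in a model with strictly short-range 1D rules is notable.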
Practical and Future Directions
The approach provides a platform for formal analysis of the universality and nature of emergent transitions in real language data and in large language models (LLMs). It suggests that linguistic compositionality and context sensitivity, rather than superficial long-range dependencies or model size, are the primitives driving emergent abilities and abrupt transitions, paralleling observations in LLMs and neural systems.
Future research may extend this formalism to more realistic, hierarchically compositional grammars, and leverage analytical tools from non-equilibrium statistical physics to model real-world language statistics and scaling behaviors. The correspondence between CSG dynamics and Potts-like models may foster new connections to representation learning in deep neural architectures, where effective memory and hierarchical recursion are central.
Conclusion
A context-sensitive random language model with strictly short-range interactions exhibits a finite-temperature BKT phase transition, as evidenced by singular behavior of the magnetization and susceptibility, the behavior of the Binder cumulant, and power-law correlations. This demonstrates that phase transitions in formal language models originate from intrinsic linguistic structure rather than from long-range interactions. These results connect the theory of formal languages with nontrivial universality classes in statistical mechanics, underscoring compositional history and context as the origin of emergent criticality in models of language.