Phase Transitions in Large Language Models
- Phase transitions in LLMs are abrupt, collective shifts in internal representations and behavior as model size, data scale, or training progress is tuned.
- Mathematical and statistical tools such as order parameters, f-divergences, and covariance spectral analysis rigorously characterize these non-analytic changes.
- Understanding these phase transitions provides actionable insights for engineering emergent abilities, diagnosing performance shifts, and guiding safe model alignment.
A phase transition in LLMs is an abrupt, collective shift in their internal representations or external behaviors as a function of a control parameter—such as model size, data scale, temperature, or training progression. These transitions, analogous to critical phenomena in statistical mechanics, are central to understanding “emergent abilities” and structural reorganizations underlying the performance and alignment characteristics of contemporary LLMs. They may manifest as non-analytic changes in order parameters, singularities in macroscopic observables, or sharp shifts in performance and statistical properties, often invisible to conventional loss curves. Phase transitions have now been rigorously characterized and quantitatively diagnosed across a range of axes and model regimes.
1. Formal Definitions and Theoretical Foundations
Phase transitions in LLMs are mathematically characterized by abrupt, non-analytic changes in one or more order parameters as a control variable is tuned. In analogy to physical systems, there exist two principal classes:
- Second-order (continuous) transitions: These are marked by the divergence of a susceptibility (e.g., the variance of an internal statistic), without a discontinuity in the order parameter itself, and by power-law scaling near a critical point of the control variable (e.g., temperature or model depth). A diverging susceptibility is exemplified by the integrated correlation of part-of-speech (POS) sequences, a static susceptibility that diverges at a critical temperature (Nakaishi et al., 2024).
- First-order (discontinuous or higher-depth) transitions: Here, an order parameter exhibits a finite jump at a critical value of the control variable, as in the sharp increase in logical reasoning error at a threshold value of a complexity metric (Zhang et al., 6 Jan 2026).
Mathematical tools are adapted from classical and quantum statistical mechanics (O(N) models, spin-glass theory, renormalization group flows), probability theory (Poisson and sub-Poisson statistics), and information theory (f-divergences, Fano factors). Across these frameworks, an “order parameter” is any model-internal or output-level statistic that changes non-trivially at the critical point, such as cross-channel coherence, overlap in spin-glass models, or task accuracy as a function of data complexity.
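The distinction between an order parameter and its susceptibility can be made concrete with a minimal sketch. It assumes that repeated measurements of one internal statistic are available at each value of a control parameter; the function names and the toy numbers are hypothetical and are not taken from any of the cited papers. The mean plays the role of the order parameter, and the variance is used as a simple susceptibility proxy whose peak suggests a candidate critical point.

```python
from statistics import fmean, pvariance


def sweep_summary(samples_by_control):
    """Return (control, order_parameter, susceptibility_proxy) per control value.

    The order parameter is the sample mean of the statistic; the
    susceptibility proxy is its population variance across samples.
    """
    rows = []
    for control in sorted(samples_by_control):
        samples = samples_by_control[control]
        rows.append((control, fmean(samples), pvariance(samples)))
    return rows


def estimate_critical_point(samples_by_control):
    """Control value at which the susceptibility proxy is largest."""
    return max(sweep_summary(samples_by_control), key=lambda row: row[2])[0]


# Hypothetical measurements of one internal statistic at three temperatures.
toy_sweep = {
    0.5: [0.9, 1.0, 1.1],
    1.0: [0.2, 1.0, 1.8],
    1.5: [-0.1, 0.0, 0.1],
}
critical = estimate_critical_point(toy_sweep)
```

On a real sweep the peak is broadened by finite sample size, so the located value is better read as an estimate of where to refine the sweep than as the critical point itself.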
2. Critical Phenomena in Model Scaling, Training, and Data Regimes
Model Size and Capacity Transitions
- Sharp onsets of capability: Experiments reveal that as model size crosses a critical threshold—with all other factors held fixed—discrete emergent skills (such as knowledge memorization, arithmetic, logical reasoning) appear suddenly, not smoothly (Gu et al., 23 May 2025, Sun et al., 2024, Chang, 2023).
- Information-theoretic explanation: Capacity allocation in finite models can be rigorously described by a knapsack framework, with phase transitions occurring as the optimal allocation shifts discontinuously across datasets (Gu et al., 23 May 2025). The memorized fraction rises sharply from zero once either the mixing ratio or the model size surpasses a critical value, with the critical mixing ratio governed by power-law scaling relations in model size.
Training Progression and Data Size
- Delayed generalization as phase transition: Under grokking and classic data scaling, each model size has a critical data size such that test accuracy remains at chance until that size is crossed, after which rapid generalization occurs (Zhu et al., 2024). For a fixed architecture, as the data size approaches this critical value, one observes an abrupt transition in test behavior, often with a characteristic delay (the grokking time), especially for modular or synthetic tasks.
- Three-regime dynamics: The triple phase transition framework links brain-model alignment, probing accuracy, and downstream performance: (1) rapid alignment with instruction following, (2) detachment and associated task stagnation, (3) realignment with emergent downstream mastery (Nakagi et al., 28 Feb 2025).
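A simple way to bracket a critical data size from a sweep is to look for the largest change in test accuracy between adjacent settings. The sketch below assumes paired measurements of data size and accuracy; `largest_jump` is a hypothetical helper, and the accuracies are synthetic values chosen only to illustrate the shape of such a curve.

```python
def largest_jump(controls, values):
    """Return (left, right, jump) for the adjacent pair with the biggest change."""
    if len(controls) != len(values) or len(controls) < 2:
        raise ValueError("need two or more paired measurements")
    pairs = sorted(zip(controls, values))
    best = max(
        range(len(pairs) - 1),
        key=lambda i: abs(pairs[i + 1][1] - pairs[i][1]),
    )
    left, right = pairs[best], pairs[best + 1]
    return left[0], right[0], right[1] - left[1]


# Synthetic sweep: accuracy stays near chance, then rises abruptly.
data_sizes = [1000, 2000, 4000, 8000, 16000]
test_accuracy = [0.10, 0.11, 0.12, 0.93, 0.97]
left, right, jump = largest_jump(data_sizes, test_accuracy)
```

The returned interval only brackets the threshold; a finer sweep inside it would be needed to estimate the critical size itself.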
3. Statistical and Structural Transition Metrics
Output Distribution and Statistical Distances
- System-agnostic diagnostics: f-divergences (KL, JS, TV) and classifier-based “g-dissimilarities” robustly detect macroscopic reorganizations of the output distribution under parametric sweeps (temperature, prompt type, or epoch) (Arnold et al., 2024, Nakaishi et al., 2024, Arnold et al., 27 Aug 2025).
- Joint multi-metric analysis: Poisson and sub-Poisson statistics on word/event counts, dispersion indices, and KL divergence from Poisson capture phase transitions in the formation of lexical coherence and error suppression (Hong, 2023, Hong et al., 16 Nov 2025). These transitions are evidenced by narrow temporal windows where Fano factors for correct and incorrect words cross unity, aligning with coherence emergence in hidden-state dynamics.
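The f-divergences named above are straightforward to compute for discrete output distributions. The following sketch assumes that next-token (or output-class) distributions have already been estimated at successive values of a control parameter; the helper `adjacent_js` and the toy distributions are illustrative assumptions, not the estimators used in the cited work.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if q is zero where p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log(2)."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def tv_distance(p, q):
    """Total variation distance: half the L1 distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def adjacent_js(distributions):
    """JS divergence between each pair of neighbouring sweep points."""
    return [js_divergence(a, b) for a, b in zip(distributions, distributions[1:])]


# Hypothetical two-outcome distributions along a parameter sweep.
toy_distributions = [[0.90, 0.10], [0.88, 0.12], [0.20, 0.80], [0.18, 0.82]]
scores = adjacent_js(toy_distributions)
peak_index = max(range(len(scores)), key=scores.__getitem__)
```

A peak in the neighbour-to-neighbour divergence marks the step of the sweep where the output distribution reorganizes most strongly. For large vocabularies the distributions must be estimated from samples, which is where classifier-based estimators become relevant.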
Internal Representation and Hidden-State Structure
- Coherence as order parameter: Cross-channel correlation matrices, auto-correlation manifolds, and the off-diagonal coherence serve as robust witnesses of phase transitions in internal wave dynamics (Hong, 2023).
- Topological and spectral signatures: The covariance spectrum of residual stream activations transitions from a random-matrix (Marchenko–Pastur) bulk (“liquid” phase) to a low-rank, spiked spectrum (“solid” phase), marked by a discontinuity in an analytically defined localization order parameter at critical layer depth (Alpay et al., 16 Jan 2026). This regime change corresponds to the appearance of Transient Class Objects (TCOs), stable deep-layer basins representing discrete semantic classes.
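The contrast between a Marchenko–Pastur bulk and a spiked spectrum can be illustrated with a small pure-Python sketch. It assumes unit-variance noise, uses power iteration to approximate the largest covariance eigenvalue, and compares it with the Marchenko–Pastur upper edge (1 + sqrt(d/N))^2. The synthetic activations, the function names, and the 1.25 tolerance for finite-size fluctuations are all assumptions made for illustration; this is not the localization order parameter of the cited paper.

```python
import math
import random


def sample_covariance(rows):
    """Covariance (d x d) of mean-centered rows; rows holds N vectors of length d."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]


def top_eigenvalue(matrix, iterations=300):
    """Approximate the largest eigenvalue of a symmetric PSD matrix
    by power iteration, returning the final Rayleigh quotient."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(vi * wi for vi, wi in zip(v, w))


def mp_upper_edge(n_samples, dim, variance=1.0):
    """Upper edge of the Marchenko-Pastur bulk for i.i.d. noise."""
    return variance * (1.0 + math.sqrt(dim / n_samples)) ** 2


def has_spike(rows, tolerance=1.25):
    """True if the top eigenvalue clearly exceeds the noise bulk edge.
    The tolerance is an assumed margin for finite-size fluctuations."""
    top = top_eigenvalue(sample_covariance(rows))
    return top > tolerance * mp_upper_edge(len(rows), len(rows[0]))


def synthetic_activations(n_samples, dim, spike_strength, rng):
    """Unit Gaussian noise plus an optional rank-one signal direction."""
    direction = [1.0 / math.sqrt(dim)] * dim
    rows = []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        rows.append([rng.gauss(0.0, 1.0) + spike_strength * z * direction[j]
                     for j in range(dim)])
    return rows


rng = random.Random(0)
noise_like = synthetic_activations(200, 20, 0.0, rng)   # "liquid"-like spectrum
spiked = synthetic_activations(200, 20, 3.0, rng)       # "solid"-like spectrum
```

Applying `has_spike` layer by layer to real residual-stream activations would require estimating the noise variance first, since the unit-variance assumption rarely holds in practice.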
Logical and Algorithmic Phase Transitions
- Logical complexity collapse: In symbolic reasoning tasks, LLMs maintain stable accuracy across increasing logical depth or complexity until a sharply defined “critical” LoCM is reached, at which point performance collapses to chance level (Zhang et al., 6 Jan 2026). These transitions, termed “Logical Phase Transitions,” are not smooth, paralleling capacity thresholds in physical phase transitions.
- Algorithmic instability: Mechanistic studies reveal abrupt flips between distinct problem-solving circuits as task specification (e.g., number of digits in arithmetic) is varied, indicating algorithmic phase transitions. Fine-grained analysis with activation patching and subcircuit clustering confirms that models do not smoothly interpolate but instead select from a few discrete algorithmic regimes (Sun et al., 2024).
4. Physical Mappings, Universality, and Statistical Models
- Spin-glass and O(N) models: Transformers can be rigorously mapped onto O(N) statistical field theories or spin-glass Hamiltonians, with temperature, model size, or depth serving as control variables (Sun et al., 27 Jan 2025, George et al., 5 May 2025). Observable phase transitions include:
  - Temperature-driven transitions: At a critical temperature, the energy per token and specific heat undergo singularities, marking a transition from ordered (coherent) to disordered (noisy or creative) output.
  - Parameter-size transitions: At a critical parameter count, models acquire meta-awareness of incoherence, manifesting as a sign change in the associated observable, which serves as a practical capability criterion.
- Self-organized criticality of language: Statistical analyses of LLM-generated outputs and natural text show that natural language operates at or near the critical point, characterized by power-law correlations, divergent susceptibility, and critical slowing down (Nakaishi et al., 2024).
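The thermodynamic quantities in this section can be illustrated on a toy next-token distribution. The sketch assumes that a token's "energy" is its negative logit, so that temperature sampling is a Boltzmann distribution, and uses the standard fluctuation formula C = Var(E) / T^2 with the Boltzmann constant set to 1. This identification and the toy energies are assumptions for illustration; the cited mappings define their observables in their own terms.

```python
import math


def boltzmann(energies, temperature):
    """Probabilities proportional to exp(-E / T), computed stably."""
    scaled = [-e / temperature for e in energies]
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def mean_energy(energies, temperature):
    """Expected energy under the Boltzmann distribution."""
    probs = boltzmann(energies, temperature)
    return sum(p * e for p, e in zip(probs, energies))


def specific_heat(energies, temperature):
    """Fluctuation formula C = Var(E) / T^2 (Boltzmann constant set to 1)."""
    probs = boltzmann(energies, temperature)
    mean = sum(p * e for p, e in zip(probs, energies))
    var = sum(p * (e - mean) ** 2 for p, e in zip(probs, energies))
    return var / temperature ** 2


# Hypothetical negative logits for a four-token vocabulary.
toy_energies = [0.0, 1.0, 1.0, 2.0]
temperatures = [0.2, 0.5, 1.0, 2.0, 5.0]
curve = [(t, mean_energy(toy_energies, t), specific_heat(toy_energies, t))
         for t in temperatures]
```

A system this small shows only a smooth specific-heat bump, not a singularity. Genuine non-analytic behavior requires a large-system limit, which is why finite-size scaling matters when reading such curves.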
5. Behavioral, Alignment, and Safety Transitions
- Fine-tuning phase transitions: Targeted or broad distributional changes may be detected during alignment or misalignment fine-tuning using rigorous statistical distances over the output distributions and a suite of order parameters (alignment, style, confidence, completeness, etc.) (Arnold et al., 27 Aug 2025). Notably, the main behavioral transition can lag behind classical warnings such as gradient-norm peaks.
- Emergent misalignment and order parameter decomposition: Only a fraction of the total behavioral shift during a phase transition is attributable to any one property (e.g., alignment), with style and confidence often explaining greater fractions of the transition. The residual variation points to latent or as-yet-unmonitored axes of behavioral change.
6. Broader Implications and Practical Takeaways
- Predictability and universality: Many phase transitions in LLMs can be quantitatively predicted by power-law scaling relations (e.g., critical mixing ratio vs. model size, critical temperature), enabling targeted resource allocation and mixture design (Gu et al., 23 May 2025, Sun et al., 27 Jan 2025).
- Diagnostic and engineering tools: Lightweight statistical probes—Poisson windowing, f-divergences, activation-patching, and order parameter tracking—afford early warnings for phase transitions, even where external benchmarks are insensitive (Hong et al., 16 Nov 2025, Arnold et al., 2024).
- Control and intervention: Regularization or curriculum schemes can be designed to delay, advance, or smooth phase transitions, potentially preserving desired “human-like” characteristics or extending the regime of stable logical/algorithmic generalization (Aoyama et al., 26 Feb 2025, Zhang et al., 6 Jan 2026).
- Emergent ability engineering: The non-ergodic, resource-constrained TAP framework formalizes the interplay of model, data, and context constraints in shaping the landscape of accessible capabilities; new emergent abilities arise precisely when combined constraints cross a critical threshold (Marín, 3 Jan 2025). Finite-size scaling and RG-style analyses provide guidance for anticipating and controlling stepwise emergent behaviors (Alpay et al., 16 Jan 2026).
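The Poisson-windowing probe listed among the diagnostic tools can be sketched in a few lines. The Fano factor (variance over mean of windowed counts) equals 1 for a Poisson process, falls below 1 for more regular (sub-Poisson) counts, and rises above 1 for bursty counts. The event sequences and window length below are hypothetical, chosen so the arithmetic is easy to follow.

```python
from statistics import fmean, pvariance


def window_counts(events, window):
    """Count truthy events in consecutive non-overlapping windows.
    A trailing partial window is dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    usable = len(events) - len(events) % window
    return [sum(1 for e in events[i:i + window] if e)
            for i in range(0, usable, window)]


def fano_factor(counts):
    """Variance-to-mean ratio of the counts (population variance)."""
    mean = fmean(counts)
    if mean == 0:
        raise ValueError("Fano factor is undefined when the mean count is zero")
    return pvariance(counts) / mean


# Hypothetical per-token indicators, e.g. 1 where a tracked word occurs.
regular = [1, 0] * 8                     # evenly spaced events
bursty = ([1] * 4 + [0] * 4) * 2         # events clustered in bursts

regular_fano = fano_factor(window_counts(regular, 4))
bursty_fano = fano_factor(window_counts(bursty, 4))
```

Tracking this ratio over training checkpoints or sliding windows, separately for different word classes, is one way to look for the crossings of unity described in the text.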
| Transition Axis | Order Parameter(s) | Key Observable(s) | References |
|---|---|---|---|
| Model size/capacity | Memorized fraction | Sharp accuracy jump, energy, spectral collapse | (Gu et al., 23 May 2025, Sun et al., 27 Jan 2025, Alpay et al., 16 Jan 2026) |
| Training/data scale | Generalization accuracy | Onset of generalization (grokking), stagnation | (Zhu et al., 2024, Nakagi et al., 28 Feb 2025) |
| Decoding temperature | Energy, overlap, susceptibility | Order/disorder, creativity, singularity in output | (Sun et al., 27 Jan 2025, George et al., 5 May 2025, Nakaishi et al., 2024) |
| Output distribution | f-divergence, dispersion indices | Peaks in f-divergence, Poisson→sub-Poisson transitions | (Arnold et al., 2024, Hong et al., 16 Nov 2025, Hong, 2023) |
| Internal representation | Coherence, localization | Cross-channel correlation, covariance spectrum | (Hong, 2023, Alpay et al., 16 Jan 2026) |
| Logical/algorithmic depth | LoCM, circuit stability metrics | Performance collapse, circuit switching | (Zhang et al., 6 Jan 2026, Sun et al., 2024) |
| Behavioral alignment | Alignment, style, confidence OPs | Distributional break, OP decomposition | (Arnold et al., 27 Aug 2025, Aoyama et al., 26 Feb 2025) |
7. Limitations and Directions for Future Study
- Finite-size effects: Many phase transitions are broadened or shifted at finite model or data scales, necessitating careful finite-size scaling analyses to identify universal exponents and critical thresholds (Nakaishi et al., 2024, Alpay et al., 16 Jan 2026).
- Hidden transitions: Standard loss and validation curves may conceal phase transition points; tailored diagnostics are necessary for reliable detection (Hong et al., 16 Nov 2025, Hong, 2023).
- Black-box and scaling challenges: Output-only or sampled-model access limits the power of f-divergence-based diagnostics, especially in very large LLMs, and may necessitate advanced classifier-based estimators (Arnold et al., 2024).
- Generalization to other domains: While criticality and phase-transition frameworks have been validated in text-based LLMs, extension to vision, multi-modal, and RL models is an open area of research (Aoyama et al., 26 Feb 2025).
Phase transitions in LLMs provide a quantitative, predictive, and physically principled lens through which to study emergent abilities, internal reorganizations, and failure modes. The formalization of these phenomena brings interpretability, methodology, and new training and safety pathways to the frontier of language modeling.