Entropy-Memorization Law Overview

Updated 18 March 2026

The Entropy-Memorization Law is a principle that quantifies how system constraints imprint irreversible entropy, linking memory and disorder across physical and computational domains.
It is derived from thermodynamics and extended to quantum measurements and neural network training, delineating the trade-off between entropy gain and recoverable information.
The law underpins empirical scaling laws in statistical mechanics and machine learning, offering actionable insights for designing systems and addressing data privacy challenges.

The Entropy-Memorization Law (EM-Law) encompasses a set of rigorous, quantitative relationships that govern the interplay between entropy, structure, information retention, and memorization. Originally derived in statistical thermodynamics to clarify ambiguities around residual entropy and the third law, the EM-Law framework has emerged across physics, quantum information, neural computation, and large-scale machine learning. In all domains, the law formalizes how constraints, representations, or data regularities are “memorized” as enduring entropy features—whether manifested as irreducible disorder in materials, information loss in quantum measurements, scaling laws in model training, or sharp boundary effects in neural generative decoding.

1. Formal Foundations and Thermodynamic Origin

The EM-Law was first introduced as a refinement to the thermodynamic third law to resolve long-standing paradoxes concerning residual entropy in systems with internal constraints, such as glasses, random alloys, or defect crystals (Shirai, 2018). Each equilibrium state is characterized by a set of thermodynamic coordinates $\vec{q}=(q_1,\dots,q_m)$, with entropy a well-defined function $S = S(\vec{q})$. Internal constraints (e.g., rigid barriers, frozen atomic sites) fix additional “frozen” coordinates $\hat{r}$, partitioning systems into distinct thermodynamic classes $\mathcal{C}_A$, $\mathcal{C}_B$. Classes separated by frozen coordinates lack a one-to-one mapping of all state variables.

Within a single class, the zero of entropy at $T\to 0$ is unique and shared. However, across classes separated by a frozen coordinate, the entropy offset $S_0^{AB}$ (residual entropy) is the memorized difference, encoded irreversibly when constraints are lifted:

$S_0^{AB} = S_0^{A} - S_0^{B} = s_r(r_A) - s_r(0)$

where $s_r(r)$ is the entropy contribution of the frozen coordinate. The process of unfreezing $r$ irreversibly reconstructs the entropy origin, observed as residual entropy, and constitutes the law’s eponymous “memorization” of system history.

The EM-Law rigorously connects all residual entropy phenomena (configurational, orientational, and defect-based) to a unified, constraint-driven mechanism, eliminating the need to invoke metastability or kinetic irreversibility. For example, in a binary alloy $A_{1-x}B_x$, the mixing entropy per site at $x = 1/2$,

$S_0 = k_B \ln 2$

is the signature of class transition and memorization of composition labels (Shirai, 2018).
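
The mixing entropy of a binary alloy at equal composition can be checked numerically. The sketch below assumes the ideal (random) mixing formula $s(x) = -k_B[x \ln x + (1-x)\ln(1-x)]$, a standard textbook expression that the text does not state explicitly; the function name `mixing_entropy_per_site` is illustrative, and entropy is returned in units of $k_B$.

```python
import math

def mixing_entropy_per_site(x: float) -> float:
    """Ideal mixing entropy per site, in units of k_B, for a fraction x of one species.

    Assumption: sites are occupied independently at random (ideal mixing).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("composition x must lie in [0, 1]")
    # Terms with p == 0 are skipped because p * ln(p) -> 0 as p -> 0.
    return sum(-p * math.log(p) for p in (x, 1.0 - x) if p > 0.0)

if __name__ == "__main__":
    for x in (0.0, 0.1, 0.25, 0.5):
        print(f"x = {x:.2f}  s/k_B = {mixing_entropy_per_site(x):.4f}")
    print(f"ln 2 = {math.log(2):.4f}")
```

Under this assumption the entropy vanishes for the pure phases and is largest at equal composition, where it equals $\ln 2$ per site in units of $k_B$.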

2. Information-Theoretic and Quantum Generalizations

Quantum information theory makes the EM-Law explicit as a trade-off between entropy gain and retrievable information during measurements (Wang, 2019). For a quantum state with density matrix $\rho$ subject to a measurement (or any irreversible channel) inducing a transition $\rho \to \rho'$, the von Neumann entropy $S(\rho) = -\mathrm{Tr}(\rho \ln \rho)$ increases:

$\Delta S = S(\rho') - S(\rho) \ge 0$

The retrievable (unlost) fraction of the original information is $R = e^{-\Delta S}$, while the irretrievable fraction is $L = 1 - e^{-\Delta S}$. This yields the core EM-Law identity for quantum systems:

$R + L = 1, \qquad R = e^{-\Delta S}$

In pure-to-mixed measurement (e.g., an $n$-qubit system projected onto basis states), the entropy gain is $\Delta S = n \ln 2$ and $R = 2^{-n}$, signifying exponential decay of recoverable information with entropy production.

The law applies equally to entangled states: for a maximally entangled Bell pair, local measurement produces entropy $\Delta S = \ln 2$ and $R = 1/2$, independent of the chosen basis. In multipartite generalizations (GHZ and W states), the fractional information loss upon partial measurement follows precisely from the corresponding entropy gain, giving a universal characterization of quantum information destruction and retrieval (Wang, 2019).
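
The entropy gain for a Bell pair under local measurement can be computed directly. The following is a minimal sketch, not taken from the cited work: it handles only pure initial states and computational-basis measurement of a single qubit, and the helper names are illustrative. It relies on the fact that a non-selective projective measurement of a pure state leaves a mixture of mutually orthogonal pure branches, so the von Neumann entropy of the result equals the Shannon entropy of the outcome probabilities.

```python
import math

def outcome_probabilities(amplitudes, qubit, n_qubits):
    """Born probabilities for measuring one qubit of a pure state in the computational basis."""
    probs = [0.0, 0.0]
    for index, amp in enumerate(amplitudes):
        bit = (index >> (n_qubits - 1 - qubit)) & 1  # qubit 0 is the leftmost bit
        probs[bit] += abs(amp) ** 2
    return probs

def shannon_entropy(probs):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)

def entropy_gain_local_measurement(amplitudes, qubit, n_qubits):
    """Entropy gain (nats) when one qubit of a pure state is measured non-selectively.

    The initial state is pure, so its entropy is zero and the gain equals the
    entropy of the post-measurement mixture.
    """
    return shannon_entropy(outcome_probabilities(amplitudes, qubit, n_qubits))

if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    bell = [s, 0.0, 0.0, s]          # (|00> + |11>) / sqrt(2)
    product = [1.0, 0.0, 0.0, 0.0]   # |00>
    print("Bell pair gain:", entropy_gain_local_measurement(bell, 0, 2))
    print("Product state gain:", entropy_gain_local_measurement(product, 0, 2))
```

The Bell pair gains $\ln 2$ nats, while the product state $|00\rangle$, which is already an eigenstate of the measurement, gains nothing.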

3. Statistical, Dynamical, and Constraint-Induced Memorization

In first-principles statistical mechanics, the EM-Law is further sharpened: entropy is not a monotonic function but a stochastic variable described by a probability distribution $P(S)$. The system's constraints $\mathcal{C}$ (geometry, energy, access rules) shape this long-time entropy distribution $P(S)$ by modulating the set of accessible macrostate volumes $\{\Omega_i\}$ (Peng, 17 Feb 2026).

The law’s formal statement:

The entropy distribution $P(S)$ encodes a lasting memory of constraints via $\{\Omega_i\}$.
Any change in $\mathcal{C}$ that modifies $\{\Omega_i\}$ non-uniformly transforms $P(S)$ structurally; if all volumes scale by a common factor, the only effect is a translation in $S$.
This “memory” is permanent: even under time-reversal invariant dynamics, the shape of $P(S)$ evidences past constraints.

Illustrative examples include gas partitioning (memory of wall presence/absence) and double-well potentials (memory of barrier height). Only uniform rescaling of accessible phase-space volumes preserves the shape of $P(S)$; all other constraint changes imprint persistent statistical structure on the entropy distribution (Peng, 17 Feb 2026).
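
The distinction between uniform and non-uniform changes of the accessible volumes can be illustrated with a toy calculation. This sketch makes two assumptions that are not spelled out in the text: each macrostate carries a Boltzmann entropy equal to the logarithm of its volume (with $k_B = 1$), and its long-time occupation probability is proportional to that volume. The volumes and function names are invented for illustration.

```python
import math

def entropy_distribution(volumes):
    """Map macrostate volumes to (entropy, probability) pairs.

    Assumes entropy = ln(volume) with k_B = 1 and occupation probability
    proportional to volume.
    """
    total = sum(volumes)
    return [(math.log(v), v / total) for v in volumes]

def centered_shape(distribution):
    """Entropy values measured from their mean, paired with their probabilities."""
    mean = sum(s * p for s, p in distribution)
    return [(s - mean, p) for s, p in distribution]

def same_shape(dist_a, dist_b, tol=1e-9):
    """True when the two distributions differ by at most a translation in entropy."""
    a, b = centered_shape(dist_a), centered_shape(dist_b)
    return len(a) == len(b) and all(
        abs(sa - sb) < tol and abs(pa - pb) < tol
        for (sa, pa), (sb, pb) in zip(a, b)
    )

if __name__ == "__main__":
    base = [1.0, 4.0, 16.0]                # hypothetical macrostate volumes
    uniform = [3.0 * v for v in base]      # every volume scaled by the same factor
    non_uniform = [1.0, 4.0, 64.0]         # only one macrostate enlarged
    reference = entropy_distribution(base)
    print("uniform rescaling keeps shape:", same_shape(reference, entropy_distribution(uniform)))
    print("non-uniform change keeps shape:", same_shape(reference, entropy_distribution(non_uniform)))
```

In this toy model a common scale factor shifts every entropy value by the same constant and leaves the probabilities untouched, whereas enlarging a single macrostate changes both the spacing of entropy values and their weights.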

4. Neural Representation, Memorization, and Generalization

In neural networks and LLMs, the EM-Law governs both the statistical cost of memorization and the generalization capabilities of learned representations:

Representation entropy $H(R)$ (Shannon or matrix-based) controls the generalization gap through a bound that tightens as the number of training samples $N$ grows and as $H(R)$, which quantifies the entropy of the internal representations, decreases (Yu, 13 May 2025).

Alternating cycles of memorization (lowering cross-entropy, often increasing $H(R)$) and compression (minimizing $H(R)$ at the cost of cross-entropy slack) emerge naturally during training. Gated-Phase Transition (GAPT) algorithms can explicitly orchestrate these cycles, achieving improved test loss, out-of-distribution performance, and disentanglement of conflicting memories (Yu, 13 May 2025).

Information-theoretic analyses reveal a dichotomy in memorization patterns: shortcut (heuristic) memorization leads to low entropy and high mutual information between neural activations, while example-level memorization manifests as high entropy and low inter-neuron mutual information (Bansal et al., 2022). Monitoring activation entropy and MI allows robust, unlabeled detection of memorization regimes and supports more reliable model selection.
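
Activation entropy and inter-neuron mutual information can be estimated from recorded activations without any labels. The sketch below is one simple way to do so and is not the procedure of the cited study: it binarizes activations at a threshold (an illustrative choice) and uses plug-in estimates from empirical frequencies, which are biased for small samples. The activation values are invented.

```python
import math
from collections import Counter

def entropy(symbols):
    """Plug-in Shannon entropy (nats) of a sequence of discrete symbols."""
    n = len(symbols)
    return sum(-(c / n) * math.log(c / n) for c in Counter(symbols).values())

def mutual_information(xs, ys):
    """Plug-in mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in nats."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def binarize(activations, threshold=0.0):
    """Discretize real-valued activations into on/off symbols."""
    return [int(a > threshold) for a in activations]

if __name__ == "__main__":
    # Hypothetical activations of three neurons over four inputs.
    neuron_a = binarize([0.9, -0.2, 0.7, -0.5])
    neuron_b = binarize([0.8, -0.1, 0.6, -0.4])   # fires together with neuron_a
    neuron_c = binarize([0.3, 0.4, -0.1, -0.6])   # fires independently of neuron_a
    print("H(a) =", entropy(neuron_a))
    print("I(a;b) =", mutual_information(neuron_a, neuron_b))
    print("I(a;c) =", mutual_information(neuron_a, neuron_c))
```

Two neurons that always fire together share all of their entropy as mutual information, while neurons whose on/off patterns are statistically independent share none.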

5. Entropy–Memorization Boundary Effects in Generative Models

Recent work demonstrates that in generative LLMs, memorized and unmemorized output segments are sharply separated by a discontinuity in decoding entropy (Chen et al., 2024). Formally:

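For each token $x_t$, the decoding entropy is the Shannon entropy of the model's next-token distribution over the vocabulary $V$,

$H_t = -\sum_{v \in V} p(v \mid x_{<t}) \ln p(v \mid x_{<t})$

At the memorization boundary $t^*$, a significant entropy jump $\Delta H$ (measured in nats) is observed:

$\Delta H = H_{t^*+1} - H_{t^*}$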

This sharp transition is stable across model scales and provides an empirical law to detect memorized (training-data) continuations versus novel generation.

Practically, low-entropy plateaus in generated text signal verbatim memorization and are exploitable for privacy or IP risk monitoring. Model-induced entropy regularization can be applied to mitigate unintended memorization without sacrificing generative diversity (Chen et al., 2024).
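
A monitoring tool along these lines needs only the per-step next-token distributions. The following sketch computes the decoding entropy at each step and reports the first position where it rises by at least a chosen amount. The threshold `min_jump`, the function names, and the toy distributions are assumptions made for illustration and are not values from the cited work.

```python
import math

def decoding_entropy(probs):
    """Shannon entropy (nats) of one next-token probability distribution."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)

def find_entropy_jump(entropies, min_jump):
    """Return the first index t with entropies[t + 1] - entropies[t] >= min_jump, else None."""
    for t in range(len(entropies) - 1):
        if entropies[t + 1] - entropies[t] >= min_jump:
            return t
    return None

if __name__ == "__main__":
    peaked = [0.97, 0.01, 0.01, 0.01]   # near-deterministic step, as in verbatim recall
    flat = [0.25, 0.25, 0.25, 0.25]     # maximally uncertain step over four tokens
    steps = [peaked, peaked, peaked, flat, flat]
    entropies = [decoding_entropy(p) for p in steps]
    print("entropies:", [round(h, 3) for h in entropies])
    print("boundary index:", find_entropy_jump(entropies, min_jump=1.0))
```

In practice the threshold would have to be calibrated per model and tokenizer, since the size of the jump depends on vocabulary size and sampling temperature.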

6. Scaling Laws, Memorization Difficulty, and Data Privacy

Empirical studies reveal that the difficulty of data memorization in LLMs scales nearly linearly with sequence entropy (Huang et al., 8 Jul 2025). Using the token-level edit distance $d$ as a memorization score, the averaged entropy $\bar{H}(d)$ over all examples with distance $d$ fits:

$\bar{H}(d) \approx a\,d + b$

where $a$ and $b$ are the fitted slope and intercept, and the Pearson correlation coefficient $r$ of the fit indicates a nearly linear relationship. This law holds even in human-perceived “gibberish” strings; these exhibit lower token entropy than typical text and are more easily memorized by LLMs due to tokenization effects. The same entropy-memorization law enables dataset inference attacks (EMBEDI): by regressing the entropy-memorization line from model generations, one can accurately distinguish member and non-member datasets in an unsupervised fashion, exposing both privacy and provenance vulnerabilities (Huang et al., 8 Jul 2025).
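
A linear fit of this kind can be reproduced on any set of (edit distance, mean entropy) pairs with ordinary least squares. The sketch below is a generic implementation rather than the cited authors' code; the data points are synthetic placeholders chosen only to exercise the function and do not reproduce any reported result.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope, intercept, and Pearson r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

if __name__ == "__main__":
    # Synthetic placeholders: edit-distance buckets and mean entropy per bucket.
    distances = [0, 5, 10, 15, 20]
    mean_entropies = [1.1, 1.9, 3.2, 3.8, 5.1]
    slope, intercept, r = fit_line(distances, mean_entropies)
    print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, Pearson r = {r:.3f}")
```

A dataset-inference test in this spirit would compare the fitted line obtained on a candidate dataset against the line obtained on data known to lie outside the training set; how that comparison is scored is left open here.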

7. Memory Complexity and the Phase Transition in Entropy Estimation

The EM-Law additionally characterizes the scaling of memory requirements for entropy estimation in finite-state computational models observing i.i.d. sequences (Berg et al., 2024). Writing $M(\epsilon)$ for the number of memory states needed to estimate the entropy to within additive error $\epsilon$, the requirement follows one scaling for moderate accuracy $\epsilon$ but a markedly steeper one as $\epsilon \to 0$. Once fine accuracy is required, all distinct symbols must be essentially “memorized,” marking a sharp phase transition in estimator complexity. This law applies equally to mutual information estimation with a corresponding bivariate alphabet scaling.

This scaling law has far-reaching implications for stream data analytics, network monitoring, and online learning, as it precisely quantifies the tradeoff between memory resources, estimation accuracy, and the inherent necessity of memorizing data support at fine scales (Berg et al., 2024).
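
The cost of memorizing the data support is easy to see in the most naive streaming estimator. The sketch below is a plain plug-in estimator that keeps one counter per distinct symbol; it is not the finite-state machine construction analyzed in the cited work, and the class name is illustrative. Its storage grows with the number of distinct symbols seen, which is the behavior a memory-limited estimator has to avoid when only coarse accuracy is needed.

```python
import math
from collections import Counter

class PluginEntropyEstimator:
    """Streaming plug-in entropy estimator that keeps one counter per distinct symbol."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, symbol):
        """Record one observed symbol from the stream."""
        self.counts[symbol] += 1
        self.total += 1

    def estimate(self):
        """Entropy estimate in nats from the empirical frequencies seen so far."""
        return sum(
            -(c / self.total) * math.log(c / self.total)
            for c in self.counts.values()
        )

    def counters_in_use(self):
        """Number of stored counters, equal to the number of distinct symbols observed."""
        return len(self.counts)

if __name__ == "__main__":
    estimator = PluginEntropyEstimator()
    for symbol in "abcd" * 25:   # toy stream over a four-symbol alphabet
        estimator.update(symbol)
    print("estimate (nats):", estimator.estimate())
    print("counters in use:", estimator.counters_in_use())
```

The counter count understates the true memory footprint, since each counter also needs enough bits to hold its value.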

Across disciplines, the Entropy-Memorization Law provides a unifying principle for the quantification, detection, and management of memory, information loss, and constraint-induced structure—from physical systems and quantum measurement, through high-dimensional machine representations, to practical generative models and privacy diagnostics.