Regulation as Compression: A Unified Framework

Updated 20 November 2025
  • Regulation as compression is a framework that reinterprets regulatory processes in control systems and machine learning as forms of information compression, preserving only the useful variety.
  • It integrates principles from cybernetics, mutual-information bounds, and rate–distortion theory to explain model selection, signal restoration, and robust control.
  • This approach guides optimal hyperparameter tuning by balancing compression strength with empirical risk and generalization, with practical implications for deep learning and control applications.

Regulation as compression refers to a unified theoretical and algorithmic perspective in which regulatory processes—spanning cybernetic control, neural network regularization, model selection, signal restoration, and feedback control—are reinterpreted as special instances of information compression, complexity regularization, or rate–distortion allocation. This perspective has emerged through the confluence of classical cybernetics (Every Good Regulator Theorem), modern information theory (mutual-information bounds and rate–distortion functions), and practical advances in deep learning and signal processing. The central insight is that effective regulation, whether in controlling dynamical systems or synthesizing robust models, can be cast as identifying a compressed or regularized representation that preserves exactly the “useful variety” required for robust control, decision-making, or restoration—no more, no less.

1. Theoretical Foundations: Regulation, Compression, and Useful Variety

At the core of the regulation-as-compression philosophy is the Every Good Regulator Theorem (EGRT) of Conant and Ashby, which formalizes that “every good regulator of a system must be a model of that system.” Mathematically, this means a regulator $R$ is related to the system $S$ via a bijective mapping $f: R \leftrightarrow S$ on the relevant regulatory subspace. This mapping is a form of structured compression: the regulator $R$ is a lower-dimensional or simplified encoding of the full system states, but is sufficient to distinguish all states relevant for regulation. The compression must preserve the mutual information $I(R; S)$ that quantifies the entropy of the behaviors that matter in achieving regulatory goals; i.e., $I(R; S) \ge H(G)$ for some goal set $G$ (Alicea et al., 28 Jun 2025).
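
To make the EGRT condition concrete, the following minimal sketch (the toy joint distribution, the two-state goal variable, and the function names are illustrative assumptions, not taken from the cited work) estimates $I(R; S)$ for a candidate regulator that collapses four system states into two regulator states, and checks it against the goal entropy $H(G)$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(R;S) in bits from a joint probability table p(r, s)."""
    pr = joint.sum(axis=1)   # marginal over regulator states
    ps = joint.sum(axis=0)   # marginal over system states
    return entropy(pr) + entropy(ps) - entropy(joint.ravel())

# Toy joint distribution p(r, s): the regulator uses 2 states to track
# 4 system states, collapsing distinctions irrelevant to the goal.
p_rs = np.array([
    [0.25, 0.25, 0.00, 0.00],
    [0.00, 0.00, 0.25, 0.25],
])

# Goal variable G distinguishes only two outcomes ("regulated" vs. "not"),
# so its entropy is at most 1 bit.
p_g = np.array([0.5, 0.5])

I_rs = mutual_information(p_rs)
H_g = entropy(p_g)
print(f"I(R;S) = {I_rs:.3f} bits, H(G) = {H_g:.3f} bits, "
      f"good-regulator condition holds: {I_rs >= H_g}")
```

Here the regulator discards distinctions within each pair of system states but keeps exactly the one bit needed to choose the regulatory action, so the inequality holds with equality.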

In information-theoretic regularization, this compression principle appears as a trade-off between the complexity of the representation and the loss in fidelity. Rate–distortion functions directly relate the allowable loss (regulation error, empirical risk, restoration distortion) to the minimal information content (rate) or model class complexity necessary to achieve a given regulatory or inferential objective (Bu et al., 2019).
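
As a simple worked instance of this trade-off, the textbook rate–distortion function of a Gaussian source under squared-error distortion, $R(D) = \max\{0, \tfrac{1}{2}\log_2(\sigma^2/D)\}$, shows how the minimal rate grows as the allowed distortion shrinks; the sketch below only evaluates this standard closed form and is not specific to any of the cited papers.

```python
import numpy as np

def gaussian_rate_distortion(variance, distortion):
    """R(D) = max(0, 0.5*log2(variance/D)) bits per sample for a
    Gaussian source under squared-error distortion (standard result)."""
    return np.maximum(0.0, 0.5 * np.log2(variance / distortion))

sigma2 = 1.0
for D in [1.0, 0.5, 0.1, 0.01]:
    print(f"allowed distortion D={D:<5} -> minimal rate R(D) = "
          f"{gaussian_rate_distortion(sigma2, D):.2f} bits/sample")
```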

2. Regularization via Compression in Machine Learning

In deep learning and statistical inference, model compression has been shown to act as a form of regularization. Fitted models, especially overparametrized neural networks, tend to overfit by encoding spurious noise from the training data. Model compression, for example via quantization, clustering, or pruning, reduces the parameter space and, crucially, the mutual information $I(W; \widehat W)$ between the uncompressed weights $W$ and the compressed predictor $\widehat W$; since the compressed predictor can depend on the training data only through $W$, reducing this quantity tightens mutual-information generalization bounds. The population risk of a compressed predictor is given by

$$\mathbb{E}[L_\mu(\widehat W)] = \mathbb{E}[L_S(W)] + \mathrm{gen}(\mu, P_{\widehat W \mid S}) + D(R)$$

where $D(R)$ is the empirical risk distortion induced by compression at rate $R$ (Bu et al., 2019).

A critical insight is that there exists a nontrivial “sweet spot” where the gain in generalization error (decreased overfitting due to reduced $I(W; \widehat W)$) outweighs the increase in empirical risk from lossy compression ($D(R)$), yielding improved overall test accuracy for the compressed model. Practically, diameter-regularized Hessian-weighted $K$-means quantization minimizes both the rate–distortion and generalization bounds, while empirical studies on MNIST and CIFAR-10 validate this behavior (Bu et al., 2019).
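
A minimal sketch of Hessian-weighted $K$-means quantization is given below, assuming a synthetic weight vector and a diagonal-Hessian importance score; the diameter regularization described in the paper is omitted, and all names and values are illustrative.

```python
import numpy as np

def hessian_weighted_kmeans(weights, hessian_diag, k, iters=50, seed=0):
    """Quantize a weight vector into k shared values, weighting each weight's
    clustering error by its (diagonal) Hessian importance. Minimal sketch."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(weights, size=k, replace=False)
    for _ in range(iters):
        # Assign each weight to the nearest center.
        assign = np.argmin(np.abs(weights[:, None] - centers[None, :]), axis=1)
        # Update each center as the Hessian-weighted mean of its members.
        for j in range(k):
            mask = assign == j
            if mask.any():
                centers[j] = np.sum(hessian_diag[mask] * weights[mask]) / np.sum(hessian_diag[mask])
    return centers[assign], centers

# Synthetic example: 1000 weights, importance scores standing in for the Hessian diagonal.
w = np.random.default_rng(1).normal(size=1000)
h = np.abs(np.random.default_rng(2).normal(size=1000)) + 1e-3
w_hat, codebook = hessian_weighted_kmeans(w, h, k=8)
rate = np.log2(len(codebook))                        # bits per weight for the index
distortion = np.sum(h * (w - w_hat) ** 2) / len(w)   # Hessian-weighted distortion
print(f"rate ≈ {rate:.1f} bits/weight, weighted distortion ≈ {distortion:.4f}")
```

Increasing $k$ raises the rate (bits per weight index) and lowers the weighted distortion, tracing out the rate–distortion trade-off discussed above.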

3. Compression Frequency as a Regularization Hyperparameter

In the training of neural networks, compression is often applied either as a one-shot post-hoc operation or continuously during training (compression-aware training). The introduction of a “compression frequency” hyperparameter $f \in \mathbb{N}$ enables compression to act as an occasional regularizer: every $f$ mini-batches, a compression operator $C(\cdot)$ is applied to the weights $\theta$, enforcing regularization at a tunable rate (Lee et al., 2021). The per-step regularization strength is then

$$s(r, f) = \frac{e_w(r)}{f}$$

where $e_w(r)$ is the average per-weight change induced by compression of strength $r$. This provides two degrees of freedom (compression strength and frequency) for fine-grained control of regularization.
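
A minimal PyTorch-style training loop illustrating the compression-frequency idea is sketched below, assuming magnitude pruning as the compression operator $C(\cdot)$; the model, data loader, and hyperparameter values are placeholders rather than the configuration used in the cited work.

```python
import torch
import torch.nn as nn

def magnitude_prune_(model, sparsity=0.5):
    """In-place compression operator C(.): zero out the smallest-magnitude weights."""
    for p in model.parameters():
        if p.dim() > 1:
            k = int(sparsity * p.numel())
            if k > 0:
                threshold = p.abs().flatten().kthvalue(k).values
                p.data[p.abs() <= threshold] = 0.0

def train(model, loader, steps, f=100, lr=1e-2, sparsity=0.5):
    """Apply the compression operator every f mini-batches (occasional regularization)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for step, (x, y) in zip(range(steps), loader):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        if (step + 1) % f == 0:   # compression frequency hyperparameter f
            magnitude_prune_(model, sparsity)
    return model
```

Increasing $f$ weakens the effective per-step regularization $s(r, f) = e_w(r)/f$ while also amortizing the cost of the compression operator over more mini-batches.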

Empirical evidence corroborates that modulating $f$ achieves a Pareto-optimal trade-off between generalization and model performance. Occasional compression (large $f$) can match or exceed the performance of per-batch compression ($f = 1$) at lower computational and algorithmic cost, especially when the compression operator is expensive (e.g., SVD, heavy quantization). This principle is robust across architectures, datasets, and compression strategies (pruning, quantization, low-rank SVD) (Lee et al., 2021).

4. Goal-Oriented Compression in Control and Restoration

In control systems, particularly for linear dynamical systems, the problem of state regulation with communication or resource constraints can be mapped precisely to a rate–distortion allocation problem. The controller receives only a compressed version $\hat x_t$ of the true state $x_t$, with the compression noise governed by the allocated rate $r_t$ (bits). The increase in control cost due to compression is

$$J^{(c)} - J^{(p)} = \sum_{t=0}^{T-1} a_t\, 2^{-2 r_t}$$

where $a_t$ quantifies the cost sensitivity at time $t$ (Wang et al., 14 Jul 2024). The optimal allocation of a finite rate budget over time is given by a water-filling strategy:
$$r_t^* = \max\left\{0,\ \frac{1}{2}\log_2\!\left(\frac{2 a_t \ln 2}{\lambda}\right)\right\},$$
where $\lambda$ is chosen so that the total rate meets the resource constraint. This solution ensures that the regulatory error induced by compression is distributed optimally over time, with more resources allocated to time points with higher sensitivity, a direct manifestation of the “regulation as compression” principle.
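
The water-filling allocation can be computed by a one-dimensional search for $\lambda$; the sketch below (with made-up sensitivities $a_t$ and bit budget) uses bisection, exploiting the fact that the total allocated rate decreases monotonically in $\lambda$.

```python
import numpy as np

def waterfill_rates(a, total_rate, tol=1e-9):
    """Allocate a total bit budget over time by water-filling:
    r_t = max(0, 0.5*log2(2*a_t*ln2 / lam)), with lam set so sum(r_t) = total_rate."""
    a = np.asarray(a, dtype=float)

    def rates(lam):
        return np.maximum(0.0, 0.5 * np.log2(2.0 * a * np.log(2.0) / lam))

    lo, hi = 1e-12, 2.0 * a.max() * np.log(2.0)   # at hi, every rate is zero
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if rates(lam).sum() > total_rate:
            lo = lam   # lam too small: total rate exceeds the budget
        else:
            hi = lam
    return rates(0.5 * (lo + hi))

# Cost sensitivities a_t over a horizon of 6 steps and a budget of 10 bits.
a_t = [4.0, 1.0, 0.25, 8.0, 2.0, 0.5]
r = waterfill_rates(a_t, total_rate=10.0)
print(np.round(r, 3), "sum =", round(r.sum(), 3))
```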

In complexity-regularized signal restoration, restoration is performed by minimizing a sum of a data-fidelity term (e.g., squared error) and the bit-cost $s(x)$ of the compressed estimate $x$, leading to the variational MAP problem
$$\hat x = \arg\min_x \|y - Hx\|_2^2 + \mu\, s(x),$$
with $s(x)$ computed using standard codecs (JPEG2000, HEVC) (Dar et al., 2017). The iterative ADMM algorithm alternates between deconvolution and rate–distortion compression, acting as a complexity regularizer. In the circulant (DFT-diagonalizable) case, the optimal solution allocates more bits to frequencies less attenuated by the degradation; again, regulation (restoration accuracy) is achieved by optimal bit allocation (compression) across subspaces.
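
The sketch below illustrates the ADMM splitting on a 1-D toy problem, with uniform quantization standing in for the standard-codec compression step (so the quantization step size, rather than $\mu$, controls the complexity penalty); the degradation matrix and signal are synthetic, not from the cited work.

```python
import numpy as np

def quantize(v, step):
    """Stand-in 'codec': uniform quantization, a crude proxy for the
    rate-distortion compression step performed with a standard codec."""
    return step * np.round(v / step)

def restore_admm(y, H, rho=1.0, step=0.2, iters=100):
    """Complexity-regularized restoration: alternate a least-squares
    deconvolution step with a compression step (ADMM splitting)."""
    n = H.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    A = H.T @ H + rho * np.eye(n)                         # normal equations for the x-update
    for _ in range(iters):
        x = np.linalg.solve(A, H.T @ y + rho * (z - u))   # data-fidelity (deconvolution) step
        z = quantize(x + u, step)                         # compression step
        u = u + x - z                                     # dual update
    return z

# Toy example: blur a smooth signal, add noise, then restore it.
rng = np.random.default_rng(0)
n = 64
H = np.eye(n)
for i in range(n - 1):
    H[i, i + 1] = 0.5                                     # simple asymmetric blur
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
y = H @ x_true + 0.05 * rng.normal(size=n)
x_hat = restore_admm(y, H)
print("restoration MSE:", round(float(np.mean((x_hat - x_true) ** 2)), 4))
```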

5. Physical Paradigms and Modern AI Implementations

Physical systems such as self-organized criticality sandpiles and diffusion-based denoising notably instantiate regulation as compression. In these cases, a regulator does not track the entire microstate but instead compresses the dynamical history into just those statistics necessary for robust prediction or control: e.g., rank-ordered avalanche sizes, cumulative distribution functions, or learned low-dimensional representations (Alicea et al., 28 Jun 2025). The mutual information preserved by this compression matches the “useful variety” required for regulatory intervention.

Modern machine learning architectures apply these principles in constructing world models. Variational autoencoders, predictive latent-dynamics models (e.g., Dreamer), generative flow networks, and graph-SLAM formulate the representation (regulator) as a compressed but sufficient latent code of the environmental state, with explicit constraints or regularizers ensuring that only the task-relevant distinctions are preserved.
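
As one hedged illustration of this pattern, the following PyTorch sketch shows a VAE-style objective in which the KL term acts as a rate penalty on the latent code, upper-bounding the information the code carries about the input; the architecture, dimensions, and $\beta$ value are arbitrary and not drawn from any of the cited systems.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RateConstrainedEncoder(nn.Module):
    """Tiny VAE-style model: the KL term penalizes the 'rate' of the latent code,
    so the encoder keeps only the variety needed to reconstruct the input."""
    def __init__(self, x_dim=32, z_dim=4):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def loss_fn(x, x_hat, mu, logvar, beta=4.0):
    distortion = F.mse_loss(x_hat, x, reduction="mean")
    # KL(q(z|x) || N(0, I)): an upper bound on the information the code carries about x.
    rate = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return distortion + beta * rate

model = RateConstrainedEncoder()
x = torch.randn(16, 32)
x_hat, mu, logvar = model(x)
print(loss_fn(x, x_hat, mu, logvar).item())
```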

6. Methodological Summary and Practical Guidelines

The table below contrasts key regulatory-compression approaches across domains:

Domain | Compression Principle | Key Mathematical Formulation
Deep Learning | Occasional compression | $s(r, f) = e_w(r)/f$ for per-step regularization (Lee et al., 2021)
Model Selection | Rate–distortion tradeoff | $D(R)$, $I(W; \widehat W)$ (Bu et al., 2019)
LQG Control | Water-filling allocation | $J^{(c)} - J^{(p)} = \sum_t a_t 2^{-2 r_t}$ (Wang et al., 14 Jul 2024)
Signal Restoration | Complexity regularization | $\|y - Hx\|^2 + \mu s(x)$ (Dar et al., 2017)
Cybernetics (EGRT) | Useful-variety encoding | $I(R; S) \geq H(G)$ (Alicea et al., 28 Jun 2025)

Practical guidelines recurring in these works include: (1) tuning compression frequency and/or bit allocation as an explicit hyperparameter to optimize the trade-off between generalization and fit, (2) viewing model compression not solely as an efficiency measure but as a regularization strategy, and (3) constructing regulatory mappings or representations that minimally suffice to encode only the distinctions that matter for the regulatory or control objective.

7. Implications, Extensions, and Open Questions

The regulation-as-compression paradigm establishes a unified language for understanding regularization, model reduction, feedback, restoration, and information flow. It quantifies the principle that complexity control—whether of weights, representations, or communication resources—should be directly informed by the information requirements of the regulatory goal. A plausible implication is that future learning and control algorithms could explicitly and adaptively match the compression of their representations to the evolving regulatory needs, informed by mutual information or rate–distortion measures.

Open challenges remain in extending these compression-based frameworks to highly non-Gaussian, nonstationary, or adversarial regimes found in modern reinforcement learning and world-model scenarios, where useful variety and regulatory goals are context-dependent and the mapping from regulator to system is neither fixed nor invertible. Nonetheless, the synthesis of cybernetics, rate–distortion theory, and regularization offers a robust platform for principled model and controller design across disciplines (Alicea et al., 28 Jun 2025, Bu et al., 2019, Lee et al., 2021, Wang et al., 14 Jul 2024, Dar et al., 2017).
