A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws
(2510.00504v1)
Published 1 Oct 2025 in stat.ML, cond-mat.dis-nn, cs.IT, cs.LG, and math.IT
Abstract: When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller models and substantially less data. In this work, we provide a positive and constructive answer. We prove that a generic permutation-invariant function of $d$ objects can be asymptotically compressed into a function of $\operatorname{polylog} d$ objects with vanishing error. This theorem yields two key implications: (Ia) a large neural network can be compressed to polylogarithmic width while preserving its learning dynamics; (Ib) a large dataset can be compressed to polylogarithmic size while leaving the loss landscape of the corresponding model unchanged. (Ia) directly establishes a proof of the \textit{dynamical} lottery ticket hypothesis, which states that any ordinary network can be strongly compressed such that the learning dynamics and result remain unchanged. (Ib) shows that a neural scaling law of the form $L\sim d^{-\alpha}$ can be boosted to an arbitrarily fast power law decay, and ultimately to $\exp(-\alpha' \sqrt[m]{d})$.
Summary
The paper introduces a universal compression theorem that uses permutation symmetry to significantly reduce neural network parameters and dataset size without loss in accuracy.
It proves the Dynamical Lottery Ticket Hypothesis, showing that larger models contain much smaller subnetworks whose training dynamics and final performance match those of the full model.
The study transforms traditional power-law scaling into stretched-exponential scaling, suggesting markedly lower data and computational requirements for efficient AI.
"A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws" (2510.00504)
Introduction
The paper develops a universal compression theory aimed at drastically reducing the number of parameters and data points a neural network needs while preserving its performance. The authors provide a theoretical foundation that exploits permutation symmetry to compress both neural networks and their training data, and they show that this compression translates into markedly improved neural scaling laws without any degradation in model performance.
Figure 1: Illustration of the main idea behind the compressibility of neural networks and datasets.
Problem Setting
The core challenge addressed is the inefficiency with which large AI models use data: neural scaling laws typically exhibit a slow power-law decay of the error with dataset size, in stark contrast to the data efficiency of human learning. The authors cast datasets and network layers as inputs to permutation-symmetric functions and use this symmetry to identify and remove redundancy in both data points and network parameters.
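To make the slowness concrete, here is a small back-of-the-envelope sketch (my own illustration; the exponent $\alpha = 0.1$ is an assumed value in the typical range of reported scaling laws, not a number from the paper): under $L\sim d^{-\alpha}$, halving the loss requires $2^{1/\alpha}$ times more data.

```python
# Back-of-the-envelope sketch (illustrative alpha, not taken from the paper):
# under a power law L(d) = c * d**(-alpha), improving the loss is extremely slow.
def power_law_loss(d, c=1.0, alpha=0.1):
    """Loss under an assumed power-law scaling L(d) = c * d**(-alpha)."""
    return c * d ** (-alpha)

alpha = 0.1
d0 = 10_000
L0 = power_law_loss(d0, alpha=alpha)
# Solving c * d**(-alpha) = L0 / 2 gives d = d0 * 2**(1/alpha): 1024x more data to halve the loss.
d_half = d0 * 2 ** (1 / alpha)
print(f"L({d0}) = {L0:.4f}")
print(f"halving the loss needs d = {d_half:.0f}  ({d_half / d0:.0f}x more data)")
```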
Universal Compression Theorem
The paper formulates a universal compression theorem: a generic permutation-symmetric function of $d$ objects can be approximated, with asymptotically vanishing error, by the same kind of function evaluated on only $\operatorname{polylog} d$ objects. Because both a dataset (as seen by the loss) and the neurons of a layer enter the model through such symmetric functions, the theorem implies that most data points and network parameters are redundant.
Figure 2: Error scaling for compressing a general symmetric function.
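As rough intuition for why such compression is possible, here is a minimal sketch of my own (not the paper's proof technique; the quantile-based choice of representatives and all variable names are illustrative assumptions): a permutation-invariant average over $d$ points can often be reproduced by the same average over a much smaller set of representative points.

```python
# Minimal toy sketch (my own construction, not the paper's algorithm): a permutation-
# invariant function of d points, here F(X) = mean_i g(x_i), is approximated by the
# same function evaluated on a much smaller set of "representative" points.
import numpy as np

rng = np.random.default_rng(0)
d = 100_000
X = rng.normal(size=d)                      # large "dataset"
g = lambda x: np.tanh(2 * x) + 0.1 * x**2   # any smooth test function

F_full = g(X).mean()                        # symmetric function of all d points

# Compress to n << d points: use equally weighted empirical quantiles of X.
n = 32                                      # roughly polylog(d) in spirit
quantiles = (np.arange(n) + 0.5) / n
X_small = np.quantile(X, quantiles)
F_small = g(X_small).mean()

print(f"full ({d} pts):       {F_full:.5f}")
print(f"compressed ({n} pts): {F_small:.5f}   |error| = {abs(F_full - F_small):.2e}")
```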
Dynamical Lottery Ticket Hypothesis
A central result is a proof of the Dynamical Lottery Ticket Hypothesis: inside a large neural network there exists a much smaller subnetwork that, trained from the corresponding initial parameters, reproduces the learning dynamics and final performance of the full model. The proof shows that the training dynamics, written as permutation-equivariant maps acting on the neurons, are preserved when redundant neurons are merged according to the compression theorem for symmetric functions.
Figure 3: Dynamical LTH visualized showing comparable training dynamics before and after compression.
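The following numpy toy (my own illustration of the symmetry argument, not the paper's construction, which applies to generic networks rather than exact duplicates) shows the mechanism in its simplest form: hidden units initialized identically receive identical gradients, stay identical throughout gradient descent, and can therefore be merged into one representative per group without changing the network's outputs or its training trajectory.

```python
# Toy numpy sketch: duplicate hidden units remain duplicates under gradient descent,
# so the wide network can be read off from one representative per duplicate group.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(64, 3))                 # toy inputs
y = np.sin(x @ np.array([1.0, -2.0, 0.5]))   # toy regression targets

relu = lambda z: np.maximum(z, 0.0)

# Width-4 network whose units form two identical pairs: (w0, a0) and (w1, a1), each duplicated.
W = np.repeat(rng.normal(size=(2, 3)), 2, axis=0)   # shape (4, 3)
a = np.repeat(rng.normal(size=2), 2)                # shape (4,)

lr = 0.01
for _ in range(200):
    h = relu(x @ W.T)                  # (64, 4) hidden activations
    pred = h @ a
    err = pred - y                     # d(loss)/d(pred) up to a constant
    grad_a = h.T @ err / len(x)
    grad_W = ((err[:, None] * a) * (h > 0)).T @ x / len(x)
    a -= lr * grad_a
    W -= lr * grad_W

# Duplicated units remained exactly duplicated throughout training ...
assert np.allclose(W[0], W[1]) and np.allclose(W[2], W[3])
# ... so the width-4 output equals a width-2 network with multiplicity-weighted outputs.
pred_wide = relu(x @ W.T) @ a
pred_small = relu(x @ W[[0, 2]].T) @ (2 * a[[0, 2]])
print("max deviation:", np.abs(pred_wide - pred_small).max())
```

The paper's theorem goes well beyond this exact-duplicate case, but the role played by permutation symmetry is the same.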
Improving Neural Scaling Laws
By compressing both the dataset and the network, the paper shows how power-law scaling $L\sim d^{-\alpha}$ can be converted into the far more favorable stretched-exponential scaling $L\sim \exp(-\alpha'\sqrt[m]{d})$. The reasoning is direct: if $d$ data points can be compressed to $n \sim (\log d)^m$ points without changing the loss landscape, then $\log d \sim n^{1/m}$, and the original power law $L \sim d^{-\alpha} = \exp(-\alpha \log d)$ becomes $L \sim \exp(-\alpha\, n^{1/m})$ in terms of the compressed dataset size. This enhancement drastically reduces the data and compute needed to reach a given loss, suggesting that models of comparable quality can be trained with far smaller datasets.
Figure 4: Advantages of compression on scaling laws in dataset size and network width.
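A quick numerical comparison (with illustrative constants $\alpha = 0.1$, $\alpha' = 1$, $m = 2$ chosen for the sketch, not fitted to the paper's experiments) shows how dramatically the required dataset size differs between the two scaling regimes.

```python
# Illustrative comparison: dataset size needed to reach a target loss under the
# original power law L = d**(-alpha) versus a stretched-exponential law
# L = exp(-alpha_p * d**(1/m)). Constants are assumed for the sketch.
import math

alpha, alpha_p, m = 0.1, 1.0, 2
for target in (1e-1, 1e-2, 1e-3):
    d_power = target ** (-1 / alpha)                   # solve d**(-alpha) = target
    d_stretch = (math.log(1 / target) / alpha_p) ** m  # solve exp(-alpha_p * d**(1/m)) = target
    print(f"target L = {target:g}: power law needs d ~ {d_power:.2e}, "
          f"stretched exponential needs d ~ {d_stretch:.2e}")
```

With these assumed constants, reaching a loss of $10^{-3}$ takes about $10^{30}$ samples under the power law but only a few dozen under the stretched exponential, which is the sense in which the compression result "boosts" the scaling law.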
Practical Implications and Future Directions
The implications for AI are substantial. By proving that data and parameter requirements can, in principle, be reduced drastically, the paper opens a path toward more efficient, resource-frugal models that run at lower operational cost. Suggested future directions include refining the compression algorithms for practical use and exploring new initialization schemes or sampling strategies that exploit these compression guarantees further.
Conclusion
"A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws" provides a robust theoretical framework for compressing neural networks and datasets using permutation symmetry. It challenges traditional scaling laws by empirically and theoretically proving that substantial performance gains can be achieved through intelligent compression, holding promise for the future of scalable and efficient AI development.