- The paper extends the Strong Lottery Ticket Hypothesis (SLTH) to finite-precision quantized networks, providing a rigorous theoretical framework for this setting.
- It combines insights from the Random Subset Sum Problem (RSSP) and the Number Partitioning Problem (NPP) with a parameter counting argument to establish optimal bounds on network width as a function of quantization precision.
- The study offers actionable guidelines for designing energy-efficient neural networks: it quantifies how much overparameterization is needed so that pruning alone yields an exact representation of a target quantized model.
Detailed Analysis of "Quantization vs Pruning: Insights from the Strong Lottery Ticket Hypothesis"
Introduction
The paper "Quantization vs Pruning: Insights from the Strong Lottery Ticket Hypothesis" confronts the challenge of neural network efficiency through the lens of quantization and pruning. It extends the theoretical framework known as the Strong Lottery Ticket Hypothesis (SLTH) to the quantized domain, building on foundational work in combinatorial optimization, particularly results from the Random Subset Sum Problem (RSSP) and Number Partitioning Problem (NPP). This approach aims to provide a deeper theoretical understanding of quantization and its interplay with network overparameterization.
Theoretical Foundation and Methodology
Previous SLTH results were established for continuous weights and therefore do not transfer directly to quantized networks. The authors close this gap by leveraging the seminal results of Borgs et al. on the NPP, adapting the SLTH to finite precision and formalizing bounds on the overparameterization required in the quantized setting.
The central tool is the RSSP, which maps naturally onto pruning: choosing which random weights to keep within a layer is a subset-sum question, since the retained weights must combine to reproduce a target value. Revisiting the classical NPP framework, the authors derive precise bounds for constructing discrete neural networks in this way. This step is crucial, as it moves the SLTH from its reliance on continuous values to a regime where discrete, finite-precision values are handled explicitly; a toy illustration of the subset-sum view follows below.
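The following minimal Python sketch illustrates the subset-sum intuition only; the function exact_subset, the bit width b, and the pool size are illustrative choices, not the paper's construction or bounds. It draws an over-provisioned pool of random weights on a b-bit grid and searches for a subset that sums exactly to a target quantized value, which is what pruning a random layer must achieve.

```python
import itertools
import random

def exact_subset(values, target):
    """Brute-force search for a subset of `values` whose sum equals `target` exactly.

    Returns a tuple of indices to keep (the "pruned-in" weights), or None.
    """
    for k in range(len(values) + 1):
        for idx in itertools.combinations(range(len(values)), k):
            if sum(values[i] for i in idx) == target:
                return idx
    return None

# Work in integer multiples of the quantization step 2**-b, so sums stay exact.
b = 4                                    # hypothetical bit width
levels = range(-2 ** b, 2 ** b + 1)      # b-bit quantization grid, in step units

random.seed(0)
pool = [random.choice(levels) for _ in range(16)]   # over-provisioned random weights
target = random.choice(levels)                      # target quantized weight

kept = exact_subset(pool, target)
print("target:", target, "kept indices:", kept)
```

The over-provisioning of the pool is precisely what SLTH-style bounds quantify: the finer the quantization grid, the more random candidates are needed before an exact hit is guaranteed with high probability.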
Key Contributions
Exact Representation and Pruning
The paper shows that discrete neural networks can be represented exactly by pruning appropriately overparameterized random networks with quantized weights. This contrasts with earlier SLTH results, which guaranteed approximation only. The developed theory gives optimal bounds on the size and precision of the initial random network, reducing the overparameterization needed to achieve exact representation in the quantized setting; a schematic form of the statement is sketched below.
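In schematic form (the symbols $g_\theta$, $m$, and $\delta$ below are illustrative; the precise width requirement and failure probability are those stated in the paper), the result reads:

```latex
% Schematic shape of the exact-representation result (illustrative notation,
% not the paper's exact constants): a random quantized network g_\theta that is
% wider than a b-bit target f by a suitable logarithmic factor contains, with
% probability at least 1 - \delta, a pruning mask m reproducing f exactly.
\[
\Pr_{\theta}\Big[\ \exists\, m \in \{0,1\}^{|\theta|} \ :\ g_{\theta \odot m}(x) = f(x)\ \ \text{for all } x \ \Big] \;\ge\; 1 - \delta .
\]
```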
Parameter Counting Argument
A parameter counting argument establishes lower bounds on network width in terms of quantization precision, showing that the upper bounds are essentially tight. To represent certain classes of quantized networks, an increase in width is unavoidable, and this increase is optimally constrained by a logarithmic factor in the quantization precision; a back-of-the-envelope version of the argument follows.
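The flavour of such an argument can be conveyed by a simple count (schematic only; $Q$, $N$, and $M$ are illustrative symbols, and the paper's actual bound is finer):

```latex
% Schematic counting argument (not the paper's exact statement). With Q = 2^b
% quantization levels there are Q^N distinct target networks on N parameters,
% while a single random network with M parameters admits at most 2^M pruning
% masks. For one random draw to contain every target exactly, we need
\[
2^{M} \;\ge\; Q^{N} = 2^{bN}
\qquad\Longrightarrow\qquad
M \;\ge\; N \log_2 Q \;=\; N\,b ,
\]
% so the random network must exceed the target class by a factor that grows
% logarithmically in the number of quantization levels.
```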
Practical Implications and Future Directions
Practically, this research underpins strategies for deploying neural networks on resource-constrained hardware, where energy efficiency is as critical as model accuracy. The results guide the construction of neural networks that are optimal in terms of both size and accuracy under quantization constraints, aligning model design with the hardware's capabilities.
The extension to other architectures, including convolutional and unfolding networks, and further investigation into mixed-precision strategies may expand the utility of these theoretical insights. Additionally, exploring stochastic noise resilience in quantized scenarios could further enhance applicability in real-world deployments.
Conclusion
The research presented in this paper culminates in a robust theoretical foundation for the SLTH in quantized settings, resolving key questions about the interplay between quantization precision and network overparameterization. By establishing tight bounds, it quantifies how far network size can be reduced without sacrificing exact representability, marking a significant step toward efficient neural network deployment under limited computational resources. The methodologies and results outlined here should serve as a springboard for further research into neural network optimization, both theoretical and practical.