Frame Quantization of Neural Networks (2404.08131v1)
Abstract: We present a post-training quantization algorithm with error estimates relying on ideas originating from frame theory. Specifically, we use first-order Sigma-Delta ($\Sigma\Delta$) quantization for finite unit-norm tight frames to quantize weight matrices and biases in a neural network. In this setting, we derive an error bound between the original and quantized neural networks in terms of the step size and the number of frame elements. We also demonstrate how to leverage the redundancy of frames to achieve a quantized neural network with higher accuracy.
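As a rough illustration of the core primitive, a first-order $\Sigma\Delta$ quantizer replaces each coefficient $y_n$ with a point $q_n$ on a grid of step size $\delta$, while a running state $u_n = u_{n-1} + y_n - q_n$ accumulates the error; choosing $q_n$ to be the grid point nearest $u_{n-1} + y_n$ keeps $|u_n| \le \delta/2$ for any bounded input. The sketch below is a minimal, generic version of this recursion (the function name and interface are illustrative, not the paper's implementation), which in the paper's setting would be applied to frame coefficients of the network's weights:

```python
import numpy as np

def sigma_delta_quantize(coeffs, step):
    """First-order Sigma-Delta quantization of a coefficient sequence.

    Each q[n] is the grid point (multiples of `step`) nearest to the
    accumulated value u + y[n]; the state update u += y[n] - q[n]
    keeps |u| <= step / 2, which drives the reconstruction error bound.
    """
    q = np.empty(len(coeffs), dtype=float)
    u = 0.0  # internal state: running quantization error
    for n, y in enumerate(coeffs):
        q[n] = step * np.round((u + y) / step)  # greedy nearest-point rule
        u = u + y - q[n]                        # error feedback
    return q
```

For a unit-norm tight frame $\{f_n\}_{n=1}^N$ in $\mathbb{R}^d$, the quantized reconstruction would then be $\hat{x} = \frac{d}{N}\sum_n q_n f_n$, so increasing the frame redundancy $N$ (or shrinking the step) tightens the error, matching the abstract's claim.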